I have a vector dataset dt and only a few of 500 observations is given below.
[1] 0 4 8 13 15 1 0 5 4 6 0 0 0 1 2 3 0 0 0 0 0 3 0 10 15 0 23 0 6 0 10 2 15 0 3 0 2 0
[39] 3 0 42 4 6 12 3 2 0 0 0 2 3 9 6 0 3 2 1 6 0 0 0 0 0 1 0 0 2 0 0 0 0 7 0 0 0 1
I want to check if the distribution of data follows a negative binomial distribution or Poisson distribution better using the KS test and regression. There are only a few R packages available for the KS test of which not all give a good summary and some are no longer valid.
What I have tried?
Package iZID
install.packages("iZID")
library("iZID")
dis.kstest(dt,nsim=100,bootstrap=TRUE,distri='nb')$pvalue
dis.kstest(dt,nsim=100,bootstrap=TRUE,distri='Poisson')$pvalue
Returns
> dis.kstest(dt,nsim=100,bootstrap=TRUE,distri='nb')$pvalue
[1] 0
> dis.kstest(dt,nsim=100,bootstrap=TRUE,distri='Poisson')$pvalue
[1] 0
Question
First, With the above results it’s hard to recommend which distribution is a better fit. Second, I have also used ks.test() in dgof package but it needs initialization of lambda value for Poisson which is unknown and also Negative binomial should have a probability value initialized which is unknown. Third, I receive an error suggesting that goodfit() in vcd package is not available hence couldn't use this tool.
Can I use a ks-test package other than
dgof,iZID, andvcd?Can I use regression to compare the AIC of two distributions using
glmin R? In that case, how to build the model? What is X and Y?
Data in dput format
dt <-
c(0L, 4L, 8L, 13L, 15L, 1L, 0L, 5L, 4L, 6L, 0L, 0L, 0L, 1L, 2L,
3L, 0L, 0L, 0L, 0L, 0L, 3L, 0L, 10L, 15L, 0L, 23L, 0L, 6L, 0L,
10L, 2L, 15L, 0L, 3L, 0L, 2L, 0L, 3L, 0L, 42L, 4L, 6L, 12L, 3L,
2L, 0L, 0L, 0L, 2L, 3L, 9L, 6L, 0L, 3L, 2L, 1L, 6L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 7L, 0L, 0L, 0L, 1L)
First, to answer your question about
KStest packages, yes, you can try thekspackage in R.It provides the
ks.test()function that you could use to perform aKS teston your dataset.For example, to perform a KS test on your dataset using a Poisson distribution, you can use the following code (of course I am assuming that you already have installed the package, before running the below code):
Here, "ppois" is the Poisson distribution, "lambda" is an estimate for the
rate parameter of the Poisson distributionand we have set it to the mean of your dataThe above code returns:
Which means:
Similarly, you could use the
qnbinomdistribution for the negative binomial distribution.Now, to answer your second question about comparing the AIC of two distributions using regression, what you could use is
glm()native function in R.All you need to do is, first, create an output variable
Ythat represents the count data, and then an explanatory (input) variableXthat refers to the index of each observation in your dataset.Now you can fit two models; one, with the Poisson Distributions and, second, with the Negative Binomial Distributions, respectively, and compare their AIC values to determine which distribution provides a better fit.
Which returns:
And as we know, lower the AIC value, the better the model fits the data. Therefore, we can conclude that the
Negative Binomial Distributionprovides a better fit to the data compared to thePoisson Distribution.Let me know if this helps...