Objective
Find the best fitted distribution for data and then generate random numbers from this distribution.
Example
Using the gamlss package in R, I found that the best fit distribution is "Skew exponential power (Azzalini type 1)":
library(gamlss)
library(gamlss.dist)
library(gamlss.add)
m1 <- fitDist(mtcars$wt, k = 2, type = "realAll", trace = FALSE, try.gamlss = TRUE)
summary(m1)
# *******************************************************************
# Family: c("SEP1", "Skew exponential power (Azzalini type 1)")
#
# Call: gamlssML(formula = y, family = DIST[i])
#
# Fitting method: "nlminb"
#
#
# Coefficient(s):
# Estimate Std. Error t value Pr(>|t|)
# eta.mu 3.440000000 0.000149516 23007.56651 < 2e-16
# eta.sigma -1.856665040 0.861826733 -2.15434 0.031214
# eta.nu 0.150728244 NA NA NA
# eta.tau -3.524272086 NA NA NA
#
# eta.mu ***
# eta.sigma *
# eta.nu
# eta.tau
# ---
# Signif. codes:
# 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Degrees of Freedom for the fit: 4 Residual Deg. of Freedom 28
# Global Deviance: -27.4747
# AIC: -19.4747
# SBC: -13.6117
# Warning message:
# In sqrt(diag(object$vcov)) : NaNs produced
Error
But the values of sigma and tau are negative. When I provide these values to rSEP1() to generate a random number, it throws the following error:
rSEP1(1, mu = 3.44, sigma = -1.8, nu = 0.15, tau = -3.5)
# Error in rSEP1(1, mu = 3.44, sigma = -1.8, nu = 0.15, tau = -3.5) :
# sigma must be positive
Are these values transformed? How can I provide correct inputs to rSEP1()?
If you look at the link functions for the parameters, you will see:
So the values for sigma and tau that
fitDistreturns are the log of the numbers you would plug intorSEP1. In other words, the correct way to runrSEP1from your model is:To show that this is the case, let us create a set of random numbers from a defined
rSEP1where sigma = 0.5, and tau = 0.5. If we then find the best-fitting distribution, we should get results close to log(0.5) for both sigma and tau. Sincelog(0.5)is about -0.69, we would expect values of about -0.69 for both these parameters:And plugging the exponents of sigma and tau into
dSEP1gives us a near-perfect fit to the density of our test vector:It's worth pointing out that the actual fit obtained for
mtcars$wtin the example is pretty terrible. The small value of sigma jyst means that most random values drawn from the distribution will be close to the mean ofmtcars$wt. There are only 32 points in the data set, which makes it very difficult to automatically fit a parametric distribution accurately.