Objective
Automate finding the best-fit distribution with the gamlss package and generating random numbers from that distribution.
Example
My actual data has several variables, so this example uses two variables from the iris dataset. Say I want to generate random numbers from the best-fit distributions for sepal length and petal length. I can do this as follows:
library(gamlss)
# Load data-------
data("iris")
# Define a function that finds the best fit distribution
find_dist <- function(x) {
  m1 <- fitDist(x, k = 2, type = "realAll", trace = FALSE, try.gamlss = TRUE)
  m1
}
# Best fit distribution for Sepal.Length---------
dist_Sepal.Length <- find_dist(iris$Sepal.Length)
family_Sepal.Length <- dist_Sepal.Length$family[1] # "SEP4"
dist_Sepal.Length$Allpar
#    eta.mu eta.sigma    eta.nu   eta.tau
# 5.8269404 0.3019834 1.8481415 0.8684266
dist_Sepal.Length$mu.link #identity
dist_Sepal.Length$sigma.link #log
dist_Sepal.Length$nu.link #log
dist_Sepal.Length$tau.link #log
## Generate a random number:
rSEP4(1, mu = 5.827, sigma = exp(0.302), nu = exp(1.848), tau = exp(0.8684))
# Best fit distribution for Petal.Length---------
dist_Petal.Length <- find_dist(iris$Petal.Length)
family_Petal.Length <- dist_Petal.Length$family[1] # "SEP2"
dist_Petal.Length$Allpar
#     eta.mu  eta.sigma     eta.nu    eta.tau
#   4.248646   1.057717 -26.546283   3.594178
dist_Petal.Length$mu.link #identity
dist_Petal.Length$sigma.link #log
dist_Petal.Length$nu.link #identity
dist_Petal.Length$tau.link #log
## Generate a random number:
rSEP2(1, mu = 4.249, sigma = exp(1.058), nu = -26.546, tau = exp(3.594))
Challenges in Creating a Function to Automate Generating Random Numbers
I can extract the distribution name from the family attribute and all parameter values from the Allpar attribute. The challenge is that each distribution has different parameters and link functions, and the values in Allpar are on the link scale (for example, log(sigma) rather than sigma). Otherwise, I could pass Allpar directly to the random number function.
Could you guide me on how to automate this process?
This may not be elegant, but it achieves my goal of automating random number generation after finding the best-fit distribution:
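A minimal sketch of how such a helper might look; it is not necessarily the code this answer originally used. The name r_best_fit is hypothetical. It assumes the fitted object has the elements shown in the question (family, Allpar with names of the form eta.<parameter>, and one <parameter>.link string per parameter), and that the generator follows the r<FAMILY> naming seen in rSEP4 and rSEP2. The back-transformation uses make.link() from base R's stats package, which covers the identity and log links seen above but not every link gamlss supports, so other links will raise an error here.

```r
library(gamlss)  # also attaches gamlss.dist, which provides rSEP4(), rSEP2(), rNO(), ...

# Hypothetical helper: draw n values from the distribution chosen by fitDist().
# `fit` is an object such as the one returned by find_dist() in the question.
r_best_fit <- function(fit, n = 1) {
  family <- fit$family[1]                      # e.g. "SEP4"
  eta <- fit$Allpar                            # parameters on the link scale
  par_names <- sub("^eta\\.", "", names(eta))  # "eta.mu" -> "mu", etc.

  # Back-transform each parameter with the inverse of its own link function.
  # Assumption: every link is one that stats::make.link() knows
  # (identity, log, logit, ...); anything else stops with an error.
  args <- lapply(seq_along(eta), function(i) {
    link <- fit[[paste0(par_names[i], ".link")]]
    stats::make.link(link)$linkinv(unname(eta[i]))
  })
  names(args) <- par_names

  # Look up the matching generator by name, e.g. rSEP4 for family "SEP4".
  rfun <- get(paste0("r", family), mode = "function")
  do.call(rfun, c(list(n), args))
}

# Usage with the fits from the question (run find_dist() first):
# r_best_fit(dist_Sepal.Length, n = 5)
# r_best_fit(dist_Petal.Length, n = 5)
```

Reading the link names from the fitted object, rather than hard-coding exp() per distribution, is what lets the same function serve both SEP4 and SEP2, whose nu parameters use different links.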
Here's the command to run to get a random number from the SEP4 distribution (best fit for iris$Sepal.Length):
Now, I can use it: