I have the written the following code in R with an aim to generate correlated random variables which follow the beta distribution
#objective is to generated correlated
#beta distributed data
library(MASS)
library(faux)
generate.beta.parameters <- function(x, v) {
alpha = ((1 - x) / v - 1/ x) * x^2
beta = alpha * (1 / x - 1)
return (c(alpha, beta))
}
x1 <- 0.896
v1 <- 0.001
x2 <- 0.206
v2 <- 0.004
b1 <- generate.beta.parameters(x1, v1)
b2 <- generate.beta.parameters(x2, v2)
alpha1 <- b1[1]
beta1 <- b1[2]
alpha2 <- b2[1]
beta2 <- b2[2]
#create mean vector
mu = c(0, 0)
#create variance covariance matrix
sigma <- rbind(c(1, 0.2), c(0.2, 1))
#generate 1000 random numbers
df <- mvrnorm(n = 1000, mu = mu, Sigma = sigma)
df.beta <- matrix(nrow = nrow(df), ncol = ncol(df))
#normal to uniform
df.beta[,1] = norm2beta(df[,1], alpha1, beta1)
df.beta[,2] = norm2beta(df[,2], alpha2, beta2)
df.beta <- as.data.frame(df.beta)
cor(df.beta)
However, on running this code, the correlation outputs are not the same as expected. Here is the output on my machine:
V1 V2
V1 1.0000000 0.1549214
V2 0.1549214 1.0000000
Could you throw some light on this?
I do have this link, where random numbers have been generated, but it is in a different software i.e. SAS.
You have two problems. The first is that this is a random sample, so you don't expect the correlation to come out exactly equal to the specified value. Let's try running 500 samples and see what distribution of correlations we get:
The value of 0.155 that you got is toward the lower end of this distribution, but not extreme. Overall the correlation might be biased slightly low - the mean is 0.196 - but it looks OK. If you need the simulated correlation to be exactly 0.2 for some reason, that will be more challenging - you can set
empirical=TRUEinmvrnormto get means and variances that exactly match the specified values, but that leads us to the second problem.Feeding the multivariate normal through a nonlinear transformation will change the correlation - not very much in this case (because the distribution of correlations is still centered close to 0.2), but some. This is a mathematical problem, not a programming problem. You've essentially reimplemented a Gaussian copula model; you can see in the linked example that the correlations aren't the same as those from the original multivariate normal distribution.
One method for simulating a bivariate distribution with marginal distributions that are Beta and a specified correlation is described in Minhajuddin et al 2004 (section 3), but it's significantly more challenging than the approach you've tried here.
The quick-and-dirty way to handle this would be tweak the correlation parameter of the MVN to get the desired correlation of the transformed bivariate Beta ...
Minhajuddin, Abu T. M., Ian R. Harris, and William R. Schucany. 2004. “Simulating Multivariate Distributions with Specific Correlations.” Journal of Statistical Computation and Simulation 74 (8): 599–607. https://doi.org/10.1080/00949650310001626161.