Use z-scores and correlation to simulate observations in R


How can I use rnorm_multi() to simulate 100 observations of 3 sets of z-scores (a, b and c) that are all correlated with each other at 0.25?


jay.sf On

Here's a way using MASS::mvrnorm; I don't know rnorm_multi(). First set up the parameters, then pass them to mvrnorm.

## parameters
s <- c(.5, 1, 2)  ## define sds of z1-3
r <- .25  ## corr.

## define covariance matrix (each off-diagonal entry is r * s[i] * s[j])
Sigma <- matrix(c(
  s[1]^2,    r/2,     r,
     r/2, s[2]^2,   2*r,
       r,    2*r, s[3]^2
), ncol=3, nrow=3)


## simulation
n <- 100
set.seed(42)

library(MASS)
M <- mvrnorm(n=n, numeric(3), Sigma, empirical=TRUE) |> `colnames<-`(letters[1:3])
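The off-diagonal entries of Sigma above are written out by hand for these particular sds. A minimal sketch of building the same matrix from the sds and a correlation matrix follows; the names R and Sigma2 are only illustrative and are not part of the original answer.

```r
## Sketch: build the covariance matrix from sds and a correlation matrix
s <- c(.5, 1, 2)   # standard deviations, as in the answer above
r <- .25           # common pairwise correlation

R <- matrix(r, nrow = 3, ncol = 3)  # every entry is r ...
diag(R) <- 1                        # ... except 1 on the diagonal

## cov[i, j] = r[i, j] * s[i] * s[j]
Sigma2 <- R * outer(s, s)           # illustrative name, to avoid overwriting Sigma

## converting back recovers the correlation matrix
cov2cor(Sigma2)
```

For true z-scores, all sds are 1, so setting s <- rep(1, 3) makes the covariance matrix identical to the correlation matrix.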

Result

head(M)
#                a          b         c
# [1,] -0.50980732  0.8981857  4.078068
# [2,] -0.58623896  0.2514551 -1.370179
# [3,]  0.71829541  0.2693594 -0.985852
# [4,] -0.19684349  2.2982353 -2.238448
# [5,]  0.08263476 -0.5594775  2.335499
# [6,] -0.35133993 -0.4720599  1.282973

matrixStats::colSds(M)
# [1] 0.5 1.0 2.0

colMeans(M)
#            a             b             c 
# 8.326673e-18  1.748601e-17 -2.872702e-17 
## i.e. zero

cor(M)
#      a    b    c
# a 1.00 0.25 0.25
# b 0.25 1.00 0.25
# c 0.25 0.25 1.00

Note: The empirical=TRUE flag forces the sample itself to have exactly the given mean and Sigma, i.e. they are treated as sample statistics rather than population parameters. Set it to FALSE to simulate sampling from a population with the given Sigma. With TRUE, the output is also a handy check that the parameters were specified correctly.
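To illustrate the difference between the two settings on the z-score scale the question asks about, here is a small sketch, assuming unit sds so that the covariance matrix equals the correlation matrix. The object names Z_exact and Z_draw are made up for this example.

```r
library(MASS)

## Correlation matrix for three z-scores (sd = 1, so covariance == correlation)
R <- matrix(.25, nrow = 3, ncol = 3)
diag(R) <- 1

set.seed(42)
## empirical = TRUE: the sample itself has exactly this mean and covariance
Z_exact <- mvrnorm(n = 100, mu = numeric(3), Sigma = R, empirical = TRUE)
## empirical = FALSE: a random draw from a population with these parameters
Z_draw  <- mvrnorm(n = 100, mu = numeric(3), Sigma = R, empirical = FALSE)

colnames(Z_exact) <- colnames(Z_draw) <- letters[1:3]

round(cor(Z_exact), 2)  # 0.25 off the diagonal, up to floating-point error
round(cor(Z_draw), 2)   # near 0.25, but it varies from seed to seed
```

Which one to use depends on the goal: exact sample statistics are convenient for demonstrations, while a random draw is what a sampling simulation usually needs.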

Giulio Centorame On
library(faux)

set.seed(96)

rnorm_multi(vars = 3, r = 0.25, varnames = c("a", "b", "c"))

Specifying a single value for r gives the same correlation between every pair of variables. There is more on this in the faux vignette.
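The call above relies on defaults for the sample size and scale. A sketch that spells them out and checks the result is below; it assumes the faux package is installed and that, as its documentation describes, n defaults to 100, mu to 0 and sd to 1. The name dat is only illustrative.

```r
library(faux)  # assumes the faux package is installed

set.seed(96)
dat <- rnorm_multi(
  n = 100,                       # number of observations (believed to be the default)
  vars = 3,
  mu = 0, sd = 1,                # z-score scale (believed to be the defaults)
  r = 0.25,                      # single value: same correlation for every pair
  varnames = c("a", "b", "c"),
  empirical = FALSE              # set to TRUE to make the sample statistics exact
)

dim(dat)   # 100 rows, 3 columns
cor(dat)   # sample correlations near 0.25
```

With empirical = FALSE the sample correlations only approximate 0.25, in the same way as a random draw from MASS::mvrnorm.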