How to simulate from a trivariate Gaussian distribution when X and Y are given?


Is there an easy Python package solution for the following problem?

I have three variables; let's say they follow a trivariate normal distribution, so we know the means and the covariance matrix. Is there any simple way to simulate Z values when X and Y are given?

I have already coded a bivariate simulation using a Gibbs sampler. It can be extended to sample from a multivariate distribution, but that gets confusing and tedious. Is there really no easier way to sample from the conditional distributions?

Thank you.

Here is what I tried, and I believe it works. But I would rather use a package directly, without my own calculations:

import numpy as np
import scipy.stats


class GibbsMultivariateSampler:
    def __init__(self,
                 data: np.ndarray,
                 means: np.ndarray,
                 covariance: np.ndarray):
        self.data = data
        self.means = means
        self.covariance = covariance

    def conditioned_mean(self, x: np.ndarray):
        # _h is the index of the variable being simulated; the first len(x)
        # entries of means/covariance correspond to the given values
        _h = len(x)
        shifts = [(self.covariance[_h, _l] / self.covariance[_l, _l])
                  * (element - self.means[_l])
                  for _l, element in enumerate(x)]
        return self.means[_h] + sum(shifts)

    def conditioned_covariance(self, x: np.ndarray):
        # note: handles each given variable separately, i.e. ignores
        # correlation among the conditioning variables themselves
        _h = len(x)
        reductions = [self.covariance[_h, _l] ** 2 / self.covariance[_l, _l]
                      for _l in range(len(x))]
        return self.covariance[_h, _h] - sum(reductions)

    def simulate(self, size):
        conditioned_distribution = scipy.stats.multivariate_normal(
            mean=self.conditioned_mean(self.data),
            cov=self.conditioned_covariance(self.data))
        return conditioned_distribution.rvs(size=size)

                
mean4 = np.array([2, 3, 4, 5])
cov_matrix4 = np.array([[1, 0.5, 0.3, 0.2],
                       [0.5, 1, 0.4, 0.1],
                       [0.3, 0.4, 1, 0.15],
                       [0.2, 0.1, 0.15, 1]])                


#simulate Z given X,Y
sampler=GibbsMultivariateSampler(data=np.array([10,20]), means=mean4, covariance=cov_matrix4)
simulation=sampler.simulate(1000)

#simulate W given X,Y,Z
sampler=GibbsMultivariateSampler(data=np.array([10,20,5]), means=mean4, covariance=cov_matrix4)
simulation=sampler.simulate(1000)
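For reference, the exact conditional of a multivariate normal has a closed form (the Schur complement of the covariance block of the given variables), so no Gibbs loop or per-variable approximation is needed. A minimal sketch with numpy/scipy; `conditional_mvn` is my own helper name, not a package function, and it assumes the conditioning variables come first in the ordering of `means`/`cov`:

```python
import numpy as np
from scipy import stats

def conditional_mvn(means, cov, given):
    """Exact conditional distribution of the remaining variables,
    given that the first len(given) variables equal `given`."""
    k = len(given)
    mu1, mu2 = means[:k], means[k:]
    s11 = cov[:k, :k]          # covariance of the given variables
    s12 = cov[:k, k:]
    s21 = cov[k:, :k]
    s22 = cov[k:, k:]
    w = np.linalg.solve(s11, given - mu1)            # Σ11^-1 (x - μ1)
    cond_mean = mu2 + s21 @ w
    cond_cov = s22 - s21 @ np.linalg.solve(s11, s12)  # Schur complement
    return cond_mean, cond_cov

mean4 = np.array([2, 3, 4, 5])
cov4 = np.array([[1, 0.5, 0.3, 0.2],
                 [0.5, 1, 0.4, 0.1],
                 [0.3, 0.4, 1, 0.15],
                 [0.2, 0.1, 0.15, 1]])

# simulate (Z, W) jointly, given X = 10, Y = 20
m, c = conditional_mvn(mean4, cov4, np.array([10.0, 20.0]))
samples = stats.multivariate_normal(mean=m, cov=c).rvs(size=1000)
```

Unlike the per-variable shifts in the class above, this accounts for the correlation among the conditioning variables, so it is exact for any covariance matrix.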


Answer from pjs:

I'm operating in the belief that you want to know how to generate Gaussian random variables having the correct mean and variance/covariance structure, and that the focus on conditional increments was because you were using Gibbs sampling. You can avoid that by using Cholesky decomposition in the mathematical formulation described in this answer to a similar question.

A quick summary: if M is the vector of means and V is the variance/covariance matrix, let L be a lower triangular matrix such that LLᵀ = V. The matrix L can be derived using Cholesky factorization. Given a vector Z of iid standard Gaussians, X = LZ + M yields a vector of Gaussians having the desired mean and variance/covariance structure. This is the matrix equivalent of the familiar univariate transformation X = σZ + μ, with L acting in the role of σ, the square root of the variance.

The following is a python/numpy implementation of that algorithm. Note that even though your subject line said "trivariate", I've used the 4x4 covariance matrix and corresponding mean vector from your code.

import numpy as np

M = np.array([2, 3, 4, 5])
V = np.array([[1, 0.5, 0.3, 0.2],
              [0.5, 1, 0.4, 0.1],
              [0.3, 0.4, 1, 0.15],
              [0.2, 0.1, 0.15, 1]])

print("Show the covariance matrix\n")
print(V)               

print("\nShow Cholesky factorization of covariance matrix\n")                
L = np.linalg.cholesky(V)
print(L)

print("\nConfirm that L L.T = V\n")
print(np.dot(L, L.T))

print("\nShow input vector of standard normals\n")
Z = np.random.default_rng().normal(size = 4)
print(Z)

print("\nSample of correlated Gaussian results\n")
print(np.dot(L, Z) + M)

If you actually do want to know how to generate Xk given X1,...,Xk-1, this approach can be inverted: algebraically determine the corresponding standard normals Z1,...,Zk-1, generate a value for Zk, multiply the extended Z vector by the kth row of L, and add Mk.
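That inversion can be sketched as follows. Since L is lower triangular, the standard normals behind the observed values are recovered by forward substitution on the top-left block of L; `sample_next_given` is my own helper name, and it assumes the variables are ordered as in M and V:

```python
import numpy as np
from scipy.linalg import solve_triangular

M = np.array([2.0, 3.0, 4.0, 5.0])
V = np.array([[1, 0.5, 0.3, 0.2],
              [0.5, 1, 0.4, 0.1],
              [0.3, 0.4, 1, 0.15],
              [0.2, 0.1, 0.15, 1]])
L = np.linalg.cholesky(V)

def sample_next_given(observed):
    """Draw X_k given X_1..X_{k-1} = observed."""
    k = len(observed)
    # forward-substitute to recover the standard normals that
    # would have produced the observed values
    z = solve_triangular(L[:k, :k], observed - M[:k], lower=True)
    z_new = np.random.default_rng().normal()
    # kth row of L applied to the extended Z vector, plus M_k
    return L[k, :k] @ z + L[k, k] * z_new + M[k]

# Z given X = 10, Y = 20
x3 = sample_next_given(np.array([10.0, 20.0]))
```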

Answer from Víťa Horák:

I tried this after reading your answer:

import numpy as np

def produce_Zscore(x, mean, std):
    return (x-mean)/std

M = np.array([2, 3, 4, 5])
V = np.array([[1, 0.5, 0.3, 0.2],
              [0.5, 1, 0.4, 0.1],
              [0.3, 0.4, 1, 0.15],
              [0.2, 0.1, 0.15, 1]])

              
L = np.linalg.cholesky(V)
X1= produce_Zscore(4,mean=M[0], std= np.sqrt(np.diag(V)[0]))
Y1=produce_Zscore(4,mean=M[1], std= np.sqrt(np.diag(V)[1]))

#add two random Z-Scores for Z1,W1
Z = [X1,Y1, np.random.randn(), np.random.randn() ]

#Alas, the second given value (Y1) is distorted after multiplying by L
print(np.dot(L, Z) + M)

# desired output:
# [4.0, 4.0, something random but consistent with X1, Y1, something random but consistent with X1, Y1, Z1]
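The distortion happens because the marginal z-scores ignore the correlation that L reintroduces: row 2 of L mixes Z[0] into X[1]. Recovering the given variables' standard normals by forward substitution through the top-left block of L fixes it; a sketch of that correction (my own variable names):

```python
import numpy as np
from scipy.linalg import solve_triangular

M = np.array([2.0, 3.0, 4.0, 5.0])
V = np.array([[1, 0.5, 0.3, 0.2],
              [0.5, 1, 0.4, 0.1],
              [0.3, 0.4, 1, 0.15],
              [0.2, 0.1, 0.15, 1]])
L = np.linalg.cholesky(V)

observed = np.array([4.0, 4.0])   # given X1, Y1
rng = np.random.default_rng()

# recover the first two standard normals by forward substitution,
# instead of taking marginal z-scores
z_given = solve_triangular(L[:2, :2], observed - M[:2], lower=True)
Z = np.concatenate([z_given, rng.normal(size=2)])

X = L @ Z + M
# X[0] and X[1] now reproduce the observed 4.0, 4.0 exactly;
# X[2] and X[3] are random but consistent with them
```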