How to get the mean and covariance values from a pomegranate Gaussian mixture model

With scikit-learn's Gaussian mixture model, the mean and covariance of each component can be read like this:

clf = GaussianMixture(n_components=num_clusters, covariance_type="tied", init_params='kmeans')
for i in range(clf.n_components):
    cov=clf.covariances_[i]
    mean=clf.means_[i]

But the pomegranate Gaussian mixture model reports that it has no attributes called 'covariances_' and 'means_'. How can I get these values from it? Thank you very much for your valuable time.

StupidWolf On BEST ANSWER

When you set covariance_type="tied", the model assumes a single covariance matrix shared by all components, so the loop above does not do what it appears to: clf.covariances_ holds just one matrix, and indexing it with [i] returns a row of that matrix rather than the covariance of component i. From the help page:

‘full’ each component has its own general covariance matrix

‘tied’ all components share the same general covariance matrix
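To see the difference concretely, here is a minimal sketch that fits both variants on the iris data and compares the shape of covariances_. The variable names and the fixed random_state are my own choices for illustration, not part of the question.

```python
from sklearn import datasets
from sklearn.mixture import GaussianMixture

X = datasets.load_iris().data  # 150 samples, 4 features

# Same number of components, only covariance_type differs.
# random_state is fixed here purely to make the sketch repeatable.
tied = GaussianMixture(n_components=3, covariance_type="tied", random_state=0).fit(X)
full = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)

tied_shape = tied.covariances_.shape  # one shared (n_features, n_features) matrix
full_shape = full.covariances_.shape  # one matrix per component

print(tied_shape)  # (4, 4)
print(full_shape)  # (3, 4, 4)
```

With "tied", covariances_[i] is therefore just row i of the shared matrix, which is why the loop in the question is misleading.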

pomegranate estimates a separate covariance matrix for each component, so the fair comparison is with sklearn's GaussianMixture run with covariance_type="full":

from sklearn import datasets
from sklearn.mixture import GaussianMixture

iris = datasets.load_iris()

clf = GaussianMixture(n_components=3, covariance_type="full", init_params='kmeans')
clf.fit(iris.data)
cov = []
means = []
for i in range(clf.n_components):
    cov.append(clf.covariances_[i])
    means.append(clf.means_[i])

So for component (cluster) 0:

means[0]

array([5.006, 3.428, 1.462, 0.246])

cov[0]

array([[0.121765, 0.097232, 0.016028, 0.010124],
       [0.097232, 0.140817, 0.011464, 0.009112],
       [0.016028, 0.011464, 0.029557, 0.005948],
       [0.010124, 0.009112, 0.005948, 0.010885]])

Now using pomegranate:

import numpy as np
from pomegranate import GeneralMixtureModel, MultivariateGaussianDistribution

mdl = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution,
                                       n_components=3, X=iris.data)
mdl = mdl.fit(iris.data)

The parameters can be accessed under distributions, which is a list with one entry per component: distributions[0] for the first component, distributions[1] for the second, and so on. For each entry, parameters[0] is the mean and parameters[1] is the covariance matrix:

mdl.distributions[0].parameters[0]

[5.005999999999999, 3.4280000000000004, 1.462, 0.24599999999999986]

np.round(mdl.distributions[0].parameters[1],6)

array([[0.121764, 0.097232, 0.016028, 0.010124],
       [0.097232, 0.140816, 0.011464, 0.009112],
       [0.016028, 0.011464, 0.029556, 0.005948],
       [0.010124, 0.009112, 0.005948, 0.010884]])
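If you want arrays shaped like sklearn's means_ and covariances_, you can stack the per-component parameters yourself. The helper below is only a sketch: mixture_parameters is a name I made up, and it assumes each entry of distributions exposes parameters as [mean, covariance], as shown above. So that it runs without pomegranate installed, it is exercised here on a hypothetical stand-in object rather than a real model.

```python
from types import SimpleNamespace

import numpy as np


def mixture_parameters(model):
    """Stack per-component means and covariances into NumPy arrays.

    Assumes model.distributions is a sequence whose items expose
    parameters as [mean, covariance] (hypothetical helper, not a
    pomegranate function).
    """
    means = np.array([d.parameters[0] for d in model.distributions])
    covs = np.array([d.parameters[1] for d in model.distributions])
    return means, covs


# Stand-in with the same attribute layout as the fitted model above.
# This is NOT a pomegranate object; it only mimics the structure.
fake_model = SimpleNamespace(distributions=[
    SimpleNamespace(parameters=[[0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]]),
    SimpleNamespace(parameters=[[2.0, 3.0], [[2.0, 0.5], [0.5, 2.0]]]),
])

means, covs = mixture_parameters(fake_model)
print(means.shape)  # (2, 2): n_components x n_features
print(covs.shape)   # (2, 2, 2): n_components x n_features x n_features
```

With the fitted model from above you would call mixture_parameters(mdl) and then index means[i] and covs[i] the same way as with the sklearn arrays.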