Inconsistent Clustering from Scikit Learn Gaussian Mixture Model


I used the Gaussian mixture model (GMM) from the scikit-learn package for clustering. The Python code is below.

import pandas as pd
from numpy import unique
from numpy import where
from sklearn.mixture import GaussianMixture
from matplotlib import pyplot

#load data
rawData=pd.read_excel('ClusteringFailure.xlsx',0)
X=rawData.iloc[:, :].to_numpy(dtype='float64')

#define model and set number of clusters to 4 for genotyping
model = GaussianMixture(n_components=4)

#fit the model
model.fit(X)

#assign a cluster index to each data point
yCluster = model.predict(X)
clusters = unique(yCluster)
for cluster in clusters:
    # indices of the points assigned to this cluster
    row_ix = where(yCluster == cluster)
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1])
# display the scatter plot
pyplot.show()

Here is the data I used.
x   y
18.586  46.33
0.109   68.534
0.074   5.242
22.212  63.888
3.726   36.767
0.159   6.98
24.531  9.925
0.143   0.299
29.91   54.539
29.868  12.522
0.064   2.6
29.978  48.665

I ran it multiple times, and the clustering was different every time. Can anyone explain why the results are inconsistent and advise on how to make them more consistent? Thanks!
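
One likely contributor is that GaussianMixture starts EM from a randomized initialization (k-means by default) and, with n_init=1, keeps whatever that single start converges to. A minimal sketch of one way to make runs repeatable, assuming the 12 points above are the whole dataset: fix random_state and raise n_init so the best of several starts is kept. The helper name fit_predict_seeded and the values seed=0 and n_init=10 are illustrative choices, not taken from the original code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# The 12 (x, y) points listed in the question.
X = np.array([
    [18.586, 46.33], [0.109, 68.534], [0.074, 5.242], [22.212, 63.888],
    [3.726, 36.767], [0.159, 6.98], [24.531, 9.925], [0.143, 0.299],
    [29.91, 54.539], [29.868, 12.522], [0.064, 2.6], [29.978, 48.665],
])


def fit_predict_seeded(data, seed=0, n_init=10):
    """Hypothetical helper: fit a 4-component GMM with a fixed seed.

    random_state pins the randomized initialization; n_init runs several
    initializations and keeps the fit with the highest likelihood bound.
    """
    model = GaussianMixture(n_components=4, n_init=n_init, random_state=seed)
    return model.fit_predict(data)


# Two independent fits with the same seed should give the same labels.
labels_a = fit_predict_seeded(X)
labels_b = fit_predict_seeded(X)
print(np.array_equal(labels_a, labels_b))
```

Fixing the seed only makes the result repeatable; it does not by itself show that the clustering is a good one. With so few points per component, different seeds may still settle on different partitions, so comparing a few seeds is a reasonable sanity check.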
