Good morning!
I am trying to understand the Leiden algorithm and its usage to find partitions and clusterings. The example provided in the documentation already finds a partition directly, such as the following:
import leidenalg as la
import igraph as ig
G = ig.Graph.Famous('Zachary')
partition = la.find_partition(G, la.ModularityVertexPartition)
G.vs['cluster'] = partition.membership
ig.plot(partition,vertex_size = 30)
If one checks partition.membership, it already contains 4 clusters.
However, I am trying to do a similar thing with the iris dataset and the algorithm is not able to find clusters.
I have tried taking the X variables and building either:
- a 1 - correlation matrix, or
- pairwise distances,
but neither works well (even after scaling the values): the algorithm does not produce meaningful clusters of observations. I assume correlations or pairwise distances are not good at separating them. What am I misunderstanding?
Here is the code (the pairwise-distance version is active; the correlation version is commented out):
import numpy as np
from sklearn import datasets
import igraph as ig
import leidenalg
import cairo
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import pairwise_distances
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data # Features
y = iris.target # Class labels
# Create an adjacency matrix based on observation similarity
# adj_matrix = abs(1-np.corrcoef(X))
adj_matrix = pairwise_distances(X)
print(adj_matrix)
# Create an igraph graph object
graph = ig.Graph.Weighted_Adjacency(adj_matrix)
# Apply the Leiden algorithm for community detection evaluating the nº of clusters created by changing the resolution parameter.
for i in np.arange(0.9,1.05,0.05):
    partition = leidenalg.find_partition(graph, leidenalg.CPMVertexPartition,
                                         resolution_parameter = i)
    print(i,len(np.unique(partition.membership)) )
#0.9 1
#0.9500000000000001 1
#1.0 150
#1.0500000000000003 150
As one can see, once the resolution reaches 1 there are 150 clusters (equal to the number of observations), and below that everything is put into 1 cluster. Let me know your ideas.
Thank you for your time.
Make sure to pass in the weights to find_partition. See the documentation for more detail. With correlations I highly recommend using CPM, not Modularity.