Hierarchical Clustering for binary variables


I have a dataset of 2,000 rows and 60 columns (dummies). The dummies are survey-like questions.

I'd like to apply hierarchical clustering to identify different types of profiles according to answers to questions.

I've heard about the Hamming distance and the Jaccard distance: which one is better for this kind of data? What about the linkage method? As far as I know, Ward's method only works with Euclidean distance. Finally, how do I choose the correct number of clusters?
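To make the comparison concrete, here is a minimal sketch of both distances with SciPy. It assumes the dummies are held in a boolean array `X` (random placeholder data here, not the real survey) and uses average linkage only as one option that does not require Euclidean distances. Hamming counts every disagreement over all 60 columns, so shared 0s count as agreement; Jaccard ignores columns where both rows are 0.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

# Placeholder data (assumption): 200 respondents x 60 binary dummies.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 60)).astype(bool)

# Condensed pairwise distance vectors, each value in [0, 1].
d_hamming = pdist(X, metric="hamming")   # share of columns that differ
d_jaccard = pdist(X, metric="jaccard")   # same, but ignoring joint 0s

# Average linkage accepts any precomputed dissimilarity;
# Ward is left out because it assumes Euclidean distances.
Z_hamming = linkage(d_hamming, method="average")
Z_jaccard = linkage(d_jaccard, method="average")

print(Z_hamming.shape, Z_jaccard.shape)
```

Whether joint 0s should count as similarity depends on what a 0 means in the survey, so this sketch does not settle which distance is better.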

I've plotted a dendrogram, but I don't know where to cut it to get the correct number of clusters.
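One possible way to compare cut points, sketched under assumptions: cut the tree at several candidate values of k with `fcluster` and compare the average silhouette score on the precomputed distance matrix. The data below is synthetic (three planted profiles with 10% of answers flipped), and the Hamming distance, average linkage, and the range of k are illustrative choices, not recommendations.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

# Synthetic placeholder data (assumption): 3 planted answer profiles plus noise.
rng = np.random.default_rng(0)
prototypes = rng.integers(0, 2, size=(3, 60)).astype(bool)
planted = rng.integers(0, 3, size=300)
flip = rng.random((300, 60)) < 0.1
X = np.logical_xor(prototypes[planted], flip)

d = pdist(X, metric="hamming")
Z = linkage(d, method="average")
D = squareform(d)  # square matrix needed by silhouette_score

scores = {}
for k in range(2, 9):
    # Cut the dendrogram so that at most k flat clusters remain.
    labels = fcluster(Z, t=k, criterion="maxclust")
    if len(np.unique(labels)) > 1:
        scores[k] = silhouette_score(D, labels, metric="precomputed")

best_k = max(scores, key=scores.get)
print(scores)
print("k with the highest silhouette:", best_k)
```

The silhouette score is only one heuristic; a k that scores slightly lower but gives interpretable profiles may still be the better choice.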

I expect to retrieve around 5 clusters and want to be able to interpret them. I thought about first fitting a simple logistic regression on each cluster, then perhaps studying the SHAP values.
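The per-cluster regression idea could look roughly like the sketch below: one "this cluster vs. the rest" logistic regression per cluster, followed by a look at the largest coefficients. The data and the cluster labels are random placeholders, the name `top_questions` is hypothetical, and plain coefficients are used here instead of SHAP values to keep the example self-contained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data and labels (assumption): in practice, `labels` would
# come from cutting the dendrogram into about 5 clusters.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 60))
labels = rng.integers(1, 6, size=300)

top_questions = {}
for c in np.unique(labels):
    # One-vs-rest target: 1 if the respondent is in cluster c, else 0.
    y = (labels == c).astype(int)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    coefs = model.coef_.ravel()
    # Indices of the 5 questions with the largest absolute coefficients.
    order = np.argsort(-np.abs(coefs))[:5]
    top_questions[int(c)] = [(int(j), float(coefs[j])) for j in order]

for c, items in top_questions.items():
    print(c, items)
```

A simpler first check is to compare the share of 1s per question in each cluster, for example `X[labels == c].mean(axis=0)`, against the overall share.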
