I have a dataset of 2,000 rows and 60 binary (dummy) columns. Each dummy encodes the answer to a survey-like question.
I'd like to apply hierarchical clustering to identify different types of respondent profiles based on their answers.
I've heard about the Hamming distance and the Jaccard distance: which one is better for this kind of data? What about the linkage method, given that Ward's method only works with Euclidean distance? Finally, how do I choose the correct number of clusters?
I've produced a dendrogram but don't know where to cut it to choose the correct number of clusters.
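For concreteness, a minimal sketch of this kind of pipeline is shown below. It assumes SciPy and NumPy, uses random placeholder data in place of the real survey, and picks Jaccard distance with average linkage and a cut at 5 clusters purely as illustrative choices, not as the recommended answer.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Placeholder data standing in for the real survey: 2,000 rows x 60 dummies.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 60)).astype(bool)

# Condensed pairwise distance matrix; swap "jaccard" for "hamming" to compare.
dist = pdist(X, metric="jaccard")

# Average linkage works with any dissimilarity, unlike Ward.
Z = linkage(dist, method="average")

# Cut the tree so that at most 5 flat clusters remain.
labels = fcluster(Z, t=5, criterion="maxclust")

# Share of "1" answers per question within each cluster, as a first
# interpretation aid before any model-based explanation.
profiles = np.vstack([X[labels == k].mean(axis=0) for k in np.unique(labels)])
print(profiles.shape)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the tree whose cut is being chosen here. Changing `metric`, `method`, or `t` makes it possible to compare how stable the resulting profiles are.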
I expect to find around 5 clusters and want to be able to interpret them. I thought about first fitting a simple logistic regression for each cluster, then perhaps studying the SHAP values.