How to decide on the clustering method for categorical data in R?

Asked by Irau on 10 October 2019 at 10:33

I'm trying to perform a cluster analysis on mixed data (demographic variables plus preference ratings on Likert scales from 1 to 10). I am applying hierarchical clustering to a dissimilarity computed with daisy(), which handles mixed data, but when I compute the goodness of fit (the cophenetic correlation), the score is 0.60, which is not very high.

How can I improve the goodness of fit? Is a hierarchical method suitable for this data? Should the Likert scale data be treated as factors or as numeric? Also, is method="complete" in hclust(seg.dist, method="complete") suitable for my data?

I also tried Latent Class Analysis, but the results were not interesting (unless I was doing it wrong).

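library(cluster)  # provides daisy()
seg.dist <- daisy(EUR_data)  # Gower dissimilarity is used when columns are of mixed type
as.matrix(seg.dist)  # inspect the full dissimilarity matrix
seg.hc <- hclust(seg.dist, method="complete")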

To calculate the cophenetic correlation:

cor(cophenetic(seg.hc), seg.dist)

There is 1 best solution below

Answered by Has QUIT--Anony-Mousse on 12 October 2019 at 11:19

Improve the preprocessing of your data.

Some attributes will be more important than others, and the distance should reflect that.

Likert attributes often cannot be treated as interval-scaled either: for cultural reasons, respondents may be less likely to give a 7 than a 6 or an 8 (7 is bad luck).
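
As a rough illustration of how the variable type changes the distance, the sketch below builds a small made-up data frame (toy, pref_a and pref_b are hypothetical stand-ins for EUR_data and its Likert items) and computes the Gower dissimilarity with the Likert items coded three ways. daisy() treats numeric columns as interval-scaled, ordered factors as ordinal, and plain factors as nominal, so the coding is a modelling decision rather than a technicality.

```r
library(cluster)  # provides daisy()

# Hypothetical toy data standing in for EUR_data:
# one demographic variable and two 1-10 Likert items.
set.seed(1)
toy <- data.frame(
  gender = factor(sample(c("f", "m"), 30, replace = TRUE)),
  pref_a = sample(1:10, 30, replace = TRUE),
  pref_b = sample(1:10, 30, replace = TRUE)
)
likert <- c("pref_a", "pref_b")

# Variant 1: Likert items left as numeric (interval-scaled).
d_numeric <- daisy(toy, metric = "gower")

# Variant 2: Likert items as ordered factors (ordinal).
toy_ord <- toy
toy_ord[likert] <- lapply(toy[likert], factor, levels = 1:10, ordered = TRUE)
d_ordinal <- daisy(toy_ord, metric = "gower")

# Variant 3: Likert items as unordered factors (nominal):
# a 6 is then as far from a 7 as it is from a 1.
toy_nom <- toy
toy_nom[likert] <- lapply(toy[likert], factor, levels = 1:10)
d_nominal <- daisy(toy_nom, metric = "gower")

# summary() reports the variable types daisy() inferred.
summary(d_nominal)
```

Which variant is appropriate depends on how much you trust the spacing between scale points; the nominal coding discards the ordering entirely, which is rarely what is wanted for preference ratings.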

Clustering will only be as good as your distance, so improve your preprocessing and distance computations!
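
One way to act on this, sketched here under assumptions rather than as a recommended recipe, is to pass per-variable weights to daisy() and then compare linkage methods on the resulting dissimilarity. The data frame, the column names and the weights c(1, 2, 2) are made up for illustration; real weights should come from domain knowledge.

```r
library(cluster)  # provides daisy()

# Hypothetical toy data standing in for EUR_data.
set.seed(42)
toy <- data.frame(
  age_group = factor(sample(c("18-29", "30-49", "50+"), 40, replace = TRUE)),
  pref_a    = sample(1:10, 40, replace = TRUE),
  pref_b    = sample(1:10, 40, replace = TRUE)
)

# Gower dissimilarity with illustrative weights: each preference item
# counts twice as much as the demographic variable.
d <- daisy(toy, metric = "gower", weights = c(1, 2, 2))

# Cophenetic correlation for several linkage methods,
# computed the same way as in the question.
methods <- c("complete", "average", "single")
coph <- sapply(methods, function(m) {
  hc <- hclust(d, method = m)
  cor(cophenetic(hc), d)
})
round(coph, 2)
```

Keep in mind that the cophenetic correlation only measures how faithfully the dendrogram reproduces the input dissimilarities. A higher value does not by itself mean the clusters are more meaningful, so changes to the distance should be judged on the resulting segments as well.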
