How to choose optimal k for clustering of mixed variables?


I am using Gower distance to cluster mixed variables (both numerical and categorical) with hierarchical clustering. Apart from reading k off the dendrogram, is there a way to find the optimal k using the within-cluster sum of squares (WSS)?

I also tried the 'pam' function to pick k by average silhouette width, but the width keeps increasing as k grows, so no clear optimum appears. Are there other functions that can take a dissimilarity matrix and calculate the within-cluster sum of squares (WSS)?

The code I used with the 'pam' function:

library(cluster)  # provides pam()
sil_width <- rep(NA_real_, 20)  # index = k; stays NA for k = 1 (no silhouette)
for (i in 2:20) {
  sil_width[i] <- pam(gower_distance, diss = TRUE, k = i)$silinfo$avg.width
}
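
For reference, one possible way to get a WSS-like quantity straight from a dissimilarity matrix is to sum, for each cluster, the squared pairwise dissimilarities divided by twice the cluster size. With Euclidean distances this equals the usual WSS; with Gower distance it is only an analogue, so an elbow plot built on it should be read with caution. The sketch below is a minimal illustration under assumptions: `wss_from_diss` is a hypothetical helper (not a function from any package), `toy` is made-up stand-in data, and average linkage is an arbitrary choice.

```r
library(cluster)  # daisy() for Gower dissimilarities

# Hypothetical helper: within-cluster dispersion from a dissimilarity object.
# For each cluster C: sum over all ordered pairs (i, j) in C of d_ij^2,
# divided by 2 * |C|; the result is summed over clusters.
wss_from_diss <- function(d, labels) {
  m <- as.matrix(d)
  groups <- split(seq_along(labels), labels)
  sum(vapply(groups, function(idx) {
    sum(m[idx, idx, drop = FALSE]^2) / (2 * length(idx))
  }, numeric(1)))
}

# Made-up toy data standing in for the real mixed-type data frame (assumption)
set.seed(1)
toy <- data.frame(
  x = c(rnorm(20, 0), rnorm(20, 5)),
  g = factor(rep(c("a", "b"), each = 20))
)
gower_distance <- daisy(toy, metric = "gower")

# Hierarchical clustering on the Gower dissimilarities (linkage is an assumption)
hc <- hclust(as.dist(gower_distance), method = "average")

# Dispersion for each candidate k, to be inspected for an elbow
ks <- 1:10
wss <- sapply(ks, function(k) wss_from_diss(gower_distance, cutree(hc, k = k)))
plot(ks, wss, type = "b", xlab = "k", ylab = "within-cluster dispersion")
```

The same helper works with labels from `pam(...)$clustering`, since it only needs the dissimilarities and a vector of cluster assignments.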
