I am trying to cluster data using the cluster::daisy function and a dissimilarity matrix. The data looks like this:
> head(score_mat_unique_3)
         ID_1       ID_2    Score
1: 1000035849 1000532512 2.49e-60
2: 1000035849 1000682765 3.87e-08
3: 1000086994 1000658924 8.90e-18
4: 1000234640 1000535109 1.20e-87
5: 1000235015 1000754236 6.29e-34
6: 1000258002 1000598768 8.34e-36
Score shows how different the objects are (the larger the value, the more different the objects), so I use a dissimilarity matrix and the daisy function.
diss3 <- daisy(score_mat_unique_3, metric = "gower")
But when I plot the hclust result, numbers are printed on the leaves instead of the IDs.
fit3 <- hclust(diss3, method = "ward.D2")
plot(fit3)
As a result, the information about which objects are in which cluster is lost. How can I get the original IDs back and see which IDs are in each cluster?
You just need to use the labels argument to plot. Since you don't provide your data, I will illustrate with the built-in iris data.

In your case, try
plot(fit3, labels=score_mat_unique_3$ID_1)
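
For the iris illustration, a minimal sketch might look like the following. The ID column, the choice of the first 20 rows, and k = 3 are assumptions made up for this example; they are not from the question. Note also that here the dissimilarity is computed on the measurement columns only, so the ID values themselves do not feed into the distances. The last lines use cutree to list which IDs end up in each cluster.

```r
library(cluster)  # provides daisy()

# Small subset of the built-in iris data so the dendrogram stays readable
# (assumption: 20 rows is enough to illustrate the point)
dat <- iris[1:20, ]

# Hypothetical ID column, standing in for ID_1 in the question
dat$ID <- paste0("obs_", seq_len(nrow(dat)))

# Gower dissimilarity on the four measurement columns only
diss <- daisy(dat[, 1:4], metric = "gower")
fit <- hclust(diss, method = "ward.D2")

# Pass the IDs explicitly so the leaves show them instead of row numbers
plot(fit, labels = dat$ID)

# Cut the tree into k groups (k = 3 is an arbitrary choice here)
# and list the IDs that fall into each group
groups <- cutree(fit, k = 3)
membership <- split(dat$ID, groups)
membership
```

The same pattern should carry over to your data: keep the IDs in a vector with one entry per row of the input to daisy, pass it as labels when plotting, and split it by the result of cutree to read off cluster membership.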