cluster::daisy drops labels


I am trying to cluster data using the cluster::daisy function and a dissimilarity matrix. The data look like this:

> head(score_mat_unique_3)
         ID_1       ID_2    Score
1: 1000035849 1000532512 2.49e-60
2: 1000035849 1000682765 3.87e-08
3: 1000086994 1000658924 8.90e-18
4: 1000234640 1000535109 1.20e-87
5: 1000235015 1000754236 6.29e-34
6: 1000258002 1000598768 8.34e-36

Score measures how different the objects are (the larger the value, the more different the objects), so I use the daisy function to build a dissimilarity matrix.

diss3 <- daisy(score_mat_unique_3, metric = "gower")

But when I plot the hclust result, numbers are printed instead of the IDs.

fit3 <- hclust(diss3, method = "ward.D2")
plot(fit3)

As a result, the information about which objects are in which cluster is lost. How can I recover the original IDs and see which IDs are in each cluster?

G5W On

You just need to use the labels argument of plot. Since you don't provide your data, I will illustrate with 26 rows of the built-in iris data, one per letter.

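# Take the first 26 rows of iris in random order, keeping the four numeric columns
DAT = iris[sample(26),1:4]
fit3 = hclust(dist(DAT))

# Supply one label per row through the labels argument
plot(fit3, labels=LETTERS)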

Dendrogram with labels
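
As a side note, and only a sketch rather than part of the approach above: dist() stores the row names of its input as labels, and hclust() hands them on to plot(), so naming the rows before clustering gives the same result as long as the labels are unique. The set.seed() call and the names DAT_named and fit_named are assumptions added here for illustration.

```r
# Reproducible shuffle of the first 26 iris rows (the seed is an arbitrary choice)
set.seed(1)
DAT_named <- iris[sample(26), 1:4]

# Row names must be unique; dist() keeps them as the "Labels" attribute
rownames(DAT_named) <- LETTERS

fit_named <- hclust(dist(DAT_named))

# No labels argument needed: plot() uses the labels stored in the fit
plot(fit_named)

# The stored labels can also be inspected directly
fit_named$labels
```

This route does not work when the ID column contains repeated values, because data frame row names must be unique; in that case passing labels to plot is the simpler option.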

In your case, try plot(fit3, labels=score_mat_unique_3$ID_1)
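
To also see which IDs end up in which cluster, one possible sketch is to cut the tree with cutree() and pair the result with the ID column. The table toy, its columns and its values are hypothetical stand-ins for your data, the choice of k = 3 is arbitrary, and leaving the ID column out of daisy() is an assumption about what you want to measure.

```r
library(cluster)

# Hypothetical toy table: one ID column plus two made-up measurement columns
toy <- data.frame(
  ID_1  = c("A1", "A2", "A3", "A4", "A5", "A6"),
  Score = c(0.10, 0.12, 0.50, 0.55, 0.90, 0.95),
  Other = c(1.0, 1.1, 3.0, 3.2, 5.0, 5.1)
)

# Dissimilarity on the measurement columns only (assumption: IDs are not features)
diss <- daisy(toy[, c("Score", "Other")], metric = "gower")
fit  <- hclust(diss, method = "ward.D2")

# Rows of toy and leaves of fit are in the same order, so IDs can be used as labels
plot(fit, labels = toy$ID_1)

# Assign each row to one of k clusters and attach the IDs
groups     <- cutree(fit, k = 3)
membership <- data.frame(ID = toy$ID_1, cluster = groups)

# List the IDs belonging to each cluster
split(membership$ID, membership$cluster)
```

The key point is that the labels and the cluster assignments both follow the row order of the data passed to daisy(), so any column from the same table can be matched to them by position.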