Cutree and rect.hclust groups


I have run a simple cluster analysis on a data set of 15 objects:

dmax.average<-hclust(hoards.dmax,method="average")

I then run cutree to give me five groups:

gps<-cutree(dmax.average,k=5)

which gives me a vector of group memberships:

act bds del dun ger gur mad mas me1 mor sgi smc sto sur tds
  1   1   1   2   3   2   3   4   5   5   4   2   4   5   3

I have created a vector of colours:

mycols<-c("black","red","green","purple","blue")

I can plot the dendrogram with five coloured boxes:

plot(dmax.average,sub='',xlab='', main="Kolmogorov-Smirnov distance")
rect.hclust(dmax.average,border=mycols,k=5)


The problem arises when I want to use the group numbers from cutree to colour points in other analyses (such as CA or PCO). rect.hclust applies the border colours to the clusters from left to right as they appear on the dendrogram, so that cluster 5 (from cutree), which sits in the middle of the plot, is coloured green, and so on. I had thought that passing cluster=gps to rect.hclust might help, but it makes no difference whatsoever to how the boxes are coloured.
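To make the mismatch concrete, here is a minimal sketch of how the left-to-right box order can be read off the tree. Since hoards.dmax is not shown in the question, it assumes a stand-in distance matrix built from the built-in USArrests data; the names toy.dist, toy.hc, toy.gps, leaf.groups and box.groups are made up for illustration.

```r
# Stand-in data (assumption): USArrests replaces the unavailable hoards.dmax
toy.dist <- dist(scale(USArrests[1:15, ]))
toy.hc <- hclust(toy.dist, method = "average")
toy.gps <- cutree(toy.hc, k = 5)

# toy.hc$order lists the leaves left to right as they are drawn,
# so indexing the cutree vector with it gives each leaf's group in plot order
leaf.groups <- toy.gps[toy.hc$order]

# the first occurrence of each group gives the cutree number
# of each box, reading the dendrogram from left to right
box.groups <- unique(leaf.groups)
print(box.groups)
```

If box.groups is not simply 1, 2, ..., k, then the i-th colour handed to rect.hclust lands on a box whose cutree number is not i, which appears to be the situation described above.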


With this simple analysis, I could just reorder the colours in the vector by hand, but that is a complete fudge. If I were undertaking a real analysis on a much larger data set, the chance of error would be very high.

Is there some way of getting the output from cutree and rect.hclust to match?

Thanks.

#run the clustering
dmax.average<-hclust(hoards.dmax,method="average")
#cut the tree and check output
gps<-cutree(dmax.average,k=5)
gps
act bds del dun ger gur mad mas me1 mor sgi smc sto sur tds 
  1   1   1   2   3   2   3   4   5   5   4   2   4   5   3 
#set up vector of colours
mycols<-c("black","red","green","purple","blue")
#plot
plot(dmax.average,sub='',xlab='',
main="Kolmogorov-Smirnov distance")
rect.hclust(dmax.average,border=mycols,k=5)

#run a PCO
dmax.pco<-cmdscale(hoards.dmax)

#plot results
plot(dmax.pco,pch=16,xlab="first axis",ylab="second axis",
main="PCO of Dmax-based results",col=mycols[gps])
legend("topright",legend=c("Group 1","Group 2","Group 3","Group 4","Group 5"),
pch=16,col=mycols,bty='n')

#colours on the PCO scattergram do not match because group 5 from cutree is plotted in the middle of the dendrogram and is thus green on the dendrogram and blue on the scattergram.

See above. I tried cluster=gps in the rect.hclust call, but it makes no difference. I would like to get the groups from rect.hclust to match those from cutree, or to be able to export the cluster memberships from rect.hclust so that I can use them elsewhere.
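One possible approach, sketched here under assumptions rather than offered as a definitive answer: reorder the colour vector programmatically using the tree's leaf order, and capture the list that rect.hclust returns invisibly (the members of each box, left to right). The data are again a stand-in built from USArrests because hoards.dmax is not available, and toy.dist, toy.hc, toy.gps, box.groups and boxes are illustrative names.

```r
# Stand-in data (assumption): USArrests replaces the unavailable hoards.dmax
toy.dist <- dist(scale(USArrests[1:15, ]))
toy.hc <- hclust(toy.dist, method = "average")
toy.gps <- cutree(toy.hc, k = 5)
mycols <- c("black", "red", "green", "purple", "blue")

# cutree group number of each box, left to right along the dendrogram
box.groups <- unique(toy.gps[toy.hc$order])

plot(toy.hc, sub = "", xlab = "", main = "Stand-in dendrogram")
# reorder the colours so that each box gets the colour of its cutree group;
# the invisible return value holds the members of each box, left to right
boxes <- rect.hclust(toy.hc, k = 5, border = mycols[box.groups])

# label each exported box with its cutree group number
names(boxes) <- box.groups

# the same colour lookup now agrees between dendrogram and ordination
toy.pco <- cmdscale(toy.dist)
plot(toy.pco, pch = 16, xlab = "first axis", ylab = "second axis",
     col = mycols[toy.gps])
legend("topright", legend = paste("Group", 1:5),
       pch = 16, col = mycols, bty = "n")
```

With the real data, the equivalent call would presumably be rect.hclust(dmax.average, k=5, border=mycols[unique(gps[dmax.average$order])]), leaving mycols[gps] unchanged in the PCO plot. Because the reordering is computed from the tree itself, nothing has to be adjusted by hand for a larger data set.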
