I want to perform a cluster analysis with the pam function in R, using daisy to create the dissimilarity matrix. My data contains two columns (ID and Disease). Both are factors with many levels (400 and 1800 respectively). How can I create the dissimilarity matrix I need to cluster the data with pam?
Example data frame:
set.seed(1)
df <- data.frame(ID = rep(sample(c("a","b","c","d","e","f","g"),10,replace = TRUE),70),
disease = sample(c("flu","headache","pain","inflammation","depression","infection","chest pain"),100,replace = TRUE))
df <- unique(df)
Can I run the daisy function on this data frame or do I have to convert it into another format?
Since "Dissimilarities will be computed between the rows of x" (?daisy), you may want to run daisy on the table of your data frame.
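A minimal sketch of that idea, using the example data from the question. Some choices here are assumptions rather than part of the original suggestion: the table is converted with as.data.frame.matrix so that daisy receives a plain data frame, the disease columns are treated as asymmetric binary variables under the Gower metric, and k = 2 is an arbitrary number of clusters. The names tab, d and fit are only illustrative.

```r
library(cluster)

set.seed(1)
df <- data.frame(ID = rep(sample(c("a","b","c","d","e","f","g"), 10, replace = TRUE), 70),
                 disease = sample(c("flu","headache","pain","inflammation","depression","infection","chest pain"), 100, replace = TRUE))
df <- unique(df)

# Cross-tabulate: one row per ID, one column per disease.
# Because df holds unique (ID, disease) pairs, every cell is 0 or 1.
tab <- as.data.frame.matrix(table(df$ID, df$disease))

# Dissimilarities between rows (IDs). Assumption: shared absences carry
# no information, so every column is declared asymmetric binary.
d <- daisy(tab, metric = "gower", type = list(asymm = seq_len(ncol(tab))))

# Partition around medoids; k = 2 is arbitrary and should be tuned.
fit <- pam(d, k = 2)
fit$clustering
```

daisy may warn if a disease column is constant across all IDs (for instance, every ID has that disease); such a column does not help separate the IDs and can be dropped. With 400 IDs and 1800 diseases the table is still only 400 x 1800, so the same approach should scale to the real data.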