Inconsistent cluster numbering between runs of hierarchical clustering (R)


I have parcels with the crop type recorded for each of 5 years. I want to cluster my parcels so that each cluster indicates a type of crop rotation, based on the number of occurrences of each crop type on a plot during the 5 years.

My dataframe looks like this:

data <- data.frame(
    ID = 1:100,  # Adding a column for IDs
    matrix(sample(0:5, 100 * 15, replace = TRUE), ncol = 15)
)

I used this method to do the clustering (based on the inertia, I know that my data separate well into 8 clusters):

library(dplyr)  # provides %>% and select()

clust = stats::hclust(vegan::vegdist(data %>% select(X1:X15)), method = "ward.D2")
clust$labels = data$ID

groups = stats::cutree(clust, 8)
data$cluster = as.factor(groups)

names_clust = data.frame(
    cluster = 1:8,
    nom_cluster = c("Maize Rotation", "Beet low / diverse rotation", "Beet main rotation",
                    "Legumes dominant", "Permanent grassland type", "Grassland dominant",
                    "Fallow", "Fruit trees / Vegetables")
)

data = data %>% 
  merge(names_clust, by = "cluster", all.x = TRUE)

The clustering itself is always accurate: the parcels that belong together end up in the same cluster. However, the number assigned to each cluster isn't consistent every time I run the code, which is a problem when I map the cluster numbers to their "character" names.

I tried the set.seed() function, thinking the group numbering was random, but it does not appear to have worked.
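
For context, here is a minimal sketch of why set.seed() has no effect here and of one possible workaround. As far as I understand it, hclust() and cutree() involve no randomness: cutree() numbers the clusters by order of first appearance in the rows, so the numbering follows the row order and content of the data. The sketch uses stats::dist() instead of vegan::vegdist() only to stay self-contained, and the renumbering rule (ranking clusters by the mean of the first column) is a purely hypothetical choice; any summary that characterises the rotations would do.

```r
# set.seed() only fixes the simulated data, not the cluster numbering
set.seed(1)
crops <- matrix(sample(0:5, 100 * 15, replace = TRUE), ncol = 15)

# Assumption: stats::dist() stands in for vegan::vegdist() here
clust <- stats::hclust(stats::dist(crops), method = "ward.D2")
groups <- stats::cutree(clust, k = 8)

# Clusters are numbered by first appearance in row order,
# so the first row always falls in cluster 1
first_row_cluster <- groups[1]

# Hypothetical rule: renumber clusters by decreasing mean of column 1,
# so the number depends on cluster content rather than on row order
key <- tapply(crops[, 1], groups, mean)
ranked_ids <- as.integer(names(key))[order(key, decreasing = TRUE)]
stable <- match(groups, ranked_ids)
```

With a rule like this, the names in names_clust can be attached to the renumbered clusters instead of the raw cutree() output. If two clusters tie on the chosen summary, their relative order still falls back on the original numbering, so a more discriminating summary may be needed.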
