I have parcels with the crop type recorded for each year over 5 years. I want to cluster my parcels so that each cluster indicates a type of crop rotation, based on the number of occurrences of each crop type on a plot during those 5 years.
My dataframe looks like this:
data <- data.frame(
  ID = 1:100, # Adding a column for IDs
  matrix(sample(0:5, 100 * 15, replace = TRUE), ncol = 15)
)
I used this method to do the clustering (based on inertia, I know that the data separate well into 8 clusters):
library(dplyr) # provides %>% and select()

clust = stats::hclust(vegan::vegdist(data %>% select(X1:X15)), method = "ward.D2")
clust$labels = data$ID
groups = stats::cutree(clust, 8)
data$cluster = as.factor(groups)
names_clust = data.frame(cluster = 1:8, nom_cluster = c("Maize Rotation", "Beet low / diverse rotation", "Beet main rotation", "Legumes dominant", "Permanent grassland type", "Grassland dominant", "Fallow", "Fruit trees / Vegetables"))
# Note: merge() sorts the rows by "cluster" by default (sort = TRUE)
data = data %>%
  merge(names_clust, by = "cluster", all.x = TRUE)
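To see which descriptive name fits which cluster number on a given run, a per-cluster profile can help. This is only a minimal sketch under assumptions: columns X1 to X15 are taken to hold per-crop counts, `stats::dist` stands in for `vegan::vegdist` so that it runs with base R alone, and `crop_cols`, `profiles` and `dominant` are illustrative names, not part of my real code.

```r
set.seed(1) # fixes the simulated data so the sketch is repeatable

# Simulated data with the same shape as above
data <- data.frame(
  ID = 1:100,
  matrix(sample(0:5, 100 * 15, replace = TRUE), ncol = 15)
)
crop_cols <- paste0("X", 1:15)

# Assumption: Euclidean distance instead of vegan::vegdist (base R only)
clust <- stats::hclust(stats::dist(data[, crop_cols]), method = "ward.D2")
groups <- stats::cutree(clust, k = 8)

# Mean count of each crop column within each cluster
profiles <- aggregate(data[, crop_cols], by = list(cluster = groups), FUN = mean)

# Column with the highest mean count in each cluster
profiles$dominant <- crop_cols[apply(profiles[, crop_cols], 1, which.max)]
```

With real data, the `dominant` column (or the full profile) could be used to decide the name of each cluster instead of relying on its number.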
The clustering itself is always sound: the parcels that are grouped together belong together. However, the number assigned to each cluster is not consistent every time I run the code, which is a problem when I map the cluster numbers to their descriptive names.
I tried the set.seed() function, thinking the group numbering was random, but it did not appear to help.
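For reference, here is a minimal sketch of a reproducibility check. It assumes that the only random step is `sample()` (so `set.seed()` has to come before it), and that, as far as I understand, `cutree()` numbers clusters in order of first appearance in the rows, so the row order has to be fixed before clustering. `stats::dist` is swapped in for `vegan::vegdist` so that it runs with base R alone, and `cluster_once` is a hypothetical helper name.

```r
set.seed(42) # must be called before sample(), the only random step here

data <- data.frame(
  ID = 1:100,
  matrix(sample(0:5, 100 * 15, replace = TRUE), ncol = 15)
)

# Hypothetical helper: sort by ID, then cluster and cut into 8 groups
cluster_once <- function(df) {
  df <- df[order(df$ID), ] # fixed row order before clustering
  d <- stats::dist(df[, paste0("X", 1:15)]) # assumption: stands in for vegan::vegdist
  cl <- stats::hclust(d, method = "ward.D2")
  stats::cutree(cl, k = 8)
}

g1 <- cluster_once(data)
g2 <- cluster_once(data[sample(nrow(data)), ]) # same data, rows shuffled

# TRUE if the numbering does not depend on the incoming row order
same_numbering <- identical(unname(g1), unname(g2))
```

If `same_numbering` is `TRUE` here but the numbering still changes in the real workflow, the input data or its row order is probably changing between runs, for example after the `merge()` step, which sorts by `cluster` by default.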