My dataset consists of topic vectors: each row represents a different topic, described by its weights on the 441 words in the columns. I'm running it through k-means to see which topics get grouped together. For each cluster the algorithm finds, I want to see the top words shared across its topics, to understand how those topics are similar.
I have little to no knowledge of text analysis, but this is what I have so far:
topic <- read.csv("topicVecs.csv", stringsAsFactors = FALSE)
library(ggpubr)
library(factoextra)
library(NbClust)
set.seed(123)
res.km <- kmeans(scale(topic[, -5]), centers = 5, nstart = 25)  # standardize, drop column 5, fit 5 clusters
res.km$cluster
fviz_cluster(res.km, data = topic[, -5],
palette = c ("#2e9fdf", "#00afbb","#e7b800", "#fc0439", "#ff5733"),
geom = "point",
ellipse.type = "convex",
ggtheme = theme_bw()
)
k <- 5
I thought I could start with 5 clusters, but I'm not sure how to proceed from here.
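To make the goal concrete, here is a rough sketch of the kind of output I'm after, not a settled approach. It averages the word weights over the topics in each cluster and lists the highest-weighted words per cluster. The data is a small random stand-in for topicVecs.csv (20 topics, 8 words, 3 clusters), and the names `cluster_means`, `n_top`, and `top_words` are ones I made up for illustration. It uses base R only.

```r
# Hypothetical stand-in for topicVecs.csv: 20 topics x 8 words of random weights
set.seed(123)
topic <- as.data.frame(matrix(runif(20 * 8), nrow = 20,
                              dimnames = list(NULL, paste0("word", 1:8))))

# Small k for the toy data; the real run above uses 5
k <- 3
res.km <- kmeans(scale(topic), centers = k, nstart = 25)

# Average the original (unscaled) word weights over the topics in each cluster
cluster_means <- aggregate(topic, by = list(cluster = res.km$cluster), FUN = mean)

# For each cluster, keep the n_top words with the highest mean weight
n_top <- 3
top_words <- lapply(
  split(cluster_means[, -1], cluster_means$cluster),
  function(row) names(sort(unlist(row), decreasing = TRUE))[seq_len(n_top)]
)
top_words
```

I'm assuming here that the mean weight within a cluster is a sensible way to rank words; I don't know whether that is the right measure of "similarity" between topics, or whether the scaled cluster centers in `res.km$centers` would be a better thing to rank.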