Finding the top N most frequent words in each cluster using R

My dataset consists of topic vectors: each row represents a different topic, described by its weights on the 441 words in the columns. I'm running the data through k-means to see which topics are grouped together. For each cluster the algorithm produces, I want to see the top words shared across its topics, to understand how those topics are similar.

Here is a subset of the data:

I have little to no knowledge of text analysis, but this is what I have so far:

topic <- read.csv("topicVecs.csv", stringsAsFactors = FALSE)

library(ggpubr)
library(factoextra)
library(NbClust)

set.seed(123)
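# k-means with 5 centers on the standardized weights (column 5 excluded)
res.km <- kmeans(scale(topic[, -5]), 5, nstart = 25)
res.km$cluster

# Plot the clusters with convex hulls around each group
fviz_cluster(res.km, data = topic[, -5],
   palette = c("#2e9fdf", "#00afbb", "#e7b800", "#fc0439", "#ff5733"),
   geom = "point",
   ellipse.type = "convex",
   ggtheme = theme_bw()
)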

k <- 5 

I thought I could start with 5 clusters, but I'm not sure how to proceed from here.
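
One possible way to continue is sketched below. It averages each word's weight over the topics in a cluster and keeps the N words with the highest means. The toy matrix, the word names, `top_n`, `cluster_means`, and `top_words` are all made up for illustration and are not taken from topicVecs.csv. Ranking on the unscaled mean weights is also an assumption; ranking on the scaled centers in `res.km$centers` would be another option.

```r
# Toy stand-in for the topic-by-word matrix (rows = topics, columns = words).
# These values are invented; the real data would come from topicVecs.csv.
words <- c("apple", "banana", "car", "engine", "the")
topic_mat <- rbind(
  c(0.9, 0.7, 0.1, 0.0, 0.3),
  c(0.8, 0.9, 0.0, 0.1, 0.2),
  c(0.7, 0.8, 0.1, 0.1, 0.3),
  c(0.1, 0.0, 0.9, 0.8, 0.2),
  c(0.0, 0.1, 0.8, 0.9, 0.3),
  c(0.1, 0.1, 0.7, 0.8, 0.2)
)
colnames(topic_mat) <- words

k <- 2      # 5 in the question; 2 is enough for this toy data
top_n <- 2  # hypothetical: number of words to report per cluster

set.seed(123)
res.km <- kmeans(scale(topic_mat), centers = k, nstart = 25)

# Mean (unscaled) weight of every word within each cluster:
# one row per cluster, first column is the cluster id
cluster_means <- aggregate(topic_mat,
                           by = list(cluster = res.km$cluster),
                           FUN = mean)

# For each cluster, sort the words by mean weight and keep the top_n names
top_words <- lapply(seq_len(nrow(cluster_means)), function(i) {
  w <- unlist(cluster_means[i, -1])
  names(sort(w, decreasing = TRUE))[seq_len(top_n)]
})
names(top_words) <- paste0("cluster", cluster_means$cluster)

top_words
```

With the real data, `topic_mat` would be replaced by the numeric columns of `topic`, and `k` and `top_n` set to whatever values are of interest.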
