Finding the top N frequent words in each cluster using R

61 Views Asked by Ainaa Duma At 11 October 2023 at 19:22

I have topic vectors as my dataset, where each row represents a different topic (described by its weights on all 441 words in the columns). I'm running it through k-means to see which topics are grouped together. For each group the algorithm selects, I want to see the top words that appear across its topics, to understand how those topics are similar.

Here is a subset of the data

I have little to no knowledge of text analysis, but this is what I have so far:

topic <- read.csv("topicVecs.csv", stringsAsFactors = FALSE)

library(ggpubr)
library(factoextra)
library(NbClust)

k <- 5  # number of clusters to start with

set.seed(123)
# Column 5 is dropped before clustering; the remaining columns are standardized
res.km <- kmeans(scale(topic[, -5]), centers = k, nstart = 25)
res.km$cluster
fviz_cluster(res.km, data = topic[, -5],
   palette = c("#2e9fdf", "#00afbb", "#e7b800", "#fc0439", "#ff5733"),
   geom = "point",
   ellipse.type = "convex",
   ggtheme = theme_bw()
)

I thought I could start from 5 clusters. Not sure how to go about it from here.
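One possible way to continue, sketched on made-up data rather than the real file: average the word weights over the topics assigned to each cluster, then keep the N words with the largest average. The matrix `topic_mat`, the `word1`..`word12` column names, and `top_n` are assumptions for illustration; with the real data, the numeric word columns of `topic` would take the place of `topic_mat`. "Top words" is read here as "highest mean weight within the cluster", which is only one of several reasonable definitions.

```r
# Toy stand-in for the topic-by-word weight matrix (hypothetical data):
# 30 topics (rows) x 12 words (columns), non-negative weights.
set.seed(123)
words <- paste0("word", 1:12)
topic_mat <- matrix(runif(30 * 12), nrow = 30, dimnames = list(NULL, words))

k <- 5       # number of clusters
top_n <- 3   # number of words to report per cluster

res.km <- kmeans(scale(topic_mat), centers = k, nstart = 25)

# Average the raw (unscaled) word weights over the topics in each cluster.
# rowsum() adds up the rows per cluster; dividing by the cluster sizes
# turns the sums into means. Result: one row per cluster, one column per word.
cluster_means <- rowsum(topic_mat, group = res.km$cluster) /
  as.vector(table(res.km$cluster))

# For each cluster, keep the names of the top_n words with the largest mean weight.
top_words <- apply(cluster_means, 1, function(w) {
  names(sort(w, decreasing = TRUE))[seq_len(top_n)]
})

# Columns are clusters, rows are ranks 1..top_n.
print(top_words)
```

Ranking the rows of `res.km$centers` in the same way would work too, but those centers are in standardized units, so they highlight words that are unusually heavy in a cluster rather than words with the largest raw weight.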
