Question: How can I compute and extract the frequency of words in each topic? My goal is to create a word cloud for each topic.
P.S. I have no problem with the wordcloud package itself.
Here is the code I am using:
burnin <- 4000  # burn-in iterations; we do not collect these
iter <- 4000
thin <- 500
seed <- list(2017, 5, 63, 100001, 765)
nstart <- 5
best <- TRUE
# Number of topics:
k <- 4
LDA_results <- LDA(DTM, k, method = "Gibbs",
                   control = list(nstart = nstart, seed = seed, best = best,
                                  burnin = burnin, iter = iter, thin = thin))
Thank you. (I have tried to keep the question as concise as possible, so if you need further details, I can add more.)
If you want to create a wordcloud for each topic, what you need are the top terms for each topic, i.e., the words most likely to be generated from each topic. This probability is called beta; it is the per-topic-per-word probability. The higher beta is for a word in a topic, the higher the probability that that word is generated from that topic. You can get the beta probabilities out of your LDA topic model in a tidy data frame using tidy() from tidytext. Let's look at an example dataset and fit a model using just two topics. Once the model is fit, we can extract the probabilities.
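For example, a sketch using the AssociatedPress document-term matrix that ships with topicmodels (a stand-in for your own DTM; the object names and seed are illustrative, not from your code):

```r
library(topicmodels)
library(tidytext)

# Built-in example corpus; substitute your own DTM and control list here
data("AssociatedPress")

# Fit a small two-topic model for illustration
ap_lda <- LDA(AssociatedPress, k = 2, control = list(seed = 1234))

# One row per topic per term, with the per-topic-per-word probability beta
ap_topics <- tidy(ap_lda, matrix = "beta")
ap_topics
```

The result has columns `topic`, `term`, and `beta`, so every (topic, word) pair is a row you can filter and sort.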
The topics are all mixed together there. Let's use dplyr to get the most probable terms for each topic.
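A sketch of that step, assuming the tidy data frame is named `ap_topics` as above (the cutoff of 10 terms is an arbitrary choice):

```r
library(dplyr)

# Keep the 10 highest-beta terms within each topic, ordered for plotting
top_terms <- ap_topics %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)

top_terms
```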
You can now use this to make a wordcloud (with some reshaping). The beta probability is what you want to map to how big the words are.
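One possible sketch, assuming the `top_terms` data frame from the previous step; wordcloud() scales word sizes relative to the values passed as `freq`, so the raw beta values work directly:

```r
library(wordcloud)
library(dplyr)

# Draw one cloud per topic, sizing each word by its beta probability
for (t in unique(top_terms$topic)) {
  terms_t <- filter(top_terms, topic == t)
  wordcloud(words = terms_t$term, freq = terms_t$beta,
            scale = c(4, 0.5), random.order = FALSE)
}
```

With only 10 terms per topic the clouds are sparse; raise the `top_n()` cutoff in the previous step if you want denser clouds.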