How to evaluate BTM topic modelling in R when changing K, the number of topics to be detected?


I have a set of titles and I need to cluster them in their semantic space.

Hello,

I am using library(BTM) to cluster a blog's titles into semantic groups. With BTM in R this is easy to do after the usual NLP pipeline (tokenisation, lemmatisation). My problem is choosing a reasonable K. I know there is no single optimal value, but a criterion for narrowing down K would help the second phase, in which the topics are inspected visually. The issue with BTM in R is that the package implements no evaluation measures such as perplexity or topic coherence.

Can anyone suggest an implementation of perplexity or topic coherence that would work with the output of the following call?

model <- BTM(
  df_title_tok,
  k = n_topics,
  alpha = 50/n_topics,
  beta = 0.01,
  iter = 1000,
  window = 15,
  background = FALSE,
  trace = FALSE,
  detailed = TRUE
)

I am not sure that implementing perplexity from scratch would help: I am not 100% confident about the methodology, and even if I did implement it, I would have no reference figure against which to test my implementation.
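To make the question concrete, here is a minimal sketch of the kind of measure I have in mind: a UMass-style topic coherence written in base R. It is my own assumption of how such a score could be computed, not something provided by the BTM package. The function name umass_coherence and the toy data are hypothetical, and the sketch assumes that the fitted model exposes a token-by-topic probability matrix (as I understand model$phi to be) and that the token data frame has a document id in its first column and a token in its second. The toy example is small enough to check by hand.

```r
# UMass-style topic coherence in base R (a sketch, not part of the BTM package).
# phi:    token-by-topic probability matrix with tokens as rownames
#         (assumed to have the same layout as model$phi)
# tokens: data.frame with document id in column 1 and token in column 2
#         (assumed to have the same layout as df_title_tok)
umass_coherence <- function(phi, tokens, top_n = 10) {
  top_n <- min(top_n, nrow(phi))
  # Top-N tokens per topic, ordered by decreasing P(token | topic)
  top <- lapply(seq_len(ncol(phi)), function(k) {
    rownames(phi)[order(phi[, k], decreasing = TRUE)[seq_len(top_n)]]
  })
  vocab  <- unique(unlist(top))
  doc_id <- as.character(tokens[[1]])
  token  <- as.character(tokens[[2]])
  sel    <- token %in% vocab
  # Document-by-token incidence matrix: 1 if the token occurs in the document
  inc <- unclass(table(doc_id[sel], factor(token[sel], levels = vocab))) > 0
  storage.mode(inc) <- "numeric"
  # co[a, b] = number of documents containing both a and b;
  # the diagonal holds the document frequency of each token
  co <- crossprod(inc)
  # Sum of log((D(w_m, w_l) + 1) / D(w_l)) over ordered pairs of top tokens
  sapply(top, function(w) {
    score <- 0
    for (m in seq_along(w)[-1]) {
      for (l in seq_len(m - 1)) {
        score <- score + log((co[w[m], w[l]] + 1) / co[w[l], w[l]])
      }
    }
    score
  })
}

# Hypothetical toy data: five "titles" and two topics
toy_tokens <- data.frame(
  doc_id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  token  = c("cat", "dog", "cat", "dog", "tax", "law", "tax", "law", "cat", "tax")
)
toy_phi <- matrix(
  c(0.50, 0.40, 0.05, 0.05,   # topic 1: cat and dog dominate
    0.40, 0.05, 0.05, 0.50),  # topic 2: law and cat dominate
  nrow = 4,
  dimnames = list(c("cat", "dog", "tax", "law"), NULL)
)
scores <- umass_coherence(toy_phi, toy_tokens, top_n = 2)
```

In the toy data, "cat" and "dog" share titles while "law" and "cat" never do, so the first topic should score higher (closer to zero) than the second. If the assumptions above hold, the same function could be called as umass_coherence(model$phi, df_title_tok) for each candidate K and the mean score compared across models, but I have no reference implementation to confirm that this is the right way to use it with BTM.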

Many thanks, Valerio
