How to get the posterior probability of topics in an LDA model using gensim?


I want to apply the "Bayes factor" method from Bybee, Leland and Kelly, Bryan T. and Manela, Asaf and Xiu, Dacheng, Business News and Business Cycles, forthcoming in the Journal of Finance. To do that, I would like to calculate the posterior probability of an LDA model for a chosen number of topics, so that I can compare models with different numbers of topics. I tried to use "LdaState" in gensim, but I could not work out the right parameters to pass. Can anyone kindly tell me how to use "LdaState"?

For example:

eta = lda0.eta
lamda = LdaState(eta, shape=((i, 10),)).get_lambda()
File "C:\Users\AppData\Roaming\JetBrains\PyCharmCE2023.2\scratches\scratch_10.py", line 71, in run
    lamda = LdaState(eta, shape=((i, 10),)).get_lambda()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\User\PycharmProjects\pythonProject\venv\Lib\site-packages\gensim\models\ldamodel.py", line 174, in __init__
    self.sstats = np.zeros(shape, dtype=dtype)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'tuple' object cannot be interpreted as an integer

I appreciate the help of jejemalani, and I still need some detailed examples. Please see the comments.

There is 1 best solution below

prohit On

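The shape argument must be a flat tuple of two integers, but your code wraps that tuple inside another tuple. Changing your code to the following should fix the error, provided i is an integer.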

eta = lda0.eta
lamda = LdaState(eta, shape=(i, 10)).get_lambda()
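The traceback in the question ends at `np.zeros(shape, dtype=dtype)`, so the failure can be reproduced with NumPy alone. The sketch below is only an illustration: the value of `i` is a hypothetical number of topics, and 10 is taken from the question as the number of vocabulary terms.

```python
import numpy as np

i = 3  # hypothetical number of topics; 10 terms as in the question

# ((i, 10),) is a 1-tuple whose single element is itself a tuple,
# so NumPy tries to read (i, 10) as the size of the first dimension.
try:
    np.zeros(((i, 10),), dtype=np.float32)
    nested_ok = True
except TypeError as err:
    nested_ok = False
    nested_error = str(err)

# (i, 10) is a flat tuple of two integers: one row per topic,
# one column per vocabulary term.
sstats = np.zeros((i, 10), dtype=np.float32)
print(nested_ok, sstats.shape)
```

The nested form fails for any integer `i`, which is why removing the extra parentheses and trailing comma is enough.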

The relevant part of the Gensim LDA model documentation:

class gensim.models.ldamodel.LdaState(eta, shape, dtype=<class 'numpy.float32'>)

Bases: SaveLoad

Parameters

  • eta (numpy.ndarray) – The prior probabilities assigned to each term.

  • shape (tuple of (int, int)) – Shape of the sufficient statistics: (number of topics to be found, number of terms in the vocabulary).

  • dtype (type) – Overrides the numpy array default types.

dtype is optional. The other arguments must match the above: eta is a NumPy array, and shape is a flat tuple of two integers (number of topics, number of terms in the vocabulary).