From the point of view of a document, I want to know the term probabilities of the topics for each document from the gensim LdaModel. I got something like this:
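from gensim.models import LdaModel

lda_model = LdaModel(corpus, id2word=dictionary, num_topics=50)
# phi relevance of document 1; with per_word_topics=True the call returns
# a 3-tuple, and index [2] selects the phi values
phi_doc1 = lda_model.get_document_topics(
    corpus[1], minimum_probability=0.05, per_word_topics=True)[2]
phi_doc1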
---
[(52, [(8, 19.999924)]),
(69, [(8, 666.9981)]),
(241, [(8, 30.999844)]),
(482, [(8, 0.9999151)]),
(593, [(8, 5.9999304)])]
but I couldn't understand the meaning of the values.
I want to know what phi relevance means. I still didn't understand it after reading the help message:
help(lda_model.get_document_topics)
--
" ...
Phi relevance values, multiplied by the feature length,
for each word-topic combination.
Each element in the list is a pair of a word's id and
a list of the phi values between this word and each topic..."
What is the meaning of the values returned by lda_model.get_document_topics(corpus[1], minimum_probability=0.05, per_word_topics=True)[2]?
Are they "the term probability of the topic for each document"?
My understanding is that the result you received is a list of pairs, each made of a word id and a list of (topic number, phi value) tuples. What you wanted is the document's probability for each topic.
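To make that structure concrete, here is a small sketch in plain Python that walks the output from the question. The gensim documentation describes the phi values as multiplied by the feature length, i.e. the word's count in the document, so dividing by that count should give a per-word share for each topic. The counts in bow_doc1 and the helper phi_proportions are hypothetical: the question does not show corpus[1], so the counts were chosen only to be consistent with the phi values shown.

```python
# Phi output copied from the question:
# each entry is (word_id, [(topic_id, scaled_phi), ...]).
phi_doc1 = [
    (52, [(8, 19.999924)]),
    (69, [(8, 666.9981)]),
    (241, [(8, 30.999844)]),
    (482, [(8, 0.9999151)]),
    (593, [(8, 5.9999304)]),
]

# ASSUMPTION: hypothetical bag-of-words counts (word_id, count) for the same
# document. The real corpus[1] is not shown in the question.
bow_doc1 = [(52, 20), (69, 667), (241, 31), (482, 1), (593, 6)]


def phi_proportions(bow, phi_values):
    """Divide each scaled phi by the word's count in the document.

    Returns {word_id: {topic_id: share}}, where share is the fraction of
    that word's occurrences attributed to the topic.
    """
    counts = dict(bow)
    result = {}
    for word_id, topic_phis in phi_values:
        count = counts[word_id]
        result[word_id] = {
            topic_id: phi / count for topic_id, phi in topic_phis
        }
    return result


proportions = phi_proportions(bow_doc1, phi_doc1)
for word_id, per_topic in proportions.items():
    print(word_id, per_topic)
```

Under these assumed counts, every word in the document has a share close to 1 for topic 8, which would mean that almost all occurrences of those words are attributed to topic 8.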
If your task is to get just the document probabilities, use per_word_topics=False in get_document_topics(). This returns tuples of (topic, probability) for the document. More here: https://radimrehurek.com/gensim/models/ldamodel.html
Phi values are relative measures of word distribution. They indicate which words increase the probability of a document belonging to a topic (topic 8 in your case). Check out this: https://miningthedetails.com/LDA_Inference_Book/lda-inference.html