Calculating the conditional entropy of bigrams using NLTK and Kneser-Ney smoothing

I am trying to estimate the conditional entropy of a text source at the bigram level. In order to get a good estimate, I need estimates for the probabilities of bigrams. After doing some reading, it seems that Kneser-Ney smoothing is the most appropriate way to estimate these probabilities.
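
To be concrete, here is roughly the computation I want to perform once I have the probability estimates (just a sketch; both input dictionaries are hypothetical, and estimating them well is exactly my problem):

```python
import math

def conditional_entropy(joint_probs, cond_probs):
    """H(W2 | W1) = -sum over bigrams of p(w1, w2) * log2 P(w2 | w1).

    joint_probs: dict mapping (w1, w2) -> estimated joint probability p(w1, w2)
    cond_probs:  dict mapping (w1, w2) -> estimated P(w2 | w1)
    """
    return -sum(p * math.log2(cond_probs[bigram])
                for bigram, p in joint_probs.items()
                if p > 0)
```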

NLTK includes Kneser-Ney smoothing via the nltk.KneserNeyProbDist class. However, that class only works with trigrams; according to this answer, it cannot be used for bigrams (How to perform Kneser-Ney smoothing in NLTK at word-level for bigram language model?).

Does anyone know how to do this? Perhaps nltk.lm.KneserNeyInterpolated would work. Or could I prepend a dummy token to each bigram to turn it into a trigram and then use nltk.KneserNeyProbDist? I find the whole thing very confusing and unclear.
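
For what it's worth, here is a sketch of what I imagine the nltk.lm route would look like (the corpus is a placeholder, and I'm not certain this is the intended usage):

```python
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline

# Placeholder corpus: a list of tokenized sentences.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]

n = 2  # bigram model
train_ngrams, vocab = padded_everygram_pipeline(n, sentences)

lm = KneserNeyInterpolated(order=n)
lm.fit(train_ngrams, vocab)

# Smoothed conditional probability P("cat" | "the").
print(lm.score("cat", ["the"]))

# Average cross-entropy (bits per bigram) over some held-out bigrams,
# which seems like one way to approximate the conditional entropy.
print(lm.entropy([("the", "cat"), ("the", "dog")]))
```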
