Given bigram probabilities for words in a text, how would one compute trigram probabilities?
For example, if we know that P(dog cat) = 0.3 and P(cat mouse) = 0.2,
how do we find P(dog cat mouse)?
Thank you!
In the following I consider a trigram as three random variables A, B, C. So "dog cat horse" would be A=dog, B=cat, C=horse.

Using the chain rule: P(A,B,C) = P(A,B) * P(C|A,B). Now you're stuck if you want to stay exact.

What you can do is assume that C is independent of A given B. Then it holds that P(C|A,B) = P(C|B), and P(C|B) = P(C,B) / P(B), which you should be able to compute from your bigram probabilities. Note that in your case P(C|B) should really be the probability of C following a B, so it's the probability of a BC divided by the probability of a B*, where B* stands for any bigram that starts with B.

So to sum it up, when using the conditional independence assumption:

P(A,B,C) = P(A,B) * P(B,C) / P(B*)

And to compute P(B*) you have to sum up the probabilities of all bigrams beginning with B.
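
As a rough illustration, here is a minimal Python sketch of that approximation. It assumes the bigram probabilities are stored in a dict keyed by (first word, second word) tuples; the function name trigram_prob and the table contents are hypothetical. Only the two values from the question (0.3 and 0.2) are real, the other entries are made up so that P(cat *) can be computed at all.

```python
def trigram_prob(bigram_probs, a, b, c):
    """Approximate P(a b c) as P(a b) * P(b c) / P(b *).

    Relies on the assumption that c is independent of a given b,
    so the result is an approximation, not the exact trigram probability.
    """
    # P(b *): total probability of all bigrams whose first word is b.
    p_b_star = sum(p for (first, _), p in bigram_probs.items() if first == b)
    if p_b_star == 0:
        # b never starts a bigram, so the trigram cannot occur under this model.
        return 0.0
    p_ab = bigram_probs.get((a, b), 0.0)
    p_bc = bigram_probs.get((b, c), 0.0)
    return p_ab * p_bc / p_b_star


# Hypothetical bigram table. The first two values come from the question;
# the remaining two are invented for illustration only.
bigram_probs = {
    ("dog", "cat"): 0.3,
    ("cat", "mouse"): 0.2,
    ("cat", "dog"): 0.2,    # made up
    ("mouse", "cat"): 0.3,  # made up
}

print(trigram_prob(bigram_probs, "dog", "cat", "mouse"))
```

With this made-up table, P(cat *) = 0.2 + 0.2 = 0.4, so the estimate for "dog cat mouse" comes out at roughly 0.3 * 0.2 / 0.4 = 0.15. With only the two probabilities given in the question, P(cat *) is unknown and the estimate cannot be computed.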