How can I recover the likelihood of a certain word appearing in a given context from word embeddings?

I know that some methods of generating word embeddings (e.g. CBOW) are based on predicting the likelihood of a given word appearing in a given context. I'm working with Polish, a language whose segmentation is sometimes ambiguous: 'Coś', for example, can be treated either as one word or as two words that have been conjoined ('Co' + '-ś'), depending on the context. What I want to do is create a context-sensitive tokenizer. Assuming that I have the vector representation of the preceding context, and all possible segmentations, could I somehow calculate or approximate the likelihood of particular words appearing in this context?
This very much depends on how you got your embeddings. The CBOW model has two parameter matrices: the input embedding matrix, denoted v, and the output projection matrix v'. If you want to recover the probabilities that the CBOW model uses at training time, you need v' as well: the probability of a word given a context is a softmax over the dot products of the rows of v' with the averaged context embedding. See equation (2) in the word2vec paper. Tools for pre-computing word embeddings usually don't expose v', so you would need to modify them yourself.
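A minimal sketch of what that recovery would look like, assuming you have extracted both matrices as numpy arrays. The names `V` (input embeddings v), `Vp` (output projection v'), and `word2id` (vocabulary mapping) are placeholders, not part of any library's API:

```python
import numpy as np

# Sketch: recover P(word | context) from CBOW parameters.
# Assumed (placeholder) inputs:
#   V       - input embedding matrix v,   shape (vocab_size, dim)
#   Vp      - output projection matrix v', shape (vocab_size, dim)
#   word2id - dict mapping a token string to its row index
def cbow_word_probability(word, context_words, V, Vp, word2id):
    """Softmax probability of `word` given `context_words`, as in CBOW."""
    # CBOW's hidden state is the average of the context words' input vectors
    h = np.mean([V[word2id[w]] for w in context_words], axis=0)
    # dot every output vector v'_w with the context representation
    scores = Vp @ h                # shape (vocab_size,)
    scores -= scores.max()         # subtract the max for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()
    return probs[word2id[word]]
```

Note that gensim's Word2Vec, for instance, does keep output weights around (in `model.syn1neg` when trained with negative sampling), but those weights were optimized for a sampled objective rather than the full softmax, so the probabilities computed this way would only be an approximation.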
Anyway, if you want to compute the probability of a word given a context, you should consider using a (neural) language model rather than a table of word embeddings. If you search the Internet, I am sure you will find something that suits your needs.
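For instance, a pretrained causal language model can score each candidate segmentation directly by the log-likelihood it assigns to it after the preceding context. A hedged sketch using the Hugging Face transformers API; the checkpoint name is a placeholder, so substitute whatever Polish causal LM you have access to:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint name: substitute a real Polish causal LM.
MODEL_NAME = "your-polish-causal-lm"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def segmentation_log_likelihood(context, candidate):
    """Log-likelihood the LM assigns to `candidate` following `context`.

    Assumes the tokenization of `context` is a prefix of the tokenization
    of the concatenation, which holds for typical BPE tokenizers when a
    space separates the two strings.
    """
    context_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + " " + candidate,
                         return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits        # (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits, dim=-1)
    # Sum log-probabilities of the candidate's tokens only; the logits at
    # position pos - 1 predict the token at position pos.
    total = 0.0
    for pos in range(context_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total
```

For the 'Coś' example, you would score each candidate segmentation after the same left context and pick the reading with the higher log-likelihood.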