I have a question about word embeddings from transformer encoder models. Let's create word embeddings using the BERT model:
Word 1: "cat" (em1)
Word 2: "dog" (em2)
Word 3: "driver" (em3)
Word 4: "lion" (em4)
Now let's take the cosine similarity scores (the scores below are not real, just for the sake of an example):
cs(em1, em2) = 0.90
cs(em1, em3) = 0.70
cs(em1, em4) = 0.73
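Here is a minimal sketch of how I obtain these embeddings and scores. It assumes `bert-base-uncased` and mean pooling over the last hidden state; other pooling choices give me the same behavior.

```python
# Minimal sketch: BERT word embeddings + cosine similarity.
# Assumes bert-base-uncased and mean pooling over the last hidden state.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(word: str) -> torch.Tensor:
    # Tokenize the single word and mean-pool its 768-dim token vectors.
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # (768,)

em1, em2, em3, em4 = (embed(w) for w in ["cat", "dog", "driver", "lion"])

def cs(a: torch.Tensor, b: torch.Tensor) -> float:
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

print(cs(em1, em2), cs(em1, em3), cs(em1, em4))
```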
The cosine scores always lie between 0.70 and 1.0. They never go below 0.70, not even for unrelated pairs such as "cat" and "driver".
After applying PCA and reducing the dimensionality from 768 to 50 (which removes some of the variance in the vectors), the scores do drop below 0.70.
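This is roughly the PCA step I apply. It reuses the hypothetical `embed` helper from the sketch above, and `vocabulary` stands in for whatever list of words I embed (PCA needs at least 50 rows to produce 50 components).

```python
# Rough sketch of the PCA step: 768 -> 50 dimensions, then cosine similarity
# in the reduced space. `vocabulary` is a placeholder list of words.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

embeddings = np.stack([embed(w).numpy() for w in vocabulary])  # (n_words, 768)
reduced = PCA(n_components=50).fit_transform(embeddings)       # (n_words, 50)

# Compare "cat" (row 0) against the other words in the reduced space.
print(cosine_similarity(reduced[[0]], reduced[1:]))
```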
My question is: high variance should preserve a lot of information about the word, right? Yet before PCA the cosine scores always lie between 0.70 and 1.0. Can anyone please tell me why this happens, and why the scores drop below 0.70 once I reduce the dimensionality? This occurs not only with BERT, but with every transformer-based model I have tried.
Thanks in advance for your help!