Why does the token "less" have a high similarity to "more" in spaCy?


I'm trying to find sentences that contain the word less or words similar to less. When I compute the token similarity between less and every word in the doc, I get words like less and more, which are opposites of each other, yet they have a high similarity.

I'm using the "en_core_web_lg" pipeline.

Andrei Miculiță On

The similarity function you are using compares word vectors, and those vectors are learned from the contexts in which words occur. Even though (or rather because) they are antonyms, less and more appear in very similar contexts and have similar syntactic relationships with other words, so their embeddings are very similar. You can replace more with less in many phrases and they will still make sense (even though the meaning will be the opposite). This is called a paradigmatic relation.

More precisely, in this case both words are adjectives that refer to quantities.
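To make the "similar contexts give similar vectors" point tangible, here is a minimal sketch that builds raw co-occurrence count vectors from a tiny corpus and compares them with cosine similarity. The corpus, the window size, and the helper names `context_vectors` and `cosine` are all made up for illustration; this is not how the vectors in the spaCy pipeline were trained, only a toy version of the same distributional idea.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: "less" and "more" are used in the same slots,
# while "tiny" shows up in different kinds of sentences.
CORPUS = [
    "there will be less rain tomorrow",
    "there will be more rain tomorrow",
    "we need less sugar today",
    "we need more sugar today",
    "a tiny bird sat outside",
    "the tiny house looked cozy",
]


def context_vectors(sentences, window=2):
    """Count, for every word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = context_vectors(CORPUS)
sim_less_more = cosine(vectors["less"], vectors["more"])
sim_less_tiny = cosine(vectors["less"], vectors["tiny"])

print(f"less vs more: {sim_less_more:.3f}")
print(f"less vs tiny: {sim_less_tiny:.3f}")
```

In this toy corpus less and more share every context word, so their similarity is essentially 1, while tiny shares none of them and scores 0. Real embeddings are far less extreme, but the same effect pulls antonyms together.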

For a concrete example, you might expect tiny to be similar to less. But consider the following sentence:

"There will be less rain tomorrow."

If you replace less with tiny, you get "There will be tiny rain tomorrow", which makes less sense than "There will be more rain tomorrow".
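If you want to check this against the pipeline from the question, a small helper along these lines could rank the tokens of a sentence by their similarity to less. The function name `rank_by_similarity` is hypothetical; it relies only on spaCy's `Token.similarity`, `Token.has_vector` and `Token.is_punct`, and it assumes you pass in a loaded pipeline that ships word vectors. Treat it as a sketch rather than a recommended way to find related words.

```python
def rank_by_similarity(nlp, text, target="less"):
    """Return (token_text, similarity) pairs, most similar to `target` first.

    `nlp` is assumed to be a loaded spaCy pipeline with word vectors,
    for example the large English pipeline mentioned in the question.
    """
    target_token = nlp(target)[0]
    scored = [
        (token.text, token.similarity(target_token))
        for token in nlp(text)
        # Skip punctuation and tokens that have no vector to compare.
        if token.has_vector and not token.is_punct
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Example usage (requires spaCy and the pipeline to be installed):
#
#   import spacy
#   nlp = spacy.load("en_core_web_lg")
#   sentence = "There will be more rain and tiny puddles tomorrow."
#   for word, score in rank_by_similarity(nlp, sentence):
#       print(f"{word:10s} {score:.3f}")
```

Because vector similarity cannot separate antonyms from synonyms, a ranking like this is best used to find candidates, which you then filter with something that knows about meaning, such as a hand-written word list.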