I'm trying to find sentences that contain the word "less" or words similar to it. When I compute the token similarity between "less" and every word in the doc, I get words like "less" and "more", which are opposites of each other, yet they have a high similarity.
I'm using the "en_core_web_lg" pipeline.
The similarity function you are using bases its distance metric on context. Even though (or rather because) they are antonyms, "less" and "more" appear in very similar contexts and have similar syntactic relationships with other words, so their embeddings will be very similar. You can replace "more" with "less" in many phrases and they will still make sense (even though the sense will be the opposite). This is called a paradigmatic relation. More exactly, in this case, they are both adjectives referring to quantities.
For a concrete example, you might think that "tiny" would be similar to "less". But consider the following phrase: "There will be less rain tomorrow."
If you replace "less" with "tiny" you get "There will be tiny rain tomorrow", which makes less sense than "There will be more rain tomorrow".
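To make the effect concrete, here is a minimal sketch of context-based similarity using plain co-occurrence counts. It is not how the spaCy vectors are trained; the corpus, the `context_vector` and `cosine` helpers, and the window size are all made up for illustration. The point is only that words which share neighbours end up with similar vectors, whatever their meaning.

```python
from collections import Counter
from math import sqrt

# Toy corpus, invented purely for illustration.
CORPUS = [
    "there will be less rain tomorrow",
    "there will be more rain tomorrow",
    "we need less sugar today",
    "we need more sugar today",
    "a tiny bird sat there",
    "the tiny house was empty",
]


def context_vector(word, sentences, window=2):
    """Count the words that appear within `window` tokens of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != word:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


less = context_vector("less", CORPUS)
more = context_vector("more", CORPUS)
tiny = context_vector("tiny", CORPUS)

# "less" and "more" occur in the same slots, so they share every neighbour.
print("less vs more:", cosine(less, more))
# "tiny" occurs in different slots and shares no neighbour with "less" here.
print("less vs tiny:", cosine(less, tiny))
```

In this toy setup the antonym pair scores far higher than "less" and "tiny", for the same reason as in your pipeline: the vectors capture where a word can appear, not whether two words mean the same thing.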