How to work with a dictionary of synonyms correctly?

I have a dictionary of synonyms of this type:

{"green": ["emerald", "herbaceous", "pistachio", "mint", "menthol", "malachite", "jade"]}

I am writing a preprocessor that matches words in a text against the dictionary values and replaces them with the corresponding keys. For example, if "emerald" appears in the text, it should be replaced with "green". The problem is that the dictionary is large, and the preprocessor has to scan all of its values to find the key for each word in the text. Is enumerating the dictionary values like this the right approach when working with synonyms, or is there a better way?

I decided to try inverting the dictionary into a new one, so that each synonym maps to its key, like this:

{"emerald": "green",
 "herbaceous": "green",
 "pistachio": "green",
 "mint": "green",
 "menthol": "green",
 "jade": "green",
 "malachite": "green"}
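
A reverse dictionary like this does not have to be written by hand. A minimal sketch, assuming the original dictionary is held in a variable called `synonym_groups` (a hypothetical name) and that no word is listed under two different keys:

```python
# Hypothetical name for the original "key -> list of synonyms" dictionary.
synonym_groups = {
    "green": ["emerald", "herbaceous", "pistachio", "mint",
              "menthol", "malachite", "jade"],
}

# Invert once: every synonym becomes a key that maps to its canonical word.
reverse_lookup = {
    synonym: canonical
    for canonical, group in synonym_groups.items()
    for synonym in group
}

print(reverse_lookup["emerald"])     # green
print(reverse_lookup.get("purple"))  # None (not a known synonym)
```

Each word in the text then needs a single dictionary lookup instead of a scan over every list of synonyms.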

But I am not sure this solution is quite right. How should work with a dictionary of synonyms be organized properly?

There is 1 solution below

Tomáš Hořovský

The first approach is indeed going to be really slow, because every word in the text requires a scan over all of the synonym lists. The second approach you mentioned is pretty good, since each word becomes a single dictionary lookup, but it can be optimized a little further. The same value is repeated many times in the dictionary. I would recommend keeping a separate list that holds each value once and having the dictionary map each synonym to an index in that list.

Such as:

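# Canonical words, stored once; the dictionary refers to them by index.
correct_words = ["green", "comfortable"]  # ...
synonyms = {
    "emerald": 0,
    "herbaceous": 0,
    "pistachio": 0,
    "mint": 0,
    "menthol": 0,
    "jade": 0,
    "malachite": 0,
    "cozy": 1,
    # ...
}

def get_correct_synonym(word: str) -> str | None:
    """Return the canonical word for a synonym, or None if it is unknown."""
    index = synonyms.get(word)
    if index is None:
        return None
    return correct_words[index]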
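
To tie this back to the preprocessor in the question, here is a rough sketch of how both structures could be built from the original dictionary and applied to a text. The names `synonym_groups`, `build_index`, and `normalize` are made up for illustration, and the tokenization (lowercasing each `\w+` match before the lookup) is an assumption; real text may need something more careful.

```python
import re

# Hypothetical input in the original "canonical word -> synonyms" shape.
synonym_groups = {
    "green": ["emerald", "herbaceous", "pistachio", "mint",
              "menthol", "malachite", "jade"],
    "comfortable": ["cozy"],
}

def build_index(groups):
    """Build the list of canonical words and the synonym -> index mapping."""
    correct_words = list(groups)
    synonyms = {
        synonym: index
        for index, canonical in enumerate(correct_words)
        for synonym in groups[canonical]
    }
    return correct_words, synonyms

correct_words, synonyms = build_index(synonym_groups)

def normalize(text):
    """Replace every known synonym in text with its canonical word."""
    def replace(match):
        word = match.group(0)
        index = synonyms.get(word.lower())
        # Words that are not known synonyms are left untouched.
        return word if index is None else correct_words[index]
    return re.sub(r"\w+", replace, text)

print(normalize("A cozy chair and a jade vase."))
# A comfortable chair and a green vase.
```

The index is built once up front, so the cost per word in the text stays at one dictionary lookup no matter how large the synonym dictionary grows.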