I would like to use python to convert all synonyms and plural forms of words to the base version of the word.
e.g. Babies would become baby and so would infant and infants.
I tried creating a naive version of plural to root code but it has the issue that it doesn't always function correctly and can't detect a large amount of cases.
contents = ["buying", "stalls", "responsibilities"]
for token in contents:
if token.endswith("ies"):
token = token.replace('ies','y')
elif token.endswith('s'):
token = token[:-1]
elif token.endswith("ed"):
token = token[:-2]
elif token.endswith("ing"):
token = token[:-3]
print(contents)
I have not used this library before, so that this with a grain of salt. However, NodeBox Linguistics seems to be a reasonable set of scripts that will do exactly what you are looking for if you are on MacOS. Check the link here: https://www.nodebox.net/code/index.php/Linguistics
Based on their documentation, it looks like you will be able to use lines like so:
In addition to the example above, another to consider is a natural language processing library like
NLTK. The reason why I recommend using an external library is because English has a lot of exceptions. As mentioned in my comment, consider words like: class, fling, red, geese, etc., which would trip up the rules that was mentioned in the original question.