I have 4 lists of company names. Take the company Google as an example. In list A it is written as Google Ltd; in the 2nd list it is written as Google Inc (extended etc.); the 3rd contains Beta Gogl (misspelled etc.); the 4th contains ABC Googl. I want to create embeddings (a vector index/vector store) for all the names in the 4 lists.
When a new word (company name) comes in, I generate an embedding for it and find the closest match among the stored names.
One approach is to skip embeddings and use an edit distance (Levenshtein etc.) to find the most similar name. The issue is that if I have 1000s of names in each list, it costs a lot of computation each time I want to find the most similar one (let's say string matching is done 1000 times a day), because every query has to be compared against every stored name.
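To make the edit-distance baseline concrete, here is a minimal sketch in plain Python, assuming lowercase comparison and the four example names from above. The function names `levenshtein` and `closest_by_edit_distance` are hypothetical, chosen only for illustration. Every lookup scans all stored names, which is the cost I am worried about.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_by_edit_distance(query: str, names: list[str]) -> str:
    """Brute force: compare the query against every stored name."""
    q = query.lower()
    return min(names, key=lambda name: levenshtein(q, name.lower()))


# Example names from the question (illustrative data only).
names = ["Google Ltd", "Google Inc", "Beta Gogl", "ABC Googl"]
best = closest_by_edit_distance("Google", names)
```

With N stored names, each query performs N distance computations, so the work grows linearly with the size of the lists.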
So I want to create an embedding vector store so that I can find the most similar name quickly.
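A minimal sketch of the kind of lookup I have in mind, using only the standard library. It assumes character trigram count vectors as a simple stand-in for learned embeddings (this is not GloVe), and the names `char_ngrams`, `embed`, `cosine` and `NameIndex`, as well as the extra entry "Microsoft Corp", are hypothetical and only for illustration. The vectors are built once up front; the query step here is still a linear scan, which a real vector store would presumably replace with a nearest-neighbour index.

```python
import math
from collections import Counter


def char_ngrams(name: str, n: int = 3) -> list[str]:
    """Split a name into overlapping character n-grams, padded with spaces."""
    padded = f" {name.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def embed(name: str, n: int = 3) -> dict[str, float]:
    """Sparse, L2-normalised n-gram count vector (assumed stand-in for an embedding)."""
    counts = Counter(char_ngrams(name, n))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {gram: c / norm for gram, c in counts.items()} if norm else {}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two already-normalised sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(weight * v.get(gram, 0.0) for gram, weight in u.items())


class NameIndex:
    """Hypothetical in-memory store: vectors are computed once at build time."""

    def __init__(self, names: list[str]) -> None:
        self.names = list(names)
        self.vectors = [embed(name) for name in self.names]

    def query(self, name: str, k: int = 1) -> list[tuple[float, str]]:
        q = embed(name)
        scored = [(cosine(q, v), stored) for v, stored in zip(self.vectors, self.names)]
        return sorted(scored, reverse=True)[:k]


# Example names from the question plus one unrelated, made-up entry.
index = NameIndex(["Google Ltd", "Google Inc", "Beta Gogl", "ABC Googl", "Microsoft Corp"])
top = index.query("Google", k=2)
```

Character n-grams are used here because misspellings such as "Gogl" still share most of their n-grams with "Google", whereas a word-level vocabulary would treat them as unknown tokens.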
GloVe could be an option, but I am not sure whether it works with names only (as far as I know, it works well on sentences).
Recommendations for any other approach would also be great.