How do I figure out which word sounds most similar to a given word?


I have been using Levenshtein distance to score how different each of my words is from a newly generated word, like so: -1*np.array([distance.levenshtein(word_use,w2) for w2 in words])
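For reference, here is a minimal, self-contained sketch of that scoring step. It assumes a plain-Python `levenshtein` helper standing in for `distance.levenshtein`, returns a list rather than a NumPy array, and uses a made-up word list; the helper names are illustrative, not part of any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity_scores(word_use, words):
    """Negated distances, so a higher score means a closer spelling."""
    return [-levenshtein(word_use, w2) for w2 in words]


# Hypothetical example data, not from the original question.
words = ["kitten", "mitten", "sitting"]
scores = similarity_scores("kitten", words)
best = words[max(range(len(words)), key=scores.__getitem__)]
```

Negating the distance turns "smallest distance" into "largest score", which is why the original expression multiplies by -1.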

Then I do other stuff with the result. I wish to produce a similar result, but for the way the words sound. This is proving really difficult.

ChatGPT advised me to use jellyfish's match_rating_index, but it only accepts one string and just outputs some weird code string on its own. As far as I can tell, I would then have to call "ord" on its characters and compare the resulting numbers, and I don't see that working the way I want it to.

Any humans have any better ideas? I am fully aware of the "different regions pronounce things differently" problem. This is irrelevant to my purpose.

EDIT: I've seen soundex (linked question: "How to get the similar-sounding words together").

But it just outputs codes. If I have a hundred words in a list, and I need to figure out which one a new word sounds most like, and we exclude the simple case of "these two output the same code," then soundex looks pretty useless for my case.

EDIT 2: The link (frankly, lazily) given to close my question doesn't help me. Levenshtein of soundex codes is absolutely no better than Levenshtein of the words themselves. It just adds a level of arbitrary complexity. Further, because of how Levenshtein compares strings, code pairs such as T000 vs. D263 and M116 vs. D263 both yield a distance of 4, which does not help me: that closeness is arbitrary and doesn't take sound into account. Plain Levenshtein distance between phonemic representations would not work for me either, because I care that /s/ and /z/ are closer to each other than either is to /k/.
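To make the /s/, /z/, /k/ requirement concrete, here is a rough sketch of one possible direction: an edit distance over phoneme sequences where the substitution cost depends on how many articulatory features differ, instead of a flat cost of 1. Everything in it is an assumption for illustration: the tiny `FEATURES` table, the cost of 1.0 for insertions, deletions, and unknown symbols, and the hand-written transcriptions. It is not a tested phonological model.

```python
# Hypothetical feature table: (voicing, place, manner). Illustrative only.
FEATURES = {
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
    "k": ("voiceless", "velar", "stop"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
}


def sub_cost(a: str, b: str) -> float:
    """Fraction of features that differ; 1.0 if a symbol is not in the table."""
    if a == b:
        return 0.0
    if a in FEATURES and b in FEATURES:
        fa, fb = FEATURES[a], FEATURES[b]
        return sum(x != y for x, y in zip(fa, fb)) / len(fa)
    return 1.0


def phonetic_distance(p, q) -> float:
    """Edit distance over phoneme lists with feature-weighted substitutions."""
    prev = [float(j) for j in range(len(q) + 1)]
    for i, a in enumerate(p, start=1):
        curr = [float(i)]
        for j, b in enumerate(q, start=1):
            curr.append(min(
                prev[j] + 1.0,                # delete a
                curr[j - 1] + 1.0,            # insert b
                prev[j - 1] + sub_cost(a, b)  # substitute a with b
            ))
        prev = curr
    return prev[-1]


# Hand-written transcriptions (assumed, not produced by any tool).
target = ["z", "ɪ", "p"]
candidates = {"sip": ["s", "ɪ", "p"], "kip": ["k", "ɪ", "p"]}
closest = min(candidates, key=lambda w: phonetic_distance(target, candidates[w]))
```

With this table, /s/ and /z/ differ only in voicing, so they end up closer to each other than either is to /k/. A realistic version would need a full feature inventory and a way to get phonemic transcriptions for the words, neither of which is covered here.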
