Replacing incorrect names with the correct names using Python similarity matching


I need to replace some misspelled country names with the correct names. Below is my dataframe:

names   country
0   1   Austria
1   2   Autrisa
2   3   Egnald
3   4   Sweden
4   5   Swweden
5   6   India

I need to replace the above countries with the right names. Below is the output I need:

names   country
0   1   Austria
1   2   Austria
2   3   England
3   4   Sweden
4   5   Sweden
5   6   India

This is the code I am using:

from difflib import SequenceMatcher

correct_names = {'Austria','England','Sweden'}
def get_most_similar(word, wordlist):
    top_similarity = 0.0
    most_similar_word = word  
    for candidate in wordlist:
        similarity = SequenceMatcher(None, word, candidate).ratio()
        if similarity > top_similarity:
            top_similarity = similarity
            most_similar_word = candidate
            # print(most_similar_word)

    return most_similar_word

data['country'].apply(lambda x: get_most_similar(x,correct_names))

The output I am getting is below:

0    Austria
1    Austria
2    England
3     Sweden
4     Sweden
5    England  -- this should be India but it got converted to England

I need help fixing this.


There is 1 best solution below

J_H On

You assigned

correct_names = {'Austria', 'England', 'Sweden'}

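But this is not appropriate for the current use case: India is a correct name, but it does not appear in that set. Because the function always returns the highest-scoring candidate, 'India' is replaced by whichever listed name happens to resemble it most.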
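To see why this happens, it can help to print the raw similarity scores. The following is a small diagnostic sketch, not part of the original code; the helper name `similarity_scores` is made up for illustration, and a list is used instead of a set only to keep the iteration order predictable.

```python
from difflib import SequenceMatcher

# The candidate list from the question, which does not contain 'India'.
correct_names = ['Austria', 'England', 'Sweden']

def similarity_scores(word, candidates):
    """Return a dict mapping each candidate to its similarity ratio with word."""
    return {c: SequenceMatcher(None, word, c).ratio() for c in candidates}

scores = similarity_scores('India', correct_names)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Every candidate shares at least one character with 'India', so each gets a small but nonzero score. Since the loop in the question accepts any score above 0.0, one of the candidates always wins, even though none of them is a good match.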

You want to assign

correct_names = {'Austria', 'England', 'India', 'Sweden'}
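If listing every valid country is not practical, a complementary option is to require a minimum similarity and leave a name unchanged when nothing clears it. The sketch below uses `difflib.get_close_matches` from the standard library; the helper name `closest_or_original` and the cutoff of 0.6 (the function's default) are assumptions for illustration and would need tuning on real data.

```python
from difflib import get_close_matches

# Candidate names; 'India' is deliberately left out to show the fallback.
correct_names = ['Austria', 'England', 'Sweden']

def closest_or_original(word, candidates, cutoff=0.6):
    """Return the best candidate scoring at least cutoff, else word unchanged."""
    matches = get_close_matches(word, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else word

countries = ['Austria', 'Autrisa', 'Egnald', 'Sweden', 'Swweden', 'India']
fixed = [closest_or_original(c, correct_names) for c in countries]
print(fixed)
```

With a dataframe, the same helper could be applied as data['country'].apply(lambda x: closest_or_original(x, correct_names)). Note that 'Egnald' only just clears a cutoff of 0.6 against 'England', so a stricter threshold would leave it uncorrected.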