I need to replace some misspelled country names with the correct names. Below is my dataframe:
   names  country
0      1  Austria
1      2  Autrisa
2      3   Egnald
3      4   Sweden
4      5  Swweden
5      6    India
I need to replace the misspelled countries above with the right names. Below is the output I need:
   names  country
0      1  Austria
1      2  Austria
2      3  England
3      4   Sweden
4      5   Sweden
5      6    India
This is what I tried:

from difflib import SequenceMatcher

correct_names = {'Austria', 'England', 'Sweden'}

def get_most_similar(word, wordlist):
    # Return the candidate in wordlist that is most similar to word.
    top_similarity = 0.0
    most_similar_word = word
    for candidate in wordlist:
        similarity = SequenceMatcher(None, word, candidate).ratio()
        if similarity > top_similarity:
            top_similarity = similarity
            most_similar_word = candidate
    # print(most_similar_word)
    return most_similar_word

data['country'].apply(lambda x: get_most_similar(x, correct_names))
The output I am getting is:

0    Austria
1    Austria
2    England
3     Sweden
4     Sweden
5    England   <- this should be India, but it was converted to England

How can I fix this?
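To see why the last row turns into England, it helps to print the ratio that SequenceMatcher gives 'India' against each of the three candidates. A minimal sketch using only the standard-library difflib module and the correct_names set from the question:

```python
from difflib import SequenceMatcher

# Candidate names exactly as in the question ('India' is not among them).
correct_names = {'Austria', 'England', 'Sweden'}

# Score 'India' against every candidate, the same way get_most_similar does.
scores = {c: SequenceMatcher(None, 'India', c).ratio() for c in sorted(correct_names)}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
# Austria: 0.333
# England: 0.333
# Sweden: 0.182
```

'India' shares two characters with both 'Austria' ('ia') and 'England' ('nd'), so both score 1/3. Because get_most_similar only replaces its current best on a strictly greater ratio, the tie goes to whichever of the two names the loop reaches first, and the iteration order of a set of strings can differ between interpreter runs. Either way, 'India' can only ever be mapped to one of the three names in correct_names.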
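You assigned

correct_names = {'Austria', 'England', 'Sweden'}

But this is not appropriate for the current use case: India is a correct name, but it does not appear in that set, so get_most_similar can only map it to the closest of the three names it knows. You want to assign a set that also contains 'India':

correct_names = {'Austria', 'England', 'Sweden', 'India'}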
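A minimal sketch of the fix, assuming the get_most_similar function from the question is kept unchanged and only 'India' is added to the set of valid names. To keep it self-contained it runs on a plain list of the question's country values instead of the DataFrame:

```python
from difflib import SequenceMatcher

def get_most_similar(word, wordlist):
    """Return the candidate in wordlist with the highest SequenceMatcher ratio to word."""
    top_similarity = 0.0
    most_similar_word = word  # kept only if no candidate shares any characters
    for candidate in wordlist:
        similarity = SequenceMatcher(None, word, candidate).ratio()
        if similarity > top_similarity:
            top_similarity = similarity
            most_similar_word = candidate
    return most_similar_word

# 'India' is now a valid name, so it matches itself with ratio 1.0.
correct_names = {'Austria', 'England', 'Sweden', 'India'}

# The 'country' column from the question, as a plain list.
countries = ['Austria', 'Autrisa', 'Egnald', 'Sweden', 'Swweden', 'India']
fixed = [get_most_similar(c, correct_names) for c in countries]
print(fixed)
# ['Austria', 'Austria', 'England', 'Sweden', 'Sweden', 'India']
```

With the DataFrame from the question, the equivalent call is data['country'] = data['country'].apply(lambda x: get_most_similar(x, correct_names)). Note that any name missing from correct_names will still be mapped to its closest candidate, so the set has to list every valid country.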