How to store the substring that produces the closest match instead of the entire string

I am fuzzy matching a list of names against a list of strings, and then creating a dictionary that holds the matched names and the strings the matches came from (see below). I used the following code:

For the notes_list data, see https://drive.google.com/file/d/1_qyxgzOS4_k8n4KlJCiP9LAW8M8Fn-Hf/view?usp=sharing

import pandas as pd
from fuzzywuzzy import process
from fuzzywuzzy import fuzz
master_list=['Muhammad bin Rashid Al Maktum','Antonio Costa','Nikol Pashinyan','Antony Blinken','Mohammad Shtayyeh','Mohammad Shtayyeh','Sebastian Kurz','Kyriakos Mitsotakis','Volodymyr Zelenskyy','Sebastian Kurz']
notes_list = df.Notes.tolist()  # df holds the Notes data from the file linked above
expanded_notes = []
expanded_notes_2 = []
expanded_notes_full = []
matched_names = []
matching_strings = []
threshold=86

for string in notes_list:
    # .strip() with no argument trims surrounding whitespace (.strip("") trims nothing)
    s_list = string.strip().split(';')  # expand notes list by splitting at ;
    expanded_notes.extend(s_list)
for string in expanded_notes:
    s_list = string.strip().split(',')  # expand notes list further by splitting at ,
    expanded_notes_2.extend(s_list)
for string in expanded_notes_2:
    s_list = string.strip().split(' and ')  # expand notes list further by splitting at ' and '
    expanded_notes_full.extend(s_list)
    
for string in expanded_notes_full:
    for name in master_list:
        if fuzz.partial_ratio(name, string) >= threshold:
            matched_names.append(name)
            matching_strings.append(string)
                                       
d = dict(zip(matched_names, matching_strings))  # a repeated name keeps only its last matching string

This produces:

{'Muhammad bin Rashid Al Maktum': 'Met with Prime Minister Muhammad bin Rashid Al Maktum',
 'Antonio Costa': 'Met with Prime Minister Antonio Costa',
 'Nikol Pashinyan': 'met with Prime Minister Nikol Pashinyan',
 'Antony Blinken': 'Secretary of State Antony Blinken',
 'Mohammad Shtayyeh': 'Met with Prime Minister Mohammad Shtayyeh',
 'Kyriakos Mitsotakis': 'Met with Prime Minister Kyriakos Mitsotakis',
 'Sebastian Kurz': 'Sebastian Kurz',
 'Volodymyr Zelenskyy': 'discussed fighter jets with President Volodymyr Zelensky'}
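As an aside, the three splitting passes can be collapsed into a single regular-expression split on `;`, `,` and ` and `. This is a minimal sketch, not a drop-in replacement: it strips each fragment and also drops empty fragments, so edge cases (for example a fragment that starts with "and ") can split slightly differently from the three-pass loop. The helper name `split_notes` and the sample notes are hypothetical, made up from the names in the output above.

```python
import re

# The delimiters used in the question: ';', ',' and the word ' and '.
DELIMITERS = re.compile(r";|,| and ")


def split_notes(notes):
    """Split every note on all delimiters in one pass (hypothetical helper)."""
    pieces = []
    for note in notes:
        # Trim surrounding whitespace from each fragment
        pieces.extend(p.strip() for p in DELIMITERS.split(note))
    return [p for p in pieces if p]  # drop empty fragments, e.g. after a trailing ';'


# Made-up sample notes for illustration only
sample_notes = [
    "Met with Prime Minister Antonio Costa; Secretary of State Antony Blinken",
    "Sebastian Kurz and Nikol Pashinyan, discussed trade",
]
print(split_notes(sample_notes))
```

The regex alternation works here because none of the delimiters overlaps another, so one pass finds the same split points the nested loops would.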

Using this dictionary, I would like to build another one in which each value is not the full string the match came from, but only the substring that most closely matches the name in the key.

For example, instead of {'Volodymyr Zelenskyy': 'discussed fighter jets with President Volodymyr Zelensky'}

I am hoping to produce a dictionary with values like the following:

{'Volodymyr Zelenskyy': 'Volodymyr Zelensky'}
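One possible approach, offered as a sketch under stated assumptions rather than a known answer: split each matching string into words, score every run of consecutive words whose length is within one word of the name's length, and keep the highest-scoring run. To keep the example runnable without fuzzywuzzy, it scores with the standard library's `difflib.SequenceMatcher` (scaled to 0-100) as a stand-in for `fuzz.ratio`; `fuzz.ratio` could be passed as `scorer` instead. The names `ratio`, `best_substring` and `slack` are hypothetical.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """0-100 case-insensitive similarity; a stdlib stand-in for fuzz.ratio."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_substring(name: str, text: str, scorer=ratio, slack: int = 1) -> str:
    """Return the run of consecutive words in `text` that scores highest against `name`.

    Candidate windows contain between (words in name - slack) and
    (words in name + slack) words. If no window fits, `text` is returned unchanged.
    """
    words = text.split()
    n = len(name.split())
    best, best_score = text, -1
    for size in range(max(1, n - slack), n + slack + 1):
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            score = scorer(name, candidate)
            if score > best_score:  # strict '>' keeps the first of equally good windows
                best, best_score = candidate, score
    return best


# A few entries from the dictionary in the question
d = {
    'Antony Blinken': 'Secretary of State Antony Blinken',
    'Sebastian Kurz': 'Sebastian Kurz',
    'Volodymyr Zelenskyy': 'discussed fighter jets with President Volodymyr Zelensky',
}
d_substrings = {name: best_substring(name, text) for name, text in d.items()}
print(d_substrings)
# {'Antony Blinken': 'Antony Blinken', 'Sebastian Kurz': 'Sebastian Kurz',
#  'Volodymyr Zelenskyy': 'Volodymyr Zelensky'}
```

Restricting candidates to whole-word windows keeps the result from starting or ending mid-word, which a purely character-level alignment can do; the `slack` parameter lets a window absorb an extra or missing word (such as a middle name) at the cost of more comparisons.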
