I have a pandas DataFrame with a string column of transaction descriptions. I am trying to do some manual lemmatization. I have manually created a dictionary that has the main word as the key and a list of variations of that word as the value. I would like to replace each variation in the list with its main word.
Here is example code for the data I have:
import pandas as pd
list1 = ['0412 UBER TRIP HELP.UBER.COMCA',
'0410 UBER TRIP HELP.UBER.COMCA',
'MOBILE PURCHASE 0410 VALENCIA WHOLE FOODS SAN FRANCISCOCA',
'WHOLEFDS WBG#1 04/13 PURCHASE WHOLEFDS WBG#104 BROOKLYN NY',
'0414 LYFT *CITI BIKE BIK LYFT.COM CA',
'0421 WALGREENS.COM 877-250-5823 IL',
'0421 Rapha Racing PMT LLC XXX-XX72742 OR',
'0422 UBER EATS PAYMENT HELP.UBER.COMCA',
'0912 WHOLEFDS NOE 10379 SAN FRANCISCOCA',
'PURCHASE 1003 CAVIAR*JUNOON WWW.DOORDASH.CA']
df = pd.DataFrame(list1, columns = ['feature'])
map1 = {'payment':['pmts','pmnt','pmt','pmts','pyment','pymnts'],
'account':['acct'],
'pharmacy':['walgreens','walgreen','riteaid','cvs','pharm'],
'food_delivery':['uber eats','doordash','seamless','grubhub','caviar'],
'ride_share':['uber','lyft'],
'whole_foods':['wholefds','whole foods','whole food']
}
I know how to do it one word at a time using df['feature'].str.replace('variation', 'main word'). However, this is laborious and time-consuming. Is there a faster way to do this? Thank you.
Reverse your map so that each variation points to its main word, then replace all variations in a single regex pass:
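A minimal sketch of this approach, assuming the `map1` dictionary from the question and a four-row sample of its data so the block runs on its own. The names `reverse_map`, `alternatives`, and `pattern` are illustrative, and sorting the variations longest-first is my own precaution so that 'uber eats' is tried before 'uber':

```python
import re
import pandas as pd

# A few rows from the question, repeated here so the sketch is self-contained.
df = pd.DataFrame({'feature': [
    '0412 UBER TRIP HELP.UBER.COMCA',
    '0421 Rapha Racing PMT LLC XXX-XX72742 OR',
    '0422 UBER EATS PAYMENT HELP.UBER.COMCA',
    '0912 WHOLEFDS NOE 10379 SAN FRANCISCOCA',
]})

map1 = {'payment': ['pmts', 'pmnt', 'pmt', 'pmts', 'pyment', 'pymnts'],
        'account': ['acct'],
        'pharmacy': ['walgreens', 'walgreen', 'riteaid', 'cvs', 'pharm'],
        'food_delivery': ['uber eats', 'doordash', 'seamless', 'grubhub', 'caviar'],
        'ride_share': ['uber', 'lyft'],
        'whole_foods': ['wholefds', 'whole foods', 'whole food']}

# Invert the mapping: every variation becomes a key pointing at its main word.
reverse_map = {variation: main
               for main, variations in map1.items()
               for variation in variations}

# Try longer variations first so multi-word entries win over their prefixes.
alternatives = sorted(reverse_map, key=len, reverse=True)

# (?i) makes the match case insensitive; \b...\b restricts it to whole words.
pattern = r'(?i)\b(' + '|'.join(re.escape(v) for v in alternatives) + r')\b'

# The callable looks up the lowercased match in the reversed map.
df['feature'] = df['feature'].str.replace(
    pattern, lambda m: reverse_map[m.group(1).lower()], regex=True)

print(df['feature'].tolist())
# ['0412 ride_share TRIP HELP.ride_share.COMCA',
#  '0421 Rapha Racing payment LLC XXX-XX72742 OR',
#  '0422 food_delivery PAYMENT HELP.ride_share.COMCA',
#  '0912 whole_foods NOE 10379 SAN FRANCISCOCA']
```

Text that matches no variation keeps its original case, which is why 'PAYMENT' stays uppercase in the third row.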
Output:
Details:
(?i): case insensitive
\b...\b: word boundary
Update
If you don't care about the lower/upper case, you can use:
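One way this could look, as a sketch rather than the only option: lowercase the column first, so the pattern no longer needs `(?i)` and the replacement callback no longer needs `.lower()`. The two-row sample and the trimmed `reverse_map` below are illustrative stand-ins for the full data and map:

```python
import re
import pandas as pd

# Two rows from the question, repeated so the sketch is self-contained.
df = pd.DataFrame({'feature': [
    '0422 UBER EATS PAYMENT HELP.UBER.COMCA',
    'PURCHASE 1003 CAVIAR*JUNOON WWW.DOORDASH.CA',
]})

# Trimmed, already-reversed map (variation -> main word) for illustration.
reverse_map = {
    'uber eats': 'food_delivery',
    'doordash': 'food_delivery',
    'caviar': 'food_delivery',
    'uber': 'ride_share',
    'lyft': 'ride_share',
}

# Longest variations first, whole-word matches only, no case flag needed.
alternatives = sorted(reverse_map, key=len, reverse=True)
pattern = r'\b(' + '|'.join(re.escape(v) for v in alternatives) + r')\b'

# Lowercase everything, then substitute; matches are already lowercase.
df['feature'] = df['feature'].str.lower().str.replace(
    pattern, lambda m: reverse_map[m.group(1)], regex=True)

print(df['feature'].tolist())
# ['0422 food_delivery payment help.ride_share.comca',
#  'purchase 1003 food_delivery*junoon www.food_delivery.ca']
```

The trade-off is that the whole string is lowercased, not just the replaced words.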