I have 2 dataframes I want to merge on first name and contact email returning all of the values from df2.
Sample of data:
df1 = pd.DataFrame([['Elle', 'Kelly', '[email protected]', 2],
['Amanda','Johnson', '[email protected]', 5],
['Jay', 'Rogers', '[email protected]', 4],
['David','Connor', '[email protected]', 3],
['Rachel','Connor','[email protected]', 2],
['Anne','Smith','[email protected]', 4],
['Anne','Smith', '[email protected]', 2],
['Dani', 'Carter', '[email protected]', 3],
['Drake', 'Walker', '[email protected]', 2]],
columns = ['First Name', 'Last Name', 'Email', 'Rating'])
df2 = pd.DataFrame([[np.nan, np.nan, np.nan, 1040, 'City'],
['Dani','Carter-Hampton', '[email protected]', 1040, 'New York'],
['Anne','Smith','[email protected]', 1040, 'New York'],
['David', 'Connor', '[email protected]', 1040, 'Chicago'],
['Jay', 'Rogers','[email protected]', 1040, 'Los Angeles'],
['Anne','Smith', '[email protected]', 1040, 'Houston'],
['Amanda','Johnson','[email protected]', 1040, 'Los Angeles'],
['Rachel', 'Connor', '[email protected]', 1040, 'Chicago'],
['Elle', 'Moore-Kelly', '[email protected]', 1040, 'Los Angeles'],
['Drake', 'Walker', '[email protected]', 1040, 'Los Angeles']],
columns = ['First Name','Last Name','Contact Email','Movie Id','Location'])
merge
merged = df1.merge(df2, left_on = ['First Name', 'Email'], right_on = ['First Name', 'Contact Email'], how = 'right')
The problem is some of the rows are unable to match because of differences between name and email in both tables, so I want to perform another merge on first name and last name to fill in the missing nan values. I tried using the combine_first function but because the indices are different it results in mismatched rows and duplicates. How do I go about this?
Desired Output after the first merge to fill in na values


Building on @Lazyer's comment, I would use pandas method chaining and use
combine_firstto combine theLast Name_xandLast Name_ycolumns and also theEmailandContact Email.