The date & function is:
- 2 same DF namely
left_dfandright_df:
pd.DataFrame({'Column1': ['dummy_text1'],
'Column2': [''],
'Column3': [''],
'Column4': ['dummy_text2'],
'Column5': [''],
'Column6': ['Dummy Role 1']
})
- the function
left_df.merge(right_df,
on=['Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6'],
how='outer',
indicator=True)
The question is:
the result contains two rows, one as 'left_only' and one as 'right_only', instead of a single row with 'both' as indicator.
Other info.:
I have already tried the following steps to address the issue:
- Checked for leading/trailing whitespaces in the merge columns, but there are none.
- Verified that there are no duplicates in either DataFrame based on the merge columns.
- Used
drop_duplicates()withinplace=Trueandkeep=Falseon both DataFrames before the merge. - Ensured that the merge columns have the same data types in both DataFrames.
Edit:
Sharing code snippet: I forgot to mention this part, this is to resolve case of the values:
left_df['Column1_U'] = left_df['Column1'].copy()
left_df['Column6_U'] = left_df['Column6'].copy()
right_df['Column1_U'] = right_df['Column1'].copy()
right_df['Column6_U'] = right_df['Column6'].copy()
this is the rest of the code:
columns_to_exclude = ['Column1_U', 'Column6_U']
left_df = left_df.apply(lambda s: s.str.lower() if s.name not in columns_to_exclude else s)
right_df = right_df.apply(lambda s: s.str.lower() if s.name not in columns_to_exclude else s)
merged_df = merge(left = left_df
,right = right_df
,on=['Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6']
,how='outer'
,indicator=True)
but the output that i'm getting is this: Column1,Column2,Column3,Column4,Column5,Column6,Column1_U_x,Column6_U_x,Column1_U_y,Column6_U_y,_merge dummy_text1,,,dummy_text1,,dummy role 1,Dummy_text1,Dummy Role 1,,,left_only dummy_text1,,,dummy_text1,,dummy role 1,,,Dummy_text1,Dummy Role 1,right_only
It is solved. The issue turned out to be with
NULLvalue and'', both these dataframes were getting created from SQL tables and in the query I was usingISNULL(Column, '')as Column in one andNULLAS Column in the other. Turns out after doingdataframe.fillna(''), it was treating the '' differently between both dataframes, after I updated the queries the issue was resolved.