I have two pandas dataframes. DF1 has columns A B and C. DF2 has columns B and C. I want to create DF3 with columns A B C D and E.
I want column A B and C to be the same and in the same order as column A B and C in DF1. Then I want column D to be the values in DF2 column B when DF2 B and C both match DF1 B and C. Then I want DF3 column E to be the values in DF2 column B when DF2 column B matches DF1 B but the column C amounts do not match.
To summarize:
Columns:
A - same as DF1 A
B - same as DF1 B
C - same as DF1 C
D - the values of DF2 B when both B and C match DF1.
E - the values of DF2B when only B matches DF1 and c does not.
Thanks in advance!
Here is an example:
Here is the same example in text:
| DF1: | ||
|---|---|---|
| A | B | C |
| AAA | 123 | 456 |
| BBB | 789 | 888 |
| CCC | 999 | 111 |
| DDD | 555 | 444 |
| DF2: | |
|---|---|
| B | C |
| 123 | 456 |
| 789 | 888 |
| 789 | 900 |
| 789 | 950 |
| 999 | 111 |
| 555 | 000 |
| 555 | 444 |
| DF3: | ||||
|---|---|---|---|---|
| A | B | C | D | E |
| AAA | 123 | 456 | 456 | nan |
| BBB | 789 | 888 | 888 | 900, 950 |
| CCC | 999 | 111 | 111 | nan |
| DDD | 555 | 444 | 444 | 000 |
I have tried to do a merge like the following: df1.merge(df2['C'].astype(str).groupby(df2['B']).agg(', '.join), on='B', how='left;).fillna('0') which has worked when I want to match on one column (for example B), and pull in the information in C. But I am not familiar with how to pull in multiple columns when one needs to match twice, and one needs to match only once, discluding any that match in both columns.
You can
mergeon "B", then assign "D"/"E" depending on the equality of the two "C" columns. Finally,pivot_tableto reshape your DataFrame:Output: