pandas: assign a column values by slice with method chaining

In the following toy example, I'm trying to add a status column based on the results of an outer merge. The challenge is to keep everything in a single method chain, as described in Tom's blog. The commented-out line is my attempt, but it does not work.

import pandas as pd

# Create sample data frames A and B
A = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value': [1, 2, 3, 4]
})

B = pd.DataFrame({
    'key': ['C', 'D', 'E', 'F'],
    'value': [3, 4, 5, 6]
})

# Merge data frames A and B on the 'key' column and add an indicator column
merged = pd.merge(A, B, on='key', how='outer', indicator=True)

# add a status column
#{'both':'no change',
 #'left_only': 'added',
 #'right_only': 'removed'}

merged = (merged
          .assign (status = 'no change')
          #.assign(status = lambda x: x.loc[x._merge == 'left_only'], 'added')
          .drop('_merge', axis=1)
          )

There are 3 best solutions below

sammywemmy On BEST ANSWER

Something like this should suffice. Since you are assigning values for a slice, you generally need a conditional or a mapping (Series.map, np.where, np.select, Series.where, etc.) rather than .loc inside assign:

(A
.merge(B, on='key', how='outer', indicator=True)
.assign(status = lambda f: f._merge.map({"left_only":"added", 
                                          "both":"no change", 
                                          "right_only":"removed"}))
)
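
For comparison, here is a rough sketch of the np.select route mentioned above. It is not part of the original answer, and the name `result` is only illustrative. It assumes the mapping from the question (left_only is 'added', right_only is 'removed', everything else is 'no change'):

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
B = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

result = (
    A.merge(B, on='key', how='outer', indicator=True)
     # np.select takes a list of boolean conditions and a matching list of
     # values; rows that match no condition get the default.
     .assign(status=lambda f: np.select(
         [f['_merge'] == 'left_only', f['_merge'] == 'right_only'],
         ['added', 'removed'],
         default='no change'))
     .drop(columns='_merge')
)
print(result)
```

map is probably the simpler choice when the status depends on a single column; np.select tends to be more useful once the conditions involve several columns.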
taller On

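You can use DataFrame.apply to compute the status row by row. The example below only labels the left_only rows as 'added' and leaves the others empty.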

merged = (merged
          .assign(status = merged.apply(lambda x: 
                  'added' if x._merge == "left_only" else "", axis=1))
          .drop('_merge', axis=1)
          )
  key  value_x  value_y status
0   A      1.0      NaN  added
1   B      2.0      NaN  added
2   C      3.0      3.0
3   D      4.0      4.0
4   E      NaN      5.0
5   F      NaN      6.0
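
If all three statuses are needed, the same apply idea can be extended with a lookup dictionary. This is only a sketch under the question's mapping; the names `labels` and `result` are made up for illustration. Passing a lambda to assign also keeps the whole thing in one chain, since it does not refer back to an existing `merged` variable:

```python
import pandas as pd

A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
B = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

# Hypothetical lookup table taken from the mapping in the question
labels = {'both': 'no change', 'left_only': 'added', 'right_only': 'removed'}

result = (
    pd.merge(A, B, on='key', how='outer', indicator=True)
      # apply with axis=1 calls the inner lambda once per row
      .assign(status=lambda df: df.apply(
          lambda row: labels[row['_merge']], axis=1))
      .drop('_merge', axis=1)
)
print(result)
```

Note that apply with axis=1 runs a Python function per row, so on large frames a vectorized map on the `_merge` column is likely to be faster.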
Scott Boston On

Here's a way to do it all in one statement using the 'walrus' operator, := (Python 3.8+), map with a pre-defined dictionary, and setting the indicator column name in merge to a string:

import pandas as pd

# Create sample data frames A and B
A = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value': [1, 2, 3, 4]
})

B = pd.DataFrame({
    'key': ['C', 'D', 'E', 'F'],
    'value': [3, 4, 5, 6]
})

d = {'both':'no_change',
     'left_only':'added',
     'right_only':'removed'}

merged = (merged_out:=pd.merge(A, B, on='key', how='outer', indicator='status'))\
            .assign(status=merged_out['status'].map(d))

merged 

Output:

  key  value_x  value_y     status
0   A      1.0      NaN      added
1   B      2.0      NaN      added
2   C      3.0      3.0  no_change
3   D      4.0      4.0  no_change
4   E      NaN      5.0    removed
5   F      NaN      6.0    removed
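
As a possible variation (a sketch, not the answerer's code), the walrus assignment can be avoided by passing a lambda to assign, which receives the merged frame directly. The name `result` is illustrative:

```python
import pandas as pd

A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
B = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

d = {'both': 'no_change',
     'left_only': 'added',
     'right_only': 'removed'}

result = (
    pd.merge(A, B, on='key', how='outer', indicator='status')
      # the lambda is called with the merged frame, so no temporary name is needed
      .assign(status=lambda f: f['status'].map(d))
)
print(result)
```

This should also work on Python versions older than 3.8, where := is not available.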