pandas: assign a column values by slice with method chaining

Question

77 Views Asked by gregV At 21 August 2023 at 22:52

In the following toy example, I'm trying to add a status column based on the results of an outer merge. The challenge is to preserve method chaining, as best described in Tom's blog. The commented-out line is my attempt at it, but it does not work.

import pandas as pd

# Create sample data frames A and B
A = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value': [1, 2, 3, 4]
})

B = pd.DataFrame({
    'key': ['C', 'D', 'E', 'F'],
    'value': [3, 4, 5, 6]
})

# Merge data frames A and B on the 'key' column and add an indicator column
merged = pd.merge(A, B, on='key', how='outer', indicator=True)

# add a status column
#{'both':'no change',
 #'left_only': 'added',
 #'right_only': 'removed'}

merged = (merged
          .assign (status = 'no change')
          #.assign(status = lambda x: x.loc[x._merge == 'left_only'], 'added')
          .drop('_merge', axis=1)
          )

There are 3 best solutions below

taller On 22 August 2023 at 00:23

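Use DataFrame.apply to compute the status row by row. Note that this version only labels the left_only rows as 'added'; all other rows get an empty string.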

merged = (merged
          .assign(status = merged.apply(lambda x: 
                  'added' if x._merge == "left_only" else "", axis=1))
          .drop('_merge', axis=1)
          )

  key  value_x  value_y status
0   A      1.0      NaN  added
1   B      2.0      NaN  added
2   C      3.0      3.0
3   D      4.0      4.0
4   E      NaN      5.0
5   F      NaN      6.0
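
As a possible complement to the apply-based answer, here is a minimal sketch that stays closer to the question's original idea: start with a default status and then overwrite it on selected rows, all inside one chain. It assumes Series.mask is an acceptable stand-in for the slice assignment attempted in the question; the name result is only illustrative.

```python
import pandas as pd

A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
B = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

result = (
    pd.merge(A, B, on='key', how='outer', indicator=True)
    # Start with the default label for every row.
    .assign(status='no change')
    # Series.mask replaces values where the condition is True.
    # The lambda receives the intermediate frame, so no outer variable is needed.
    .assign(status=lambda df: df['status'].mask(df['_merge'] == 'left_only', 'added'))
    .assign(status=lambda df: df['status'].mask(df['_merge'] == 'right_only', 'removed'))
    .drop(columns='_merge')
)
```

Because each lambda is evaluated against the frame produced by the previous step, the chain does not depend on a previously assigned merged variable.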

Scott Boston On 22 August 2023 at 00:27

Here's a way to do it all in one statement using the 'walrus' operator (:=, available in Python 3.8+), map with a pre-defined dictionary, and a string passed as the indicator argument of merge so that the indicator column is named status directly:

import pandas as pd

# Create sample data frames A and B
A = pd.DataFrame({
    'key': ['A', 'B', 'C', 'D'],
    'value': [1, 2, 3, 4]
})

B = pd.DataFrame({
    'key': ['C', 'D', 'E', 'F'],
    'value': [3, 4, 5, 6]
})

d = {'both':'no_change',
     'left_only':'added',
     'right_only':'removed'}

merged = (merged_out:=pd.merge(A, B, on='key', how='outer', indicator='status'))\
            .assign(status=merged_out['status'].map(d))

merged

Output:

  key  value_x  value_y     status
0   A      1.0      NaN      added
1   B      2.0      NaN      added
2   C      3.0      3.0  no_change
3   D      4.0      4.0  no_change
4   E      NaN      5.0    removed
5   F      NaN      6.0    removed

sammywemmy (Accepted Answer) On 21 August 2023 at 23:53

Something like this should suffice. In general, because you are assigning values to a slice, you need a conditional construct (map, np.where, np.select, Series.where, etc.):

(A
.merge(B, on='key', how='outer', indicator=True)
.assign(status = lambda f: f._merge.map({"left_only":"added", 
                                          "both":"no change", 
                                          "right_only":"removed"}))
)
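
For comparison, the same chain could probably be written with np.select, one of the other conditional tools mentioned above. This is only a sketch under the assumption that the three labels from the question are wanted; the name result is illustrative, and map remains the more compact choice when the mapping is a plain lookup.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
B = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

result = (
    A.merge(B, on='key', how='outer', indicator=True)
    .assign(status=lambda f: np.select(
        # Conditions are checked in order; the first match wins.
        [f['_merge'] == 'left_only', f['_merge'] == 'right_only'],
        ['added', 'removed'],
        # Rows matching neither condition (i.e. 'both') get the default.
        default='no change',
    ))
    .drop(columns='_merge')
)
```

np.select becomes more useful than map when the conditions involve more than one column, since each condition can be an arbitrary boolean expression.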
