I'm trying to create a new column that tallys up the number of times someone was paid for a job - regardless of if it was all of the money or just some of the money. So for each row, if it says "yes" or "partial" or "paid" in the job columns then I want a count of this in the new column.

My actual data has 15 different job columns that I want to "sum" across.

So before looks like:

Name Job1 Job2
tom Yes No
nick Partial Yes
juli No No

And I'd like afterwards to look like:

Name Job1 Job2 Received_money
tom Yes No 1
nick Partial Yes 2
juli No No 0

Current code

df['Received_money'] = df[['Job1', 'Job2']].apply(lambda row: len(row[row == 'Yes']), axis=1) This is my current code and it partially does what I want. It adds up the number of times it says "Yes" in the columns listed. But:

  1. I can't figure out how to expand this to include "== 'partial'" and "== 'paid'", and how to get it to give 1 point (so to speak) for each time these occur
  2. Is there any other way of entering in all 15 of my column names instead of [['Job1', 'Job2', 'Job3', 'Job4', 'Job5'....'Job15' ]]

(Example data)

import pandas as pd
  
# initialize list of lists
data = [['tom', "Yes", "No"], ['nick', "Partial", "Yes"], ['juli', "No", "No"]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Job1', 'Job2'])

Thank you!

3

There are 3 best solutions below

1
code-sk On BEST ANSWER

I added 2 more Job columns just for testing. This should satisfy your requirement.

data = [['tom', "Yes", "No","Partial","Paid"], ['nick', "Partial", "Yes"], ['juli', "No", "No","Partial","Paid"]]

df = pd.DataFrame(data, columns=['Name', 'Job1', 'Job2','Job3','Job4'])

job_cols =['Job1','Job2','Job3','Job4']

paid_values = ['Yes','Paid','Partial']

df['Received_money'] = df[job_cols].apply(lambda row : len([r for r in row if r in paid_values]),axis=1)

print(df)
0
Suraj Shourie On

Your approach seems okay. Just add the other filters like this:

df['Received_money'] = df[['Job1', 'Job2']].apply(lambda row: len(row[row == 'Yes']) +len(row[row == 'Partial']), axis=1)
0
mozway On

Don't use apply, you can easily vectorize this:

df['Received_money'] = df.filter(like='Job').isin(['Yes', 'Partial']).sum(axis=1)

Or, if the Job columns don't start with a literal "Job":

cols = ['Job1', 'Job2']
df['Received_money'] = df[cols].isin(['Yes', 'Partial']).sum(axis=1)

Output:

   Name     Job1 Job2  Received_money
0   tom      Yes   No               1
1  nick  Partial  Yes               2
2  juli       No   No               0