Counting the number of times 3 different strings appear over numerous columns and putting this count in a new column

57 Views Asked by wick At 03 August 2023 at 17:06

I'm trying to create a new column that tallys up the number of times someone was paid for a job - regardless of if it was all of the money or just some of the money. So for each row, if it says "yes" or "partial" or "paid" in the job columns then I want a count of this in the new column.

My actual data has 15 different job columns that I want to "sum" across.

So before looks like:

Name	Job1	Job2
tom	Yes	No
nick	Partial	Yes
juli	No	No

And I'd like afterwards to look like:

Name	Job1	Job2	Received_money
tom	Yes	No	1
nick	Partial	Yes	2
juli	No	No	0

Current code

df['Received_money'] = df[['Job1', 'Job2']].apply(lambda row: len(row[row == 'Yes']), axis=1) This is my current code and it partially does what I want. It adds up the number of times it says "Yes" in the columns listed. But:

I can't figure out how to expand this to include "== 'partial'" and "== 'paid'", and how to get it to give 1 point (so to speak) for each time these occur
Is there any other way of entering in all 15 of my column names instead of [['Job1', 'Job2', 'Job3', 'Job4', 'Job5'....'Job15' ]]

(Example data)

import pandas as pd
  
# initialize list of lists
data = [['tom', "Yes", "No"], ['nick', "Partial", "Yes"], ['juli', "No", "No"]]
  
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Job1', 'Job2'])

Thank you!

Original Q&A

There are 3 best solutions below

code-sk On 03 August 2023 at 17:23 BEST ANSWER

I added 2 more Job columns just for testing. This should satisfy your requirement.

data = [['tom', "Yes", "No","Partial","Paid"], ['nick', "Partial", "Yes"], ['juli', "No", "No","Partial","Paid"]]

df = pd.DataFrame(data, columns=['Name', 'Job1', 'Job2','Job3','Job4'])

job_cols =['Job1','Job2','Job3','Job4']

paid_values = ['Yes','Paid','Partial']

df['Received_money'] = df[job_cols].apply(lambda row : len([r for r in row if r in paid_values]),axis=1)

print(df)

Suraj Shourie On 03 August 2023 at 17:12

Your approach seems okay. Just add the other filters like this:

df['Received_money'] = df[['Job1', 'Job2']].apply(lambda row: len(row[row == 'Yes']) +len(row[row == 'Partial']), axis=1)

mozway On 03 August 2023 at 18:18

Don't use apply, you can easily vectorize this:

df['Received_money'] = df.filter(like='Job').isin(['Yes', 'Partial']).sum(axis=1)

Or, if the Job columns don't start with a literal "Job":

cols = ['Job1', 'Job2']
df['Received_money'] = df[cols].isin(['Yes', 'Partial']).sum(axis=1)

Output:

   Name     Job1 Job2  Received_money
0   tom      Yes   No               1
1  nick  Partial  Yes               2
2  juli       No   No               0

Counting the number of times 3 different strings appear over numerous columns and putting this count in a new column

There are 3 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATA-CLEANING

Related Questions in FEATURE-ENGINEERING

Trending Questions

Popular # Hahtags

Popular Questions