Can I use np.where to update values on the run based on previous rows?


Is there any way I can implement an on-the-run update like that on a dataframe?

I have an imaginary dataframe "Classroom" as follows (an imaginary setup to illustrate the issue):

Student  Subject  Mark  New Sub.
Mike     English  pass  English
Mike     French   pass  None
Mike     History  pass  None
Mike     Bio      Fail  None
...      ...      ...   None

I want to fill the "New Sub." column such that, for every student, it checks each subject's mark row by row: if the mark is "pass", the subject gets appended to the running list in "New Sub."; if it is "Fail", the subject shouldn't be added. The desired view is:

Student  Subject  Mark  New Sub.
Mike     English  pass  English
Mike     French   pass  English,French
Mike     History  pass  English,French,History
Mike     Bio      Fail  English,French,History
...      ...      ...   None

I tried implementing that using np.where like this:

Classroom["New Sub."] = np.where(Classroom["Mark."] == "pass", Classroom["New Sub."].shift(1) + "," + Classroom["Subject"], Classroom["New Sub."])

The issue is that the "New Sub." column doesn't get updated on the run while np.where executes, so what I get is the following:

Student  Subject  Mark  New Sub.
Mike     English  pass  English
Mike     French   pass  English,French
Mike     History  pass  None,History
Mike     Bio      Fail  None
...      ...      ...   None

It's as if it reads the old values of "New Sub." rather than the values as they stand after each row's modification.
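A minimal demonstration of why this happens (using a hypothetical three-row series): shift(1) is evaluated once against the column as it currently is, so later rows never see the values that the same np.where call writes to earlier rows.

```python
import numpy as np
import pandas as pd

# Hypothetical column: only the first value is filled in, like "New Sub."
s = pd.Series(["English", None, None])

# shift(1) is computed from s in one vectorized pass, before any
# assignment happens: row 2 sees "English", but row 3 sees NaN,
# not whatever row 2 is about to become.
result = np.where([True, True, True], s.shift(1).fillna("?") + "+next", s)
print(result)  # ['?+next' 'English+next' '?+next']
```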

Is there any way I can implement an on-the-run update like that on a dataframe?

There are 2 best solutions below

Namith Nimlaka

The issue with your approach is that np.where isn't designed for operations like this, where each row's result depends on values computed for previous rows in the same call. You're using shift(1) inside np.where, but shift reads the column's existing values, so it won't work as expected.

Instead, you can use a combination of groupby and apply in pandas:

import pandas as pd

# Example DataFrame
data = {
    'Student': ['Mike', 'Mike', 'Mike', 'Mike'],
    'Subject': ['English', 'French', 'History', 'Bio'],
    'Mark': ['pass', 'pass', 'pass', 'fail']}
df = pd.DataFrame(data)

def update_new_sub(group):
    # Join every passed subject for this student into one string.
    pass_subjects = group[group['Mark'] == 'pass']['Subject']
    group['New Sub.'] = ','.join(pass_subjects)
    return group

# group_keys=False keeps the original index so the column aligns on assignment
df['New Sub.'] = df.groupby('Student', group_keys=False).apply(update_new_sub)['New Sub.']
print(df)

e-motta

np.where and df.shift are vectorized operations, so they will not process each row sequentially.

You could in theory do what you want with an iterative approach like df.iterrows, but that's hardly ever efficient when dealing with dataframes.
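For completeness, the iterative version would look something like this (a sketch that reproduces the running-list output from the question, but it loops in Python and will be slow on large frames):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Student": ["Mike", "Mike", "Mike", "Mike"],
        "Subject": ["English", "French", "History", "Bio"],
        "Mark": ["pass", "pass", "pass", "Fail"],
    }
)

passed = {}   # running list of passed subjects, per student
new_sub = []
for _, row in df.iterrows():
    subjects = passed.setdefault(row["Student"], [])
    if row["Mark"] == "pass":
        subjects.append(row["Subject"])
    new_sub.append(",".join(subjects))

df["New Sub."] = new_sub
print(df)
```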

What you can do instead is use a vectorized approach to compute all the values at once, then merge them into the original dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "Student": ["Mike", "Mike", "Mike", "Mike", "Bob"],
        "Subject": ["English", "French", "History", "Bio", "English"],
        "Mark": ["pass", "pass", "pass", "Fail", "pass"],
    }
)

new_subs = (
    df.loc[df["Mark"] == "pass", :]
    .groupby("Student")
    .apply(lambda row: ",".join(row["Subject"]))
    .reset_index(name="New Sub.")
)

df = df.merge(new_subs, on="Student", how="left")

Resulting in:

  Student  Subject  Mark                New Sub.
0    Mike  English  pass  English,French,History
1    Mike   French  pass  English,French,History
2    Mike  History  pass  English,French,History
3    Mike      Bio  Fail  English,French,History
4     Bob  English  pass                 English