How to apply function to every row in a pandas dataframe that requires multiple rows in the calculation?

297 Views Asked by Owen Stout At 26 February 2022 at 01:06

Just an example... I have a 5M row dataframe full of runners and a log of how many miles they ran on any given day. If a runner doesn't run on a day, there is no 0 entry.

Runner	Day	Miles
A	1	3
A	3	4
A	8	4
A	9	2

I'm trying to get a field for how much they ran in the last 7 calendar days (not including the current row), as well as the next 7 calendar days (including current row).

Runner	Day	Miles	Last 7	Next 7
A	1	3	null	7
A	3	4	3	10
A	8	4	7	6
A	9	2	10	2

I think can get the answer for a single row.

Next 7 for row 1 for example...

df.loc[(df['Runner'] == 'A') & (df['day'] >= 1) & (df['day'] < 1+7)]['Miles'].sum()

I'm just unsure how to apply this type of thing to every row in an efficient way. The methods I've seen so far only utilize data from the same row. Any help?

Original Q&A

There are 1 best solutions below

Raymond Kwok On 26 February 2022 at 12:33

you may use rolling sum.

Let's say the maximum day number is 20

max_day = 20

And these are the lists of all runners and all days

import numpy as np
runners = df['Runner'].unique()
days = np.arange(1, max_day)

Then we can get a complete list of runner and day combinations

all_combos = [(r, d) for r in runners for d in days]

In the 1st line of the following 4-line code, we fill the zeros back, then in the 2nd we group the data by runner and in the 3rd we apply rolling sum on Miles requiring at least only 1 record to do a sum. You may adjust this parameter. The last line takes care of the format.

rollingsum = df.set_index(['Runner', 'Day']).reindex(all_combos).fillna(0)\
    .groupby('Runner')\
    ['Miles'].rolling(7, min_periods=1).sum()\
    .droplevel(1).rename('rs').reset_index()

We only need to do 7-day sum once, and shift the records for 'last-7-day' and 'next-7-day' by assigning two new, shifted Day columns to replace the original one, and merge the results.

df.merge(
    rollingsum.assign(Day = rollingsum['Day']+1).rename(columns={'rs': 'last_7_days'}),
    on = ['Runner', 'Day'],
    how = 'left'
).merge(
    rollingsum.assign(Day = rollingsum['Day']-6).rename(columns={'rs': 'next_7_days'}),
    on = ['Runner', 'Day'],
    how = 'left'
)

  Runner  Day  Miles  last_7_days  next_7_days
0      A    1      3          NaN          7.0
1      A    3      4          3.0         10.0
2      A    8      4          7.0          6.0
3      A    9      2          8.0          2.0

How to apply function to every row in a pandas dataframe that requires multiple rows in the calculation?

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in FUNCTION

Related Questions in MULTIROW

Trending Questions

Popular # Hahtags

Popular Questions