Incremental moving average

59 Views Asked by At

I have the following sample dataframe:

ID date data avg
1 18/3/2024 45.72991477 9.845398369
1 19/3/2024 16.66879054 12.22665416
1 20/3/2024 0.956178897 12.29306183
1 21/3/2024 0.008082319 12.27617302
1 22/3/2024 18.4076123 14.88212601
1 23/3/2024 8.3212202 12.87506446
1 24/3/2024 2.950587398 13.29176949
1 25/3/2024 79.01593876
2 18/3/2024 5.234445299 1.555186844
2 19/3/2024 13.36603398 3.441331288
2 20/3/2024 2.76175807 3.835868155
2 21/3/2024 0.007854825 3.710561654
2 22/3/2024 32.75987619 8.361661724
2 23/3/2024 0.363040191 7.823976139
2 24/3/2024 3.461841086 8.279264234
2 25/3/2024 6.28783374

The avg column is essentially a 7 days moving average. I currently use the following code to populate avg

index["avg"] = index.groupby("ID")["data"].transform(lambda x: x.rolling(7).mean())

I understand that every time code is run, it will recompute the whole avg column. Meanwhile my data is incrementally added on a daily basis. Is there a way to only compute the latest date avg using data from last 7 days including the latest added data for each ID? Note that the rolling average is for each ID. Will this approach be more efficient than my original approach of recomputing the whole avg column every time it is run?

1

There are 1 best solutions below

0
Pawan Tolani On

I think calculating rolling mean for last 7 days will definitely be faster than recomputing for the entire dataframe. Here is the code I tried with some dummy data

lst_seven_dys_df=full_df.tail(7).copy()
lst_seven_dys_df['roll']=lst_seven_dys_df['Score'].rolling(window=7).mean()
new_df=pd.concat([full_df.iloc[0:-1,:],lst_seven_dys_df.tail(1)]).reset_index(drop=True)
new_df