Cumsum in Pandas Dataframe but reset to zero on a specific date every new year

Question

Cumsum in Pandas Dataframe but reset to zero on a specific date every new year

68 Views Asked by Nayr borcherds At 12 December 2023 at 10:31

I have a dataframe that is in descending order of time and indexed by Race_ID and Student_ID.

Race_ID   Date           Student_ID      Mark     
1         1/10/2023      1               5        
1         1/10/2023      2               8        
1         1/10/2023      3               7        
8         1/10/2023      4               4        
8         1/1/2023       1               9        
8         1/1/2023       2               3        
8         1/1/2023       3               5        
8         1/1/2023       4               10       
2         11/9/2022      1               2        
2         11/9/2022      2               4        
2         11/9/2022      3               9        
3         17/4/2022      5               3        
3         17/4/2022      2               4        
3         17/4/2022      3               3        
3         17/4/2022      4               7        
4         1/3/2022       1               4        
4         1/3/2022       2               9        
5         1/1/2021       1               6        
5         1/1/2021       2               1        
5         1/1/2021       3               8

I want to create a new column Seasonal_Mark which is the cumsum of Mark by each student, with the caveat that the sum is set to zero on every 1 February. So the desired outcome looks like:

Race_ID   Date           Student_ID      Mark     Seasonal_Mark 
1         1/10/2023      1               5        5
1         1/10/2023      2               8        8
1         1/10/2023      3               7        7
8         1/10/2023      4               4        4
8         1/1/2023       1               9        15 (4+2+9)
8         1/1/2023       2               3        20 (4+9+4+3)
8         1/1/2023       3               5        17 (9+3+5)
8         1/1/2023       4               10       17 (7+10)                  
2         11/9/2022      1               2        6  (4+2)
2         11/9/2022      2               4        17 (4+9+4)
2         11/9/2022      3               9        12 (9+3)
3         17/4/2022      5               3        3
3         17/4/2022      2               4        13 (4+9)
3         17/4/2022      3               3        3
3         17/4/2022      4               7        7
4         1/3/2022       1               4        4
4         1/3/2022       2               9        9
5         1/1/2021       1               6        6
5         1/1/2021       2               1        1
5         1/1/2021       3               8        8

Thank you so much in advance.

Original Q&A

There are 2 best solutions below

mozway On 12 December 2023 at 10:54

You can use a custom yearly Period starting on the 1st of February as a secondary grouper, for this use to_period with A-JAN as freq:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

df['Seasonal_Mark'] = (df.sort_values(by='Date')
                         .groupby(['Student_ID', df['Date'].dt.to_period('A-JAN')])
                       ['Mark'].cumsum()
                      )

Output:

    Race_ID       Date  Student_ID  Mark  Seasonal_Mark
0         1 2023-10-01           1     5              5
1         1 2023-10-01           2     8              8
2         1 2023-10-01           3     7              7
3         8 2023-10-01           4     4              4
4         8 2023-01-01           1     9             15
5         8 2023-01-01           2     3             20
6         8 2023-01-01           3     5             17
7         8 2023-01-01           4    10             17
8         2 2022-09-11           1     2              6
9         2 2022-09-11           2     4             17
10        2 2022-09-11           3     9             12
11        3 2022-04-17           5     3              3
12        3 2022-04-17           2     4             13
13        3 2022-04-17           3     3              3
14        3 2022-04-17           4     7              7
15        4 2022-03-01           1     4              4
16        4 2022-03-01           2     9              9
17        5 2021-01-01           1     6              6
18        5 2021-01-01           2     1              1
19        5 2021-01-01           3     8              8

**PaulS** · Accepted Answer · 2023-12-12T13:39:10.033000

Another possible solution, which creates an adjusted year column that corresponds to the year of the date, if the date is February or later (in the year), and to the year minus 1, if the date is earlier than February (in the year). This new column is afterwards used inside groupby.

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

m = df['Date'].dt.month
y = df['Date'].dt.year

df.assign(
    Seasonal_Mark = (df.assign(
        adjusted_year = np.where(m >= 2, y, y - 1))
    .sort_values(by='Date')
    .groupby(['Student_ID', 'adjusted_year'])['Mark'].cumsum()))

Output:

    Race_ID       Date  Student_ID  Mark  Seasonal_Mark
0         1 2023-10-01           1     5              5
1         1 2023-10-01           2     8              8
2         1 2023-10-01           3     7              7
3         8 2023-10-01           4     4              4
4         8 2023-01-01           1     9             15
5         8 2023-01-01           2     3             20
6         8 2023-01-01           3     5             17
7         8 2023-01-01           4    10             17
8         2 2022-09-11           1     2              6
9         2 2022-09-11           2     4             17
10        2 2022-09-11           3     9             12
11        3 2022-04-17           5     3              3
12        3 2022-04-17           2     4             13
13        3 2022-04-17           3     3              3
14        3 2022-04-17           4     7              7
15        4 2022-03-01           1     4              4
16        4 2022-03-01           2     9              9
17        5 2021-01-01           1     6              6
18        5 2021-01-01           2     1              1
19        5 2021-01-01           3     8              8

Cumsum in Pandas Dataframe but reset to zero on a specific date every new year

There are 2 best solutions below

Related Questions in PYTHON-3.X

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in GROUP-BY

Related Questions in CUMSUM

Trending Questions

Popular # Hahtags

Popular Questions