I have a dataframe that is in descending order of time and indexed by Race_ID and Student_ID.
Race_ID Date Student_ID Mark
1 1/10/2023 1 5
1 1/10/2023 2 8
1 1/10/2023 3 7
8 1/10/2023 4 4
8 1/1/2023 1 9
8 1/1/2023 2 3
8 1/1/2023 3 5
8 1/1/2023 4 10
2 11/9/2022 1 2
2 11/9/2022 2 4
2 11/9/2022 3 9
3 17/4/2022 5 3
3 17/4/2022 2 4
3 17/4/2022 3 3
3 17/4/2022 4 7
4 1/3/2022 1 4
4 1/3/2022 2 9
5 1/1/2021 1 6
5 1/1/2021 2 1
5 1/1/2021 3 8
I want to create a new column Seasonal_Mark which is the cumsum of Mark by each student, with the caveat that the sum is set to zero on every 1 February. So the desired outcome looks like:
Race_ID Date Student_ID Mark Seasonal_Mark
1 1/10/2023 1 5 5
1 1/10/2023 2 8 8
1 1/10/2023 3 7 7
8 1/10/2023 4 4 4
8 1/1/2023 1 9 15 (4+2+9)
8 1/1/2023 2 3 20 (4+9+4+3)
8 1/1/2023 3 5 17 (9+3+5)
8 1/1/2023 4 10 17 (7+10)
2 11/9/2022 1 2 6 (4+2)
2 11/9/2022 2 4 17 (4+9+4)
2 11/9/2022 3 9 12 (9+3)
3 17/4/2022 5 3 3
3 17/4/2022 2 4 13 (4+9)
3 17/4/2022 3 3 3
3 17/4/2022 4 7 7
4 1/3/2022 1 4 4
4 1/3/2022 2 9 9
5 1/1/2021 1 6 6
5 1/1/2021 2 1 1
5 1/1/2021 3 8 8
Thank you so much in advance.
Another possible solution, which creates an adjusted year column that corresponds to the year of the date, if the date is February or later (in the year), and to the year minus 1, if the date is earlier than February (in the year). This new column is afterwards used inside
groupby.Output: