How can I create a day-number for a 4-year cycle in a pandas dataframe?

43 Views Asked by At

I have timeseries data with datetimeindex, price and the cycle phase. My dataframe looks like this:

Dateindex Price CyclePhase Day
1928-01-03 00:00:00 71.04 0 1
1928-01-04 00:00:00 70.88 0 2
1928-01-05 00:00:00 70.2 0 3
1928-01-06 00:00:00 70.64 0 4
1929-05-09 00:00:00 104.08 1 400
1929-05-10 00:00:00 105.36 1 401
1929-05-11 00:00:00 104.96 1 402
1929-05-13 00:00:00 102.56 1 403
1930-11-08 00:00:00 63.56 2 844
1930-11-10 00:00:00 62.16 2 845
1930-11-11 00:00:00 63.16 2 846
1931-12-29 00:00:00 31.84 3 1185
1931-12-30 00:00:00 32.4 3 1186
1931-12-31 00:00:00 32.48 3 1187
1932-01-02 00:00:00 31.28 0 1
1932-01-04 00:00:00 30.24 0 2
1932-01-05 00:00:00 30.2 0 3

I would like to create the column "Day" where it is counting up for each row in the dataframe until the cycle restarts. (when CyclePhase changes from 3 to 0 again).

How can I do that best in python?

2

There are 2 best solutions below

2
Shubham Sharma On BEST ANSWER

Create grouper to flag the rows where cycle changes the calculate cumcount to assign row numbers per group

s = df['CyclePhase'].diff().lt(0).cumsum()
df['Day'] = s.groupby(s).cumcount() + 1

              Dateindex   Price  CyclePhase  Day
0   1928-01-03 00:00:00   71.04           0    1
1   1928-01-04 00:00:00   70.88           0    2
2   1928-01-05 00:00:00   70.20           0    3
3   1928-01-06 00:00:00   70.64           0    4
4   1929-05-09 00:00:00  104.08           1    5
5   1929-05-10 00:00:00  105.36           1    6
6   1929-05-11 00:00:00  104.96           1    7
7   1929-05-13 00:00:00  102.56           1    8
8   1930-11-08 00:00:00   63.56           2    9
9   1930-11-10 00:00:00   62.16           2   10
10  1930-11-11 00:00:00   63.16           2   11
11  1931-12-29 00:00:00   31.84           3   12
12  1931-12-30 00:00:00   32.40           3   13
13  1931-12-31 00:00:00   32.48           3   14
14  1932-01-02 00:00:00   31.28           0    1
15  1932-01-04 00:00:00   30.24           0    2
16  1932-01-05 00:00:00   30.20           0    3
2
Quang Hoang On

You can groupby on the year like this:

df['Day'] = df.groupby(df['Dateindex'].dt.year // 4).cumcount().add(1)

If your data starts from some year that is not divisible by 4, then you can do:

years = df['Dateindex'].dt.year
df['Day'] = df.groupby(years.sub(years.min()) // 4).cumcount().add(1)

Note: if your Dateindex is indeed the index, then

years = df.index.year