How can I create a day-number for a 4-year cycle in a pandas dataframe?

43 Views Asked by Markus W At 11 September 2023 at 15:04

I have timeseries data with datetimeindex, price and the cycle phase. My dataframe looks like this:

Dateindex	Price	CyclePhase	Day
1928-01-03 00:00:00	71.04	0	1
1928-01-04 00:00:00	70.88	0	2
1928-01-05 00:00:00	70.2	0	3
1928-01-06 00:00:00	70.64	0	4
…	…	…	…
1929-05-09 00:00:00	104.08	1	400
1929-05-10 00:00:00	105.36	1	401
1929-05-11 00:00:00	104.96	1	402
1929-05-13 00:00:00	102.56	1	403
…	…	…	…
1930-11-08 00:00:00	63.56	2	844
1930-11-10 00:00:00	62.16	2	845
1930-11-11 00:00:00	63.16	2	846
…	…	…	…
1931-12-29 00:00:00	31.84	3	1185
1931-12-30 00:00:00	32.4	3	1186
1931-12-31 00:00:00	32.48	3	1187
1932-01-02 00:00:00	31.28	0	1
1932-01-04 00:00:00	30.24	0	2
1932-01-05 00:00:00	30.2	0	3
…	…	…	…

I would like to create the column "Day" where it is counting up for each row in the dataframe until the cycle restarts. (when CyclePhase changes from 3 to 0 again).

How can I do that best in python?

Original Q&A

There are 2 best solutions below

Shubham Sharma On 11 September 2023 at 15:14 BEST ANSWER

Create grouper to flag the rows where cycle changes the calculate cumcount to assign row numbers per group

s = df['CyclePhase'].diff().lt(0).cumsum()
df['Day'] = s.groupby(s).cumcount() + 1

              Dateindex   Price  CyclePhase  Day
0   1928-01-03 00:00:00   71.04           0    1
1   1928-01-04 00:00:00   70.88           0    2
2   1928-01-05 00:00:00   70.20           0    3
3   1928-01-06 00:00:00   70.64           0    4
4   1929-05-09 00:00:00  104.08           1    5
5   1929-05-10 00:00:00  105.36           1    6
6   1929-05-11 00:00:00  104.96           1    7
7   1929-05-13 00:00:00  102.56           1    8
8   1930-11-08 00:00:00   63.56           2    9
9   1930-11-10 00:00:00   62.16           2   10
10  1930-11-11 00:00:00   63.16           2   11
11  1931-12-29 00:00:00   31.84           3   12
12  1931-12-30 00:00:00   32.40           3   13
13  1931-12-31 00:00:00   32.48           3   14
14  1932-01-02 00:00:00   31.28           0    1
15  1932-01-04 00:00:00   30.24           0    2
16  1932-01-05 00:00:00   30.20           0    3

Quang Hoang On 11 September 2023 at 15:23

You can groupby on the year like this:

df['Day'] = df.groupby(df['Dateindex'].dt.year // 4).cumcount().add(1)

If your data starts from some year that is not divisible by 4, then you can do:

years = df['Dateindex'].dt.year
df['Day'] = df.groupby(years.sub(years.min()) // 4).cumcount().add(1)

Note: if your Dateindex is indeed the index, then

years = df.index.year

How can I create a day-number for a 4-year cycle in a pandas dataframe?

There are 2 best solutions below

Related Questions in PYTHON-3.X

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in TIME-SERIES

Related Questions in CYCLE

Trending Questions

Popular # Hahtags

Popular Questions