Taking into account weekends/holidays when using timedeltas in pandas


So here's a problem where it took me a while to figure out why it's happening...

I have a dataframe of values with a datetime index of workdays.

I'm adding a new column, 'ColB', which is always the same value as ColA, except where ColA is 0. In that case I want to use the previous day's value of ColA.

For example:

Day A B
2019-11-11 7 7
2019-11-12 3 3
2019-11-13 0 3
2019-11-14 5 5
2019-11-15 0 5

Note how colB on 2019-11-13 becomes the value of colA from 2019-11-12. Same on the 15th: colB takes colA from the 14th.

I've used this list comprehension:

df.loc[:,'colB'] = [df.loc[d-pd.Timedelta(1,'d'),'ColA'] if df.loc[d,'ColA']==0 else df.loc[d,'ColA'] for d in df.index]

At first I thought it simply wasn't working (KeyError: Timestamp('2019-11-03 00:00:00')), but I discovered the problem occurs because I have a zero on a Monday, and therefore there is no row for d-pd.Timedelta(1,'d'), which would be a Sunday.

I suppose the same problem will happen when I have a zero the day after a holiday too.
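To make the failure concrete, here is a minimal sketch with made-up data (the dates and values are assumptions chosen so that a zero falls on a Monday). It shows that the calendar-day lookup targets a label that is not in a workday index, while stepping back one row by position lands on the previous Friday.

```python
import pandas as pd

# Hypothetical sample: business days only (Thu, Fri, Mon, Tue), so the weekend has no rows.
idx = pd.bdate_range("2019-10-31", "2019-11-05")
df = pd.DataFrame({"ColA": [4, 6, 0, 2]}, index=idx)

monday = pd.Timestamp("2019-11-04")
sunday = monday - pd.Timedelta(days=1)

# The label the list comprehension would look up does not exist in the index,
# which is what triggers the KeyError.
sunday_present = sunday in df.index

# Stepping back by position ignores calendar gaps such as weekends or holidays.
pos = df.index.get_loc(monday)
previous_row_label = df.index[pos - 1]
previous_row_value = df["ColA"].iloc[pos - 1]
```

The same reasoning applies to any missing date, so a holiday in the index would behave like the weekend here.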

To solve it, when ColA is zero, I actually need to use the previous value in ColA for ColB, and NOT the previous day's value, as I've done in the list comprehension.

I've solved it by reindexing, using sequential numbers rather than dates, but I think it's nicer to use the dates.

I tried shift:

df.loc[:,'colB'] = [df.loc[d,'ColA'].shift(-1) if df.loc[d,'ColA']==0 else df.loc[d,'ColA'] for d in df.index]

but get an error: AttributeError: 'numpy.int64' object has no attribute 'shift'
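The error arises because df.loc[d,'ColA'] returns a single scalar, and shift is a method of a whole Series. A small sketch on assumed sample data (taken from the example table) illustrates the difference. Note also that shift(1) looks at the row above, while shift(-1) looks at the row below.

```python
import pandas as pd

# Sample data mirroring the example table; the index holds workdays only.
idx = pd.bdate_range("2019-11-11", periods=5)
df = pd.DataFrame({"ColA": [7, 3, 0, 5, 0]}, index=idx)

# Selecting one row and one column yields a NumPy scalar, which has no .shift.
scalar = df.loc[idx[2], "ColA"]
has_shift = hasattr(scalar, "shift")

# Applied to the whole column, shift moves values by rows, not by calendar days.
prev_rows = df["ColA"].shift(1)   # value from the row above; NaN in the first row
next_rows = df["ColA"].shift(-1)  # value from the row below; NaN in the last row
```

So for "previous value" the column-level call with a positive period is the one that matches the intent of the question.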

Can anyone see a simple way to do this without reindexing?

Thanks!

Answer by rhug123 (best answer)

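Try this. It relies on replace() forward-filling the matched values when no replacement value is given, a behaviour that recent pandas releases deprecate, so it may warn or fail depending on your version: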

df.assign(B = df['A'].replace(0))

Output:

          Day  A  B
0  2019-11-11  7  7
1  2019-11-12  3  3
2  2019-11-13  0  3
3  2019-11-14  5  5
4  2019-11-15  0  5
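
Since the one-liner above leans on replace() filling from the previous row implicitly, a more explicit way to express the same idea may be easier to maintain. The following is a sketch, not the answerer's code: it masks the zeros and forward-fills them, using the column names 'A' and 'B' from the output above. The fallback for a leading zero is an assumption on my part.

```python
import pandas as pd

df = pd.DataFrame({
    "Day": ["2019-11-11", "2019-11-12", "2019-11-13", "2019-11-14", "2019-11-15"],
    "A": [7, 3, 0, 5, 0],
})

# Turn zeros into NaN, then carry the last non-zero value forward.
filled = df["A"].mask(df["A"].eq(0)).ffill()

# Assumption: a zero in the very first row has nothing to fill from,
# so it keeps its original value before casting back to the original dtype.
out = df.assign(B=filled.fillna(df["A"]).astype(df["A"].dtype))
```

One design point to be aware of: with several zeros in a row, forward-filling carries the last non-zero value through all of them, which differs from taking the immediately preceding row's value.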

Answer by Christian

You can use the shift() method, which shifts the values of a column by a given number of periods while leaving the index unchanged. Because it moves values by row rather than by date, it can be used to fill a value in a column based on the value of the previous row, regardless of weekends or holidays.

You can use it in combination with the where() function, which replaces values where the condition is False.

df['ColB'] = df['ColA'].where(df['ColA'] != 0, df['ColA'].shift(1))

This creates a new column 'ColB' and assigns it the value of 'ColA' if 'ColA' is not zero. If 'ColA' is zero, then it takes the value from the previous row of 'ColA'.
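
As a quick check that this handles the weekend case from the question, here is a sketch on assumed data with a zero on a Monday. The dates and values are made up for illustration.

```python
import pandas as pd

# Hypothetical workday index: Thu, Fri, Mon, Tue (no weekend rows).
idx = pd.bdate_range("2019-10-31", "2019-11-05")
df = pd.DataFrame({"ColA": [4, 6, 0, 2]}, index=idx)

# Keep ColA where it is non-zero; otherwise take the value from the row above.
df["ColB"] = df["ColA"].where(df["ColA"] != 0, df["ColA"].shift(1))

# Monday's zero is replaced by Friday's value, with no date arithmetic involved.
monday_value = df.loc["2019-11-04", "ColB"]
```

Because shift(1) introduces a NaN in the first row, the resulting column may come back as a float dtype even though 'ColA' holds integers.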

To also handle the first row, use the fillna() function. This function replaces NA/NaN values with a specified value.

The first row has no previous row, so if 'ColA' is 0 there, shift(1) yields NaN. In the example below, that NaN is filled with the first value of 'ColA' (which is the 0 itself). If you want a different fallback, replace df['ColA'].iloc[0] below with the desired value.

df['ColB'] = df['ColA'].where(df['ColA'] != 0, df['ColA'].shift(1)).fillna(df['ColA'].iloc[0])
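
To see what happens when the zero sits in the first row, here is a small sketch on assumed data. The variable name `default` and the cast back to the original dtype are illustrative choices, not part of the answer.

```python
import pandas as pd

# Hypothetical data with a zero in the very first row.
idx = pd.bdate_range("2019-11-11", periods=3)
df = pd.DataFrame({"ColA": [0, 3, 0]}, index=idx)

# The first row has nothing above it, so the shifted replacement is NaN there.
shifted = df["ColA"].where(df["ColA"] != 0, df["ColA"].shift(1))

# Fallback for that NaN: here the first value of ColA, selected by position.
default = df["ColA"].iloc[0]
df["ColB"] = shifted.fillna(default).astype(df["ColA"].dtype)
```

Using .iloc[0] selects by position, which avoids ambiguity when the index holds dates rather than integers.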