I have a data set of hourly rainfall over one day. I'm creating a column 'E1' (called 'E2' in the code and output below) that should be zero while column 'value' is zero, take the next event number while 'value' is greater than zero, and return to zero when 'value' becomes zero again. When the next event starts, the numbering must continue from the previous event number.
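# Flag the hours where rain starts: value > 0 and the previous hour was exactly 0
condition = (df['value'] > 0) & (df['value'].shift(periods=1) == 0)
# The running count of starts numbers the events, but it never drops back to 0
df['E2'] = condition.cumsum()
print(df)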
    hour  value  E2
0      0    0.0   0
1      1    0.2   1
2      2    0.2   1
3      3    0.2   1
4      4    0.0   1
5      5    0.2   2
6      6    0.2   2
7      7    0.0   2
8      8    NaN   2
9      9    0.2   2
10    10    0.0   2
11    11    0.0   2
12    12    0.2   3
13    13    0.2   3
14    14    0.0   3
15    15    NaN   3
16    16    0.2   3
17    17    0.0   3
18    18    0.2   4
19    19    0.0   4
20    20    0.2   5
21    21    0.2   5
22    22    NaN   5
23    23    0.0   5
E1 represents the event number. An event can last one or several hours, and it should only be counted when the cell just before the start of the event is zero and the cell just after its last hour is also zero.
I'm stuck trying to number the events. The result should be:
    hour  value  E2
0      0    0.0   0
1      1    0.2   1
2      2    0.2   1
3      3    0.2   1
4      4    0.0   0
5      5    0.2   2
6      6    0.2   2
7      7    0.0   0
8      8    NaN   0
9      9    0.2   0
10    10    0.0   0
11    11    0.0   0
12    12    0.2   3
13    13    0.2   3
14    14    0.0   0
15    15    NaN   0
16    16    0.2   0
17    17    0.0   0
18    18    0.2   4
19    19    0.0   0
20    20    0.2   0
21    21    0.2   0
22    22    NaN   0
23    23    0.0   0
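I find these odd criteria, but here's how to compute your "event" numbers. Because each run of rain has to be checked both backward (the cell before it) and forward (the cell after it), I don't see a straightforward vectorized way to do this, so a plain loop is the simplest option.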
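The code that originally went with this answer is not shown here, so what follows is only a minimal sketch of the loop-based idea, not necessarily the original. The helper name `number_events` is hypothetical, the values are transcribed from the table in the question, and it assumes that a run touching the first or last row (with no neighbouring cell on one side) does not count as an event.

```python
import numpy as np
import pandas as pd

# Values transcribed from the table in the question.
values = [0.0, 0.2, 0.2, 0.2, 0.0, 0.2, 0.2, 0.0, np.nan, 0.2, 0.0, 0.0,
          0.2, 0.2, 0.0, np.nan, 0.2, 0.0, 0.2, 0.0, 0.2, 0.2, np.nan, 0.0]
df = pd.DataFrame({"hour": range(24), "value": values})


def number_events(vals):
    """Number each run of positive values that has an exact 0 on both sides.

    Hypothetical helper: every other cell (zeros, NaN, rejected runs) gets 0.
    """
    n = len(vals)
    labels = [0] * n
    event = 0
    i = 0
    while i < n:
        # Zero or NaN is not part of a run (NaN > 0 evaluates to False).
        if not vals[i] > 0:
            i += 1
            continue
        # Walk to the end of the current run of positive values.
        start = i
        while i < n and vals[i] > 0:
            i += 1
        # The run is vals[start:i]; check the cell before and the cell after.
        # Assumption: a run at the very start or end of the data is rejected.
        before_ok = start > 0 and vals[start - 1] == 0
        after_ok = i < n and vals[i] == 0
        if before_ok and after_ok:
            event += 1
            for j in range(start, i):
                labels[j] = event
    return labels


df["E2"] = number_events(df["value"].tolist())
print(df)
```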
Output: