The group_id should increase by one for each row where the binary value is 1, while all consecutive 0s share a single group_id; the next run of consecutive 0s should get a new group_id.

I have been using

df['group_id1'] = df['diff'].cumsum()

and getting the group_id1 column as a result. The result I am looking for is group_id2.

    diff  group_id1  group_id2
0      1          1          1
1      0          1          2
2      0          1          2
3      0          1          2
4      0          1          2
5      0          1          2
6      0          1          2
7      0          1          2
8      0          1          2
9      1          2          3
10     1          3          4
11     1          4          5
12     1          5          6
13     1          6          7
14     1          7          8
15     1          8          9
16     1          9         10
17     1         10         11
18     0         10         12
19     0         10         12
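
For anyone who wants to reproduce this, the sketch below rebuilds the frame from the table above. The construction is an assumption based on the values shown, not the original data source. It shows why a plain cumsum falls short: the counter only advances on a 1, so each run of 0s inherits the id of the preceding 1.

```python
import pandas as pd

# Rebuild the 'diff' column from the table above (assumed to hold only 0/1).
df = pd.DataFrame({'diff': [1] + [0] * 8 + [1] * 9 + [0] * 2})

# cumsum only advances on a 1, so a run of 0s keeps the id of the
# preceding 1 instead of getting a group of its own.
df['group_id1'] = df['diff'].cumsum()

print(df['group_id1'].tolist())
```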

mozway On BEST ANSWER

Assuming you only have 0/1, you can use diff to identify the 0s preceded by 1s:

df['group_id2'] = (df['diff'].eq(1)|df['diff'].diff().eq(-1)).cumsum()
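
To see why this works, it can help to split the mask into its two parts. The following is a minimal sketch on a small made-up column; the names is_one and starts_zero_run are only illustrative and are not part of the answer.

```python
import pandas as pd

# Small hypothetical sample containing only 0/1.
df = pd.DataFrame({'diff': [1, 0, 0, 1, 1, 0]})

# Every 1 starts a group of its own.
is_one = df['diff'].eq(1)

# A step from 1 down to 0 gives a difference of -1, which marks the
# first row of a run of 0s.
starts_zero_run = df['diff'].diff().eq(-1)

# Each True in the combined mask opens a new group; cumsum numbers them.
df['group_id2'] = (is_one | starts_zero_run).cumsum()

print(df)
```

Note that if the column starts with 0, the leading run of 0s has no preceding 1, so it ends up with group id 0 under this approach.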

If the column can hold values other than 0/1, use shift instead, so that only a 0 directly preceded by a 1 starts a new group:

df['group_id2'] = (df['diff'].eq(1)
                  |(df['diff'].eq(0)&df['diff'].shift().eq(1))
                  ).cumsum()
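
The difference between the two approaches only shows up when other values are present. Below is a rough sketch on made-up data (the sample values and the column names by_diff and by_shift are assumptions for illustration): a step from 3 to 2 also has a difference of -1, so the diff-based mask opens a group there, while the shift-based mask does not.

```python
import pandas as pd

# Hypothetical column that contains values other than 0/1.
df = pd.DataFrame({'diff': [1, 0, 3, 2, 0, 1]})

# diff-based mask: any drop of exactly 1 is flagged, including 3 -> 2.
by_diff = (df['diff'].eq(1) | df['diff'].diff().eq(-1)).cumsum()

# shift-based mask: only a 0 directly preceded by a 1 is flagged.
by_shift = (df['diff'].eq(1)
            | (df['diff'].eq(0) & df['diff'].shift().eq(1))
            ).cumsum()

print(pd.DataFrame({'diff': df['diff'],
                    'by_diff': by_diff,
                    'by_shift': by_shift}))
```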

Another variant:

s = df['diff'].eq(1)
df['group_id2'] = (s|s.shift(fill_value=False)).cumsum()

Output:

    diff  group_id2
0      1          1
1      0          2
2      0          2
3      0          2
4      0          2
5      0          2
6      0          2
7      0          2
8      0          2
9      1          3
10     1          4
11     1          5
12     1          6
13     1          7
14     1          8
15     1          9
16     1         10
17     1         11
18     0         12
19     0         12