Conditional cumcount with reset in pandas

69 Views Asked by At

I am trying to create a cum count column that counts the instances of a boolean variable, but resets to zero when it encounters a zero in the boolean variable. I am looking for a conditional cumcount.

I have posted my code (it works) of an initial stab that I took at this issue. However, I am looking to see if anyone has something more efficient as the loop used in my solution my be slow in large dfs

def counter(series_converted_to_a_list):
    counter = 0
    counter_list = []
    # loop through the list and count
    for i in range(len(series_converted_to_a_list)):
        if series_converted_to_a_list[i] > 0:
            counter += 1
        else:
            counter = 0
        counter_list.append(counter)
    return counter_list

# Sample dataframe
df = pd.DataFrame({'bool': [1,0,1,1,1,1,1,0,1,1,1,0,0,1,1,1]})

# convert the boolean column to a list
bool_series_converted_to_a_list = list(df['bool'])

# use the function
counter_list = counter(bool_series_converted_to_a_list)

# convert the list into a column in the sample dataframe
df['counter_list'] = counter_list
df
2

There are 2 best solutions below

0
mozway On BEST ANSWER

You can use groupby.cumcount after masking the 0s and setting a custom grouper on all 0-starting groups, and finally restoring the 0s with reindex:

df = pd.DataFrame({'bool': [1,0,1,1,1,1,1,0,1,1,1,0,0,1,1,1]})

m = df['bool'].eq(1)
df['cumcount'] = (m[m].groupby((~m).cumsum()).cumcount().add(1)
                  .reindex(df.index, fill_value=0)
                 )

Or, maybe simpler, using a cumsum which will ensure that 1 always starts on the first 1 per group:

m = df['bool'].eq(1)
df['cumcount'] = m.groupby((~m).cumsum()).cumsum()

Output:

    bool  cumcount
0      1         1
1      0         0
2      1         1
3      1         2
4      1         3
5      1         4
6      1         5
7      0         0
8      1         1
9      1         2
10     1         3
11     0         0
12     0         0
13     1         1
14     1         2
15     1         3
0
Mr. Irrelevant On

I've created a df similar to your input and build the following:

import pandas as pd
Data = {'booleanvar': ['1','0','1','0','1','1','1']}
df = pd.DataFrame(Data)
df

for idx, entry in df.iterrows():
 if idx == 0:
  if entry['booleanvar'] == '0':
    df.loc[idx , 'result'] = '0'
  if entry['booleanvar'] == '1':
    df.loc[idx , 'result'] = '1'

 if idx > 0:
  if entry['booleanvar'] == '0':
   df.loc[idx , 'result'] = '0'
  if entry['booleanvar'] == '1':
   df.loc[idx , 'result'] = int(df.loc[idx-1 , 'result']) + 1


df

Output:

  booleanvar    result
0    1             1
1    0             0
2    1             1
3    0             0
4    1             1
5    1             2
6    1             3