I am trying to create a cum count column that counts the instances of a boolean variable, but resets to zero when it encounters a zero in the boolean variable. I am looking for a conditional cumcount.
I have posted my code (it works) of an initial stab that I took at this issue. However, I am looking to see if anyone has something more efficient as the loop used in my solution my be slow in large dfs
def counter(series_converted_to_a_list):
counter = 0
counter_list = []
# loop through the list and count
for i in range(len(series_converted_to_a_list)):
if series_converted_to_a_list[i] > 0:
counter += 1
else:
counter = 0
counter_list.append(counter)
return counter_list
# Sample dataframe
df = pd.DataFrame({'bool': [1,0,1,1,1,1,1,0,1,1,1,0,0,1,1,1]})
# convert the boolean column to a list
bool_series_converted_to_a_list = list(df['bool'])
# use the function
counter_list = counter(bool_series_converted_to_a_list)
# convert the list into a column in the sample dataframe
df['counter_list'] = counter_list
df
You can use
groupby.cumcountafter masking the 0s and setting a custom grouper on all 0-starting groups, and finally restoring the 0s withreindex:Or, maybe simpler, using a
cumsumwhich will ensure that 1 always starts on the first 1 per group:Output: