Count Rows based on 2 fields grouped

23 Views Asked by At

I am trying to recreate the MultiRow tool used in Alteryx. I would like to group the data by two columns (DATE, CALL_ID) and then do a running count per row for each group. I'm using the groupby but I don't think that's right because I don't want the output to be grouped and I don't want the data to be aggregated, I want each row to still exist.

Example Data:

DATE        CALL_ID   
2023-11-21  29933702  
2023-11-21  29933703    
2023-11-21  29933703    
2023-11-21  29933704    
2023-11-21  29933704
2023-11-22  29933704  

I want the output to be:

DATE        CALL_ID    COUNT
2023-11-21  29933702    1
2023-11-21  29933703    1
2023-11-21  29933703    2
2023-11-21  29933704    1
2023-11-21  29933704    2
2023-11-22  29933704    1

OUTPUT

#My code: 
g = df.groupby(['DATE','CALL_ID']).size()

DATE        CALL_ID 
2023-11-21  29933702    1
            29933703    2
            29933704    2
2023-11-22  29933704    1

1

There are 1 best solutions below

0
Lukas Hestermeyer On

You are looking for cumcount.

import pandas as pd

df = pd.DataFrame({'DATE':['2023-11-21']*5, 'CALL_ID': [29933702, 29933703, 29933703, 29933704, 29933704]})

df['COUNT'] = df.groupby(['DATE', 'CALL_ID']).cumcount() + 1

print(df)

Output:

         DATE   CALL_ID  COUNT
0  2023-11-21  29933702      1
1  2023-11-21  29933703      1
2  2023-11-21  29933703      2
3  2023-11-21  29933704      1
4  2023-11-21  29933704      2