I am trying to recreate the MultiRow tool used in Alteryx. I would like to group the data by two columns (DATE, CALL_ID) and then compute a running count per row within each group. I'm using groupby, but I don't think that's right: I don't want the output to be grouped or aggregated, I want each row to still exist.
Example Data:
DATE CALL_ID
2023-11-21 29933702
2023-11-21 29933703
2023-11-21 29933703
2023-11-21 29933704
2023-11-21 29933704
2023-11-22 29933704
I want the output to be:
DATE CALL_ID COUNT
2023-11-21 29933702 1
2023-11-21 29933703 1
2023-11-21 29933703 2
2023-11-21 29933704 1
2023-11-21 29933704 2
2023-11-22 29933704 1
# My code:
g = df.groupby(['DATE','CALL_ID']).size()
Output:
DATE CALL_ID
2023-11-21 29933702 1
29933703 2
29933704 2
2023-11-22 29933704 1
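You are looking for cumcount. It numbers the rows within each group starting at 0 and keeps every row instead of aggregating, so adding 1 gives the running count you want.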
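A minimal sketch of that approach, assuming the data is in a pandas DataFrame named df as in the question. The frame is rebuilt here from the example data only so the snippet runs on its own:

```python
import pandas as pd

# Example data from the question, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    "DATE": ["2023-11-21"] * 5 + ["2023-11-22"],
    "CALL_ID": [29933702, 29933703, 29933703, 29933704, 29933704, 29933704],
})

# cumcount() numbers the rows inside each (DATE, CALL_ID) group from 0,
# returning one value per original row; + 1 makes the count start at 1
df["COUNT"] = df.groupby(["DATE", "CALL_ID"]).cumcount() + 1

print(df)
# COUNT column: [1, 1, 2, 1, 2, 1]
```

Unlike size(), which returns one value per group, cumcount() returns a Series aligned with the original index, so it can be assigned straight back as a new column. The numbering follows the existing row order, so sort the frame first if the count should follow a particular order.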