I am trying to figure out a non-looping way to identify (auto-incrementing int would be ideal) the non-unique groups of rows (a group can contain 1 or more rows) within each TDateID, GroupID combination.
Here is an example DataFrame that looks like
| Index | Cents | SD_YF | TDateID | GroupID |
|---|---|---|---|---|
| 10 | 182.5 | 2.1 | 0 | 0 |
| 11 | 182.5 | 2.1 | 0 | 0 |
| 12 | 153.5 | 1.05 | 0 | 1 |
| 13 | 153.5 | 1.05 | 0 | 1 |
| 14 | 43 | 11 | 1 | 2 |
| 15 | 43 | 11 | 1 | 2 |
| 4 | 152 | 21 | 1 | 2 |
| 5 | 152 | 21 | 1 | 2 |
My ideal output would be:
| Index | Cents | SD_YF | TDateID | GroupID | UniID |
|---|---|---|---|---|---|
| 10 | 182.5 | 2.1 | 0 | 0 | 1 |
| 11 | 182.5 | 2.1 | 0 | 0 | 2 |
| 12 | 153.5 | 1.05 | 0 | 1 | 3 |
| 13 | 153.5 | 1.05 | 0 | 1 | 4 |
| 14 | 43 | 11 | 1 | 2 | 5 |
| 15 | 43 | 11 | 1 | 2 | 6 |
| 4 | 152 | 21 | 1 | 2 | 5 |
| 5 | 152 | 21 | 1 | 2 | 6 |
I have bolded #5 to draw attention to how index 14, 4 are paired together. Similar with #6. I hope that makes sense!
IIUC you need to add the group number + the cumcount per duplicate + 1:
output: