I have a list of events. Every case (case ID) will eventually go through the events( event A, Event B). I have columns containing time stamps for the respective events. I am trying to find the days between Event A and Event B based on the below using Python. However, some cases go through event a more than one time, and I need the time lapse to be based on the latest date. In the example below, latest date for Case ID 1 would be 1/20/23, ignoring the previous two occurrences of EVENT A for this case ID.
| CaseID | EVENT | DATE |
|---|---|---|
| 1 | A | 1/1/23 |
| 1 | A | 1/3/23 |
| 1 | A | 1/20/23 |
| 1 | B | 2/1/23 |
| 8 | A | 1/2/23 |
| 100 | A | 3/1/23 |
| 100 | B | 3/2/23 |
| 35 | A | 2/13/23 |
| 35 | B | 2/27/23 |
| 6 | A | 2/14/23 |
| 33 | A | 2/26/23 |
| 2 | A | 3/4/23 |
| 2 | B | 4/30/23 |
I have tried sorting by duplicates, but I’m just not as comfortable using Python as SQL, and I’m unable to use SQL. Also trying to have to avoid dropping duplicates manually. The output I am expecting would look like the below: |caseID|busdaysbweventAB|
You can use:
Output: