I have a dataframe that looks like below
| Date | Name | In/Out | Time |
|---|---|---|---|
| 2024-01-01 | Homer | IN | 07:10 |
| 2024-01-01 | Homer | OUT | 09:30 |
| 2024-01-01 | Homer | IN | 10:00 |
| 2024-01-01 | Homer | OUT | 16:00 |
| 2024-01-01 | Marge | In | 07:15 |
| 2024-01-01 | Marge | Out | 16:10 |
| 2024-01-01 | Bart | In | 07:14 |
| 2024-01-01 | Bart | Out | 10:00 |
| 2024-01-01 | Bart | In | 10:15 |
| 2024-01-01 | Bart | Out | 12:00 |
| 2024-01-01 | Bart | In | 12:30 |
| 2024-01-01 | Bart | Out | 17:00 |
My end goal is to calculate the total timedelta (total_seconds()) of intermediate IN/OUT times only, as highlighted. Expected output below.
| Date | Name | TimeDelta |
|---|---|---|
| 2024-01-01 | Homer | 1800 |
| 2024-01-01 | Bart | 2700 |
I haven't been to sure where to start with this one and cannot find any examples anywhere else?
example dataframe:
example_df = pd.DataFrame([
['2024-01-01', 'Homer', 'in', '07:30'],
['2024-01-01', 'Homer', 'out' ,'09:00'],
['2024-01-01', 'Homer', 'in' ,'09:30'],
['2024-01-01', 'Homer', 'out' ,'16:00'],
['2024-01-01', 'Marge', 'in' , '06:20'],
['2024-01-01', 'Marge', 'out' ,'16:00'],
['2024-01-01', 'Bart', 'in' ,'07:10'],
['2024-01-01', 'Bart', 'out' ,'08:00'],
['2024-01-01', 'Bart', 'in' ,'08:20'],
['2024-01-01', 'Bart', 'out' ,'17:00'],
['2024-01-01', 'Barney', 'in' ,'08:10'],
['2024-01-01', 'Lisa', 'in' ,'08:05'],
['2024-01-01', 'Lisa', 'out' ,'14:00'],
['2024-01-01', 'Lisa', 'in' ,'14:15'],
['2024-01-01', 'Lisa', 'out' ,'18:10'],
['2024-01-01', 'Millhouse', 'out' ,'19:10'],
['2024-02-01', 'Homer', 'in', '07:30'],
['2024-02-01', 'Homer', 'out' ,'09:00'],
['2024-02-01', 'Marge', 'in' , '06:30'],
['2024-02-01', 'Marge', 'out' ,'09:10'],
['2024-02-01', 'Marge', 'in' ,'10:10'],
['2024-02-01', 'Marge', 'out' ,'16:10'],
['2024-02-01', 'Bart', 'in' ,'07:10'],
['2024-02-01', 'Bart', 'out' ,'15:00'],
['2024-02-01', 'Barney', 'in' ,'08:10'],
['2024-02-01', 'Lisa', 'in' ,'08:05'],
['2024-02-01', 'Lisa', 'out' ,'16:00'],
['2024-02-01', 'Millhouse', 'in' ,'08:10'],
['2024-02-01', 'Millhouse', 'in' ,'08:10'],
['2024-02-01', 'Millhouse', 'in' ,'16:15']],
columns=['Date', 'Name', 'In/Out', 'Time'])

Assuming Time is sorted within a group, that the first In/Out is always In, and that In/Out are always alternating.
You could convert the times
to_datetime, then usegroupby.applyto compute thediff, ignore the first/last value (withiloc) andsumthe "IN" timedeltas before converting tototal_seconds:Output:
NB. If any of the initial assumptions is incorrect, you just need to pre-process the data to sort it and remove invalid rows.
Example:
Output: