Pandas calculate elapsed time

67 Views Asked by At
data = [    
            {'DateTime': pd.to_datetime('2023-09-26 09:36:47'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-26 11:01:55'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-27 10:01:20'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 14:46:09'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 13:27:57'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-27 18:03:09'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER01'}
]
df = pd.DataFrame(data)
df.groupby([pd.Grouper(key='DateTime', freq='D'),'User', 'LicenseCode'])

Hi guys, I'd like to create a new elapsed time column (in hours) for licenses from the sample data of license checkout. This is just a snippet of 500k+ lines.

Conditions for new column:

  • elapsed time will be grouped by day, user & license code as only elapsed time for each day will be tabulated for each unique user & licensecode

  • if df['Action'] == OUT, elapsed time = 0

  • if (df['Action'] == OUT) & (df['Action'].shift() == OUT), elapsed time = df[DateTime].diff()

  • if 'Action' == IN, elapsed time = calculate cumulative time diff of previous 'OUT' line (they work in pairs OUT then IN)

  • there are instances of OUT,OUT,IN,IN (order of 'Action' rows), most ideal will be OUT,IN,OUT,IN

I've hit a wall when I try to do a groupby agg with a custom function as I can't reference the columns by e.g. df['Action'].

Example of Desired Output :

DateTime Action LicenseCode User Elapsed_Time
2023-09-26 09:36:47 OUT APP01 USER01 0.00
2023-09-26 11:36:47 IN APP01 USER01 2.00
2023-09-27 09:30:00 OUT APP02 USER02 0.00
2023-09-27 14:30:00 IN APP02 USER02 5.00
2023-09-27 15:30:00 OUT APP02 USER02 5.00
2023-09-27 15:45:00 IN APP02 USER02 5.25
2023-09-27 16:10:00 OUT APP01 USER02 0.00
2023-09-27 16:40:00 IN APP01 USER02 0.50
2023-09-27 17:00:00 OUT APP01 USER02 0.50
2023-09-27 17:12:00 OUT APP01 USER02 0.70
2023-09-27 17:42:00 IN APP01 USER02 1.20
2023-09-27 17:52:00 IN APP01 USER02 1.37

Any thoughts will be greatly appreciated!

2

There are 2 best solutions below

0
Unsullied On BEST ANSWER

Thanks to Panda Kim, I've managed to retrieve the desired output. Here is the code modification from @Panda Kim's input.

Code:

data = [    
            {'DateTime': pd.to_datetime('2023-09-26 09:36:47'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-26 11:36:47'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-27 09:30:00'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 14:30:00'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 15:30:00'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 15:45:00'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 16:10:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 16:40:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:00:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:12:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:42:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:52:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'}
]

df = pd.DataFrame(data)
cond = (df['Action'].eq('OUT')) & (df['Action'].shift(+1).eq('IN'))
grp = cond.groupby([df['LicenseCode'],df['User']]).cumsum()
df['elapsed_time'] = df.groupby([pd.Grouper(key='DateTime', freq='D'), grp, 'User', 'LicenseCode'])['DateTime'].diff()
df['elapsed_time'] = df['elapsed_time'].fillna(pd.Timedelta(seconds=0))
df['elapsed_time'] = (df.groupby([pd.Grouper(key='DateTime', freq='D'),'User', 'LicenseCode']).elapsed_time.cumsum() \
    .dt.total_seconds()/3600)  \
    .round(2)

print(df)

Output: enter image description here

1
Panda Kim On

Your question is unclear. If you want the following output, use the following code. If the desired output is different from the following, please provide the desired output clearly as a text table.

Also, your example has a typo and cannot run it.

Code

cond = df['Action'].eq('OUT')
grp = cond.groupby(df['LicenseCode']).cumsum()
df['elapsed time'] = df.groupby([grp, 'LicenseCode'])['DateTime'].diff().mask(cond, 0)

df:

             DateTime Action LicenseCode    User     elapsed time
0 2023-09-26 09:36:47    OUT       APP01  USER01                0
1 2023-09-26 11:01:55     IN       APP01  USER01  0 days 01:25:08
2 2023-09-27 10:01:20    OUT       APP02  USER02                0
3 2023-09-27 14:46:09     IN       APP02  USER02  0 days 04:44:49
4 2023-09-27 13:27:57    OUT       APP02  USER01                0
5 2023-09-27 18:03:09     IN       APP02  USER01  0 days 04:35:12