Lengths of overlapping time ranges listed by rows

Question

Lengths of overlapping time ranges listed by rows

168 Views Asked by usr_lal123 At 10 January 2024 at 04:45

I am using pandas version 1.0.5

The example dataframe below lists time intervals, recorded over three days, and I seek where some time intervals overlap every day.

For example, one of the overlapping time across all the three dates (yellow highlighted) is 1:16 - 2:13. The other (blue highlighted) would be 18:45 - 19:00

So my expected output would be like: [57,15] because

57 - Minutes between 1:16 - 2:13.
15 - Minutes between 18:45 - 19:00

Please use this generator of the input dataframe:

import pandas as pd
dat1 = [
    ['2023-12-27','2023-12-27 00:00:00','2023-12-27 02:14:00'],
    ['2023-12-27','2023-12-27 03:16:00','2023-12-27 04:19:00'],
    ['2023-12-27','2023-12-27 18:11:00','2023-12-27 20:13:00'],
    ['2023-12-28','2023-12-28 01:16:00','2023-12-28 02:14:00'],
    ['2023-12-28','2023-12-28 02:16:00','2023-12-28 02:28:00'],
    ['2023-12-28','2023-12-28 02:30:00','2023-12-28 02:56:00'],
    ['2023-12-28','2023-12-28 18:45:00','2023-12-28 19:00:00'],
    ['2023-12-29','2023-12-29 01:16:00','2023-12-29 02:13:00'],
    ['2023-12-29','2023-12-29 04:16:00','2023-12-29 05:09:00'],
    ['2023-12-29','2023-12-29 05:11:00','2023-12-29 05:14:00'],
    ['2023-12-29','2023-12-29 18:00:00','2023-12-29 19:00:00']
       ]
df = pd.DataFrame(dat1,columns = ['date','Start_tmp','End_tmp'])
df["Start_tmp"] = pd.to_datetime(df["Start_tmp"])
df["End_tmp"] = pd.to_datetime(df["End_tmp"])

Original Q&A

There are 2 best solutions below

sammywemmy On 10 January 2024 at 11:13

I assume you are only interested in the hours and minutes, and not the entire timestamp. If my assumption is correct, you can execute a range join with conditional_join from pyjanitor to get your results:

# pip install pyjanitor
import janitor
import pandas as pd
A=(df
   .assign(
       start=df.Start_tmp.dt.hour.mul(60)+ df.Start_tmp.dt.minute,
       end=df.End_tmp.dt.hour.mul(60)+df.End_tmp.dt.minute)
   )
(A
.conditional_join(
    A,
    ('start','start','>'),
    ('end','end','<'),
    df_columns=['start','end'],
    right_columns=None)
.pipe(lambda df: df.end-df.start)
)

0    15
1    57
dtype: int32

**OCa** · Accepted Answer · 2024-01-11T13:51:02.323000

This solution uses:

numpy, no uncommon Python modules, so using Python 1.0.5 you should, hopefully, be in the clear,
no nested loops to care for speed issues with growing dataset,

Method:

Draw the landscape of overlaps
Then select the overlaps corresponding to the number of documented days,
Finally describe the overlaps in terms of their lengths

Number of documented days: (as in Python: Convert timedelta to int in a dataframe)

n = 1 + ( max(df['End_tmp']) - min(df['Start_tmp']) ).days
n
3

Additive landscape:

# initial flat whole-day landcape (height: 0)
L = np.zeros(24*60, dtype='int')
# add up ranges: (reused @sammywemmy's perfect formula for time of day in minutes)
for start, end in zip(df['Start_tmp'].dt.hour.mul(60) + df['Start_tmp'].dt.minute,  # Start_tmp timestamps expressed in minutes
                      df['End_tmp'].dt.hour.mul(60)   + df['End_tmp'].dt.minute):   # End_tmp timestamps expressed in minutes
    L[start:end+1] += 1

plt.plot(L)
plt.hlines(y=[2,3],xmin=0,xmax=1400,colors=['green','red'], linestyles='dashed')
plt.xlabel('time of day (minutes)')
plt.ylabel('time range overlaps')

(Please excuse the typo: these are obviously minutes, not seconds)

Keep only overlaps over all days: (red line, n=3)

# Reduce heights <n to 0 because not overlaping every day
L[L<n]=0
# Simplify all greater values to 1 because only their presence matters
L[L>0]=1
# Now only overlaps are highlighted
# (Actually this latest line is disposable, provided we filter all but the overlaps of rank n. Useful only if you were to include lower overlaps)

Extract overlap ranges and their lengths

# Highlight edges of overlaping intervals
D = np.diff(L)
# Describe overlaps as ranges
R = list(zip([a[0]   for a in np.argwhere(D>0)],  # indices where overlaps *begin*, with scalar indices instead of arrays
             [a[0]-1 for a in np.argwhere(D<0)])) # indices where overlaps *end*, with scalar indices instead of arrays
R
[(75, 132), (1124, 1139)]

# Finally their lengths
[b-a for a,b in R]

Final output: [57, 15]

Lengths of overlapping time ranges listed by rows

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in NUMPY

Related Questions in PYTHON-DATETIME

Trending Questions

Popular # Hahtags

Popular Questions