Separate the time between its days when the period days two or more days

34 Views Asked by At

I gave up after trying to separate a dataframe into different rows depending the days that the period takes place.

This is the dataframe and, as we can see, all rows start in a different day that they ends. 'Tiempo_ini' es when starts, 'Tiempo_fin' when it ends and 'Tiempo_diff' the difference between the previous both.

Tiempo_ini Valor Tiempo_diff Tiempo_fin Cambio
2018-01-02 21:51:47 1 0 days 12:46:30 2018-01-03 10:38:17 True
2018-01-03 22:11:17 1 0 days 07:46:00 2018-01-04 05:57:17 True
2018-01-04 22:22:17 1 0 days 08:38:00 2018-01-05 07:00:17 True
2018-01-05 07:01:17 1 1 days 13:01:30 2018-01-06 20:02:47 True
2018-01-06 20:30:17 1 0 days 04:27:00 2018-01-07 00:57:17 True

I would like to generate as many additional rows as required depending on the period specified in 'Tiempo_diff'. And being able to have in each line, the corresponding interval and 'Tiempo_diff' that belongs to its day.

first row should divide into two:

Tiempo_ini Valor Tiempo_fin Tiempo_diff Cambio
2018-01-02 21:51:47 1 2018-01-03 00:00:00 0 days 02:08:13 True
2018-01-03 00:00:00 1 2018-01-03 10:38:17 0 days 10:38:17 True

I tried with this code but, the problem I had is that when there is the second iteration, it modifies the values and I lose the changes.

I was trying with this code:

nuevos_registros = []

for index, row in df.iterrows():
    if row['Cambio'] == True:

        tiempo_ini = row['Tiempo_ini']
        tiempo_diff = row['Tiempo_diff']
        tiempo_fin = row['Tiempo_fin']

        
        while tiempo_diff > timedelta(seconds=0):
            
            tiempo_fin_original = row['Tiempo_fin']
            tiempo_diff_original = tiempo_diff
            tiempo_diff = ((tiempo_ini + timedelta(days=1)).replace(hour=0, minute=0, second=0) - tiempo_ini)

            nuevo_registro = {
                "Tiempo_ini": tiempo_ini,
                "Valor": row["Valor"],
                "Tiempo_fin": (tiempo_ini + timedelta(days=1)).replace(hour=0, minute=0, second=1),
                "Tiempo_diff": ((tiempo_ini + timedelta(days=1)).replace(hour=0, minute=0, second=0) - tiempo_ini),
                "Cambio": True
            }
            nuevos_registros.append(nuevo_registro)

            tiempo_diff = tiempo_diff_original - tiempo_diff
            tiempo_ini = (tiempo_ini + timedelta(days=1)).replace(hour=0, minute=0, second=0)
            tiempo_fin = tiempo_fin_original

       
    
    else:
        nuevo_registro = {
        "Tiempo_ini": row['Tiempo_ini'],
        "Valor": row["Valor"],
        "Tiempo_fin": row["Tiempo_fin"],
        "Tiempo_diff": row["Tiempo_diff"],
        "Cambio": False 
        }

        nuevos_registros.append(nuevo_registro)


df_mod = pd.DataFrame(nuevos_registros)

    
df_mod[df_mod['Cambio'] == True].head()

Which gives it to me:

Tiempo_ini Valor Tiempo_fin Tiempo_diff Cambio
2018-01-02 21:51:47 1 2018-01-03 00:00:01 0 days 02:08:13 True
2018-01-03 00:00:00 1 2018-01-04 00:00:01 1 days 00:00:00 True
2018-01-03 22:11:17 1 2018-01-04 00:00:01 0 days 01:48:43 True
2018-01-04 00:00:00 1 2018-01-05 00:00:01 1 days 00:00:00 True
2018-01-04 22:22:17 1 2018-01-05 00:00:01 0 days 01:37:43 True

I hope to have been clear. I am sorry for the inconvenience! I have been trying for long and I identified the problem but I do not know how to solve it.

Thank you very much in advance ^^

1

There are 1 best solutions below

0
jmateu888 On

finally, I found the right path which was also easier:

First of all, divide with as many rows as days.

df['start_date'] = df['Tiempo_ini'].dt.date
df['end_date'] = df['Tiempo_fin'].dt.date

df['explode_days'] = df.apply(lambda row: pd.date_range(row['start_date'], row['end_date'], freq= 'D'), axis = 1)
df = df.explode('explode_days')

Determine the differents options:

def combo_time(row):
    if row['start_date'] == row['end_date']:
        return 1
    
    elif (row['start_date'] == row['explode_days']) and (row['end_date'] != row['explode_days']):
        return 2
    
    elif (row['start_date'] != row['explode_days']) and (row['end_date'] == row['explode_days']):
        return 3
    
    else:
        return 4
df['Combinacion'] = df.apply(lambda row: combo_time(row), axis=1)

Get the right dates for each row

def comienzo(row):
    if row['Combinacion'] == 1:
        return row['Tiempo_ini']
    
    elif row['Combinacion'] == 2:
        return row['Tiempo_ini']
    
    elif row['Combinacion'] == 3:
        return row['Tiempo_fin'].replace(hour=0, minute=0, second=0) 
    
    else:
        return row['explode_days'].replace(hour=0, minute=0, second=0)

def final(row):
    if row['Combinacion'] == 1:
        return row['Tiempo_fin']
    
    elif row['Combinacion'] == 2:
        return (row['Tiempo_ini'] + timedelta(days=1)).replace(hour=0, minute=0, second=0) 
    
    elif row['Combinacion'] == 3:
        return row['Tiempo_fin']
    
    else:
        return (row['explode_days'] + timedelta(days=1)).replace(hour=0, minute=0, second=0)


df['Fe.comienzo'] = df.apply(lambda row: comienzo(row), axis=1)  
df['Fe.fin'] = df.apply(lambda row: final(row), axis=1)  

Obtain the corresponding charge for each day.

def calc_time(row):
    if row['start_date'] == row['end_date']:
        return row['Tiempo_diff']
    
    elif (row['start_date'] == row['explode_days']) and (row['end_date'] != row['explode_days']):
        return (row['Tiempo_ini'] + timedelta(days=1)).replace(hour=0, minute=0, second=0) - row['Tiempo_ini']
    
    elif (row['start_date'] != row['explode_days']) and (row['end_date'] == row['explode_days']):
        return row['Tiempo_fin'] - row['Tiempo_fin'].replace(hour=0, minute=0, second=0)
    
    else:
        return timedelta(days=1)
df['Time'] = df.apply(lambda row: calc_time(row), axis=1)

Use groupby to see the charge for each day:

df['Fe.comienzo'] = df['Fe.comienzo'].dt.date
df = df.groupby('Fe.comienzo')['Time'].sum().to_frame()
Fe.comienzo Time
2018-01-02 0 days 15:24:43
2018-01-03 0 days 22:39:00
2018-01-04 0 days 22:42:00
2018-01-05 0 days 23:59:00
2018-01-06 0 days 23:32:30
2018-01-07 0 days 21:03:00
2018-01-08 1 days 00:00:00

Perhaps there is an easier way. Thank you for your time.