How can I correct a dataframe with a datetime column so that the timezones are correctly added including DST changes? (Python)

60 Views Asked by At

The dataframe I have has a column without timezones. But due to DST changes I had some problems datamanipulation. These problems resulted from having 'missing' rows when the clock got changed foreward and 'duplicate' rows when the clock changes back.

I wrote a function to add the timezone which automaticaly takes DST into acount.


def fix_dutch_clock_shift(df):
    df_fixed = df.copy()
    pd.to_datetime(df_fixed['datetime'])  

    amsterdam_tz = pytz.timezone('Europe/Amsterdam')

    for index, row in df_fixed.iterrows():
        dt = row['datetime']
        
        dt = amsterdam_tz.localize(dt)
        
        df_fixed.at[index, 'datetime'] = dt
    
    return df_fixed

This worked perfectly for the clock changing foreward changing the timezone from +01:00 to +02:00 at the correct datetime, but for the clock going backward I still got duplicates.

Datetime, Long, Short 
2022-10-30 02:00:00+02:00,61.58,61.58
2022-10-30 02:15:00+02:00,49.98,49.98
2022-10-30 02:30:00+02:00,26.72,26.72
2022-10-30 02:45:00+02:00,-111.32,-111.32
2022-10-30 02:00:00+02:00,-111.32,-111.32
2022-10-30 02:15:00+02:00,-130.19,-130.19
2022-10-30 02:30:00+02:00,-6.69,-6.69
2022-10-30 02:45:00+02:00,-130.19,-130.19

But I want the timezone to shift back from +02:00 to +01:00 the second time it hits 2022-10-30 02:00.

Datetime, Long, Short
2022-10-30 02:00:00+02:00,61.58,61.58
2022-10-30 02:15:00+02:00,49.98,49.98
2022-10-30 02:30:00+02:00,26.72,26.72
2022-10-30 02:45:00+02:00,-111.32,-111.32
2022-10-30 02:00:00+01:00,-111.32,-111.32
2022-10-30 02:15:00+01:00,-130.19,-130.19
2022-10-30 02:30:00+01:00,-6.69,-6.69
2022-10-30 02:45:00+01:00,-130.19,-130.19
1

There are 1 best solutions below

2
FObersteiner On BEST ANSWER

IIUC, your datetime data has a UTC offset (+02:00) over the whole year, which should be +01:00 or +02:00, depending on DST being active or not.

You can potentially achieve this by removing the offset first, then setting the correct time zone:

# to datetime, then remove UTC offset (+02:00)
df["dt"] = pd.to_datetime(df["Datetime"]).dt.tz_localize(None)

# set time zone and infer dst transition times
df["dt"] = df["dt"].dt.tz_localize("Europe/Amsterdam", ambiguous="infer")

df
                    Datetime    Long   Short                         dt
0  2022-10-30 02:00:00+02:00   61.58    61.58 2022-10-30 02:00:00+02:00
1  2022-10-30 02:15:00+02:00   49.98    49.98 2022-10-30 02:15:00+02:00
2  2022-10-30 02:30:00+02:00   26.72    26.72 2022-10-30 02:30:00+02:00
3  2022-10-30 02:45:00+02:00 -111.32  -111.32 2022-10-30 02:45:00+02:00
4  2022-10-30 02:00:00+02:00 -111.32  -111.32 2022-10-30 02:00:00+01:00
5  2022-10-30 02:15:00+02:00 -130.19  -130.19 2022-10-30 02:15:00+01:00
6  2022-10-30 02:30:00+02:00   -6.69    -6.69 2022-10-30 02:30:00+01:00
7  2022-10-30 02:45:00+02:00 -130.19  -130.19 2022-10-30 02:45:00+01:00