Unexpected result of resampling time series dataframe

28 Views Asked by At

I have a time series dataframe (1 min intraday stock price) and would like to convert it to higher timeframe. But the result of this transformation is not what I expected and I can't figure out why.

The time series dataframe:

index Date Time Open High Low Close
0 2022-06-16 09:30:00 14.570 14.62 14.500 14.61
1 2022-06-16 09:31:00 14.630 14.85 14.620 14.79
2 2022-06-16 09:32:00 14.800 14.93 14.760 14.82
3 2022-06-16 09:33:00 14.820 14.82 14.650 14.70
4 2022-06-16 09:34:00 14.670 14.68 14.610 14.67
... ... ... ... ... ... ...
265 2022-06-16 13:55:00 14.090 14.10 14.080 14.09
266 2022-06-16 13:56:00 14.085 14.10 14.065 14.09
267 2022-06-16 13:57:00 14.090 14.12 14.090 14.10
268 2022-06-16 13:58:00 14.105 14.11 14.075 14.08
269 2022-06-16 13:59:00 14.070 14.09 14.060 14.06

I would like to convert the above data to 9 minutes timeframe. For this task I use the next function:

def convert_to_higher_timeframe(data, timeframe):  # data - dataframe, timeframe - desirable timeframe
    # Compute timestamp of each row
    try:
        datetime_col = pd.to_datetime(data['Date'] + ' ' + data['Time'].apply(lambda x: x.strftime('%H:%M:%S')))
    except AttributeError:
        datetime_col = pd.to_datetime(data['Date'] + ' ' + data['Time'])
    # Set datetime column as index and resample to higher timeframe
    data = data.set_index(datetime_col)    
    data = data.resample(f"{timeframe}T").agg({"Open": "first",
                                               "High": "max",
                                               "Low": "min",
                                               "Close": "last"})
    data.reset_index(inplace=True)
    data['Date'] = data['index'].dt.strftime('%Y-%m-%d')
    data['Time'] = data['index'].dt.time
    data.drop(columns=['index'])
    data = data[['Date', 'Time', 'Open', 'High', 'Low', 'Close']]
    data.dropna(inplace=True)    
    return data 

As result I expected to get the new dataframe first row of which represents the data for interval started at 09:30:00, the next one at 09:39:00 and so on. But insted I got the dataframe first row of which represents the data for interval started at 09:27:00 and each next one is plus 9 minutes. For other intervals (5, 6 minutes) all works great and the result is as expected. But why for 9 minutes timeframe the time shifts from 09:30 to 09:27 I can't figure out. Can anyone suggest what's going on here and why the time shifts when desirable timeframe is 9? Thanks in advance

1

There are 1 best solutions below

0
user22544082 On

I should have used the parameter origin in the function and set its value to 'start'. This is how:

data = data.resample(f"{timeframe}T", origin='start').agg({"Open": "first",
                                               "High": "max",
                                               "Low": "min",
                                               "Close": "last"})