I have a time series dataframe (1 min intraday stock price) and would like to convert it to higher timeframe. But the result of this transformation is not what I expected and I can't figure out why.
The time series dataframe:
| index | Date | Time | Open | High | Low | Close |
|---|---|---|---|---|---|---|
| 0 | 2022-06-16 | 09:30:00 | 14.570 | 14.62 | 14.500 | 14.61 |
| 1 | 2022-06-16 | 09:31:00 | 14.630 | 14.85 | 14.620 | 14.79 |
| 2 | 2022-06-16 | 09:32:00 | 14.800 | 14.93 | 14.760 | 14.82 |
| 3 | 2022-06-16 | 09:33:00 | 14.820 | 14.82 | 14.650 | 14.70 |
| 4 | 2022-06-16 | 09:34:00 | 14.670 | 14.68 | 14.610 | 14.67 |
| ... | ... | ... | ... | ... | ... | ... |
| 265 | 2022-06-16 | 13:55:00 | 14.090 | 14.10 | 14.080 | 14.09 |
| 266 | 2022-06-16 | 13:56:00 | 14.085 | 14.10 | 14.065 | 14.09 |
| 267 | 2022-06-16 | 13:57:00 | 14.090 | 14.12 | 14.090 | 14.10 |
| 268 | 2022-06-16 | 13:58:00 | 14.105 | 14.11 | 14.075 | 14.08 |
| 269 | 2022-06-16 | 13:59:00 | 14.070 | 14.09 | 14.060 | 14.06 |
I would like to convert the above data to 9 minutes timeframe. For this task I use the next function:
def convert_to_higher_timeframe(data, timeframe): # data - dataframe, timeframe - desirable timeframe
# Compute timestamp of each row
try:
datetime_col = pd.to_datetime(data['Date'] + ' ' + data['Time'].apply(lambda x: x.strftime('%H:%M:%S')))
except AttributeError:
datetime_col = pd.to_datetime(data['Date'] + ' ' + data['Time'])
# Set datetime column as index and resample to higher timeframe
data = data.set_index(datetime_col)
data = data.resample(f"{timeframe}T").agg({"Open": "first",
"High": "max",
"Low": "min",
"Close": "last"})
data.reset_index(inplace=True)
data['Date'] = data['index'].dt.strftime('%Y-%m-%d')
data['Time'] = data['index'].dt.time
data.drop(columns=['index'])
data = data[['Date', 'Time', 'Open', 'High', 'Low', 'Close']]
data.dropna(inplace=True)
return data
As result I expected to get the new dataframe first row of which represents the data for interval started at 09:30:00, the next one at 09:39:00 and so on. But insted I got the dataframe first row of which represents the data for interval started at 09:27:00 and each next one is plus 9 minutes. For other intervals (5, 6 minutes) all works great and the result is as expected. But why for 9 minutes timeframe the time shifts from 09:30 to 09:27 I can't figure out. Can anyone suggest what's going on here and why the time shifts when desirable timeframe is 9? Thanks in advance
I should have used the parameter origin in the function and set its value to 'start'. This is how: