Can someone explain why the rolling calculations I'm performing always give me NaN? The rationale behind this code is to obtain some exogenous features for an ARIMAX model from this dataframe by computing the mean and std over windows of different lengths. I copied part of this code from online, and one thing I'm not sure about is the min_periods=0 in the rolling method.
import numpy as np
import matplotlib.pyplot as plt
import pmdarima as pm
import pandas as pd
import sklearn.metrics as sm
import yfinance as yf
data = yf.download("AAPL", start="2023-12-01", end="2024-02-14")
print()
data=pd.DataFrame(data)
rolling_features=data.columns.drop('Close')
exogenous_features=[]
def rollDF(df, rolling_features, p):
    df_rolling=df[rolling_features].rolling(window=p, min_periods=0)
    return df_rolling.mean().shift(1).reset_index().drop('Date', axis=1).astype(np.float64), df_rolling.std().shift(1).reset_index().drop('Date', axis=1).astype(np.float64)
lags=[3,7,30]
for i in lags:
    mean, std = rollDF(data, rolling_features, i)
    for j in mean.columns:
        data['mean'+j+str(i)+'d']=mean[j]
        exogenous_features.append('mean'+j+str(i)+'d')
    for j in std.columns:
        data['std'+j+str(i)+'d']=std[j]
        exogenous_features.append('std'+j+str(i)+'d')
print()
print(data)
The problem might lie in the calculation of rolling mean and standard deviation for each column separately. You need to ensure that there are enough non-NaN values in each rolling window to compute these statistics accurately.
At least one observation is required to compute rolling statistics, so you need to use min_periods=1. Try the modification below:
My Output:
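One more thing worth checking in the question's code, independent of min_periods: rollDF calls reset_index() and drops 'Date', so the returned frames carry a 0..n-1 integer index, while data is still indexed by date. pandas aligns on index labels when a Series is assigned as a new column, so none of the labels match and the whole column comes out as NaN. The sketch below reproduces this on a small synthetic frame (the values, the column names meanOpen3d_bad and meanOpen3d, and the 10-row range are made up for illustration; it is not the original answer's code and does not call yfinance).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded prices (hypothetical values).
dates = pd.date_range("2023-12-01", periods=10, freq="D", name="Date")
data = pd.DataFrame(
    {"Open": np.arange(10.0), "Close": np.arange(10.0) + 1.0},
    index=dates,
)

# Rolling mean over 3 rows, shifted by one row so each row only sees the past.
rolled = data[["Open"]].rolling(window=3, min_periods=1).mean().shift(1)

# Pattern from the question: reset_index() replaces the dates with 0..n-1,
# so the assignment cannot align with data's DatetimeIndex.
misaligned = rolled.reset_index().drop("Date", axis=1)
data["meanOpen3d_bad"] = misaligned["Open"]

# Keeping the original DatetimeIndex lets pandas align row by row.
data["meanOpen3d"] = rolled["Open"]

print(data[["meanOpen3d_bad", "meanOpen3d"]])
```

With the index left in place, only the first row is NaN, and that comes from shift(1). If positional assignment is really wanted, passing misaligned["Open"].to_numpy() instead of the Series would also skip the index alignment.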