Pandas DataFrame rolling keeps giving NaN with mean and std

47 Views Asked by At

Can someone explain me why these rolling I'm performing give me always NaN? The rationale behind this code is to obtain some exogenous features for ARIMAX model from this dataframe computing mean and std on different lenght windows. I've copied part of this code online and one thing I'm not sure about is that min_periods=0 in the rolling method

import numpy as np
import matplotlib.pyplot as plt
import pmdarima as pm
import pandas as pd
import sklearn.metrics as sm
import yfinance as yf

data = yf.download("AAPL", start="2023-12-01", end="2024-02-14")
print()
data=pd.DataFrame(data)
rolling_features=data.columns.drop('Close')

exogenous_features=[]

def rollDF(df, rolling_features, p):
  df_rolling=df[rolling_features].rolling(window=p, min_periods=0)
  return df_rolling.mean().shift(1).reset_index().drop('Date', axis=1).astype(np.float64), df_rolling.std().shift(1).reset_index().drop('Date', axis=1).astype(np.float64)

lags=[3,7,30]

for i in lags:
  mean, std = rollDF(data, rolling_features, i)
  for j in mean.columns:
    data['mean'+j+str(i)+'d']=mean[j]
    exogenous_features.append('mean'+j+str(i)+'d')
  for j in std.columns: 
    data['std'+j+str(i)+'d']=std[j]
    exogenous_features.append('std'+j+str(i)+'d')
print()
print(data)
1

There are 1 best solutions below

2
TheHungryCub On BEST ANSWER

The problem might lie in the calculation of rolling mean and standard deviation for each column separately. You need to ensure that there are enough non-NaN values in each rolling window to compute these statistics accurately.

At least one observation is required to compute rolling statistics. for that you need to use min_periods=1.

Try below modification:

def rollDF(df, rolling_features, p):
    df_rolling_mean = df[rolling_features].rolling(window=p, min_periods=1).mean().shift(1)
    df_rolling_std = df[rolling_features].rolling(window=p, min_periods=1).std().shift(1)
    return df_rolling_mean, df_rolling_std
    
lags = [3, 7, 30]

for i in lags:
          mean, std = rollDF(data, rolling_features, i)
          for j in mean.columns:
              data['mean ' + j + str(i) + 'd'] = mean[j]
              exogenous_features.append('mean ' + j + str(i) + 'd')
          for j in std.columns: 
              data['std ' + j + str(i) + 'd'] = std[j]
              exogenous_features.append('std ' + j + str(i) + 'd')
print()
print(data)

My Output:

                  Open        High  ...  std Adj Close30d  std Volume30d
Date                                ...                                 
2023-12-01  190.330002  191.559998  ...               NaN            NaN
2023-12-04  189.979996  190.050003  ...               NaN            NaN
2023-12-05  190.210007  194.399994  ...          1.278243   1.619133e+06
2023-12-06  194.449997  194.759995  ...          1.995315   1.280725e+07
2023-12-07  193.630005  195.000000  ...          1.697768   1.177120e+07
2023-12-08  194.199997  195.990005  ...          1.892433   1.022310e+07
2023-12-11  193.110001  193.490005  ...          2.233501   9.328512e+06
2023-12-12  193.080002  194.720001  ...          2.045909   9.532984e+06
2023-12-13  195.089996  198.000000  ...          2.011138   8.841125e+06
2023-12-14  198.020004  199.619995  ...          2.495454   1.041538e+07
2023-12-15  197.529999  198.399994  ...          2.753220   1.068385e+07
2023-12-18  196.089996  196.630005  ...          2.820519   2.434283e+07
2023-12-19  196.160004  196.949997  ...          2.725366   2.326973e+07
2023-12-20  196.899994  197.679993  ...          2.696554   2.298150e+07
2023-12-21  196.100006  197.080002  ...          2.591104   2.216447e+07
2023-12-22  195.179993  195.410004  ...          2.496850   2.159995e+07
2023-12-26  193.610001  193.889999  ...          2.427323   2.151833e+07
2023-12-27  192.490005  193.500000  ...          2.380669   2.190624e+07
2023-12-28  194.139999  194.660004  ...          2.332131   2.131776e+07
2023-12-29  193.899994  194.399994  ...          2.275128   2.125616e+07
2024-01-02  187.149994  188.440002  ...          2.253568   2.083660e+07
2024-01-03  184.220001  185.880005  ...          2.897200   2.129497e+07
2024-01-04  182.149994  183.089996  ...          3.494620   2.079820e+07
2024-01-05  181.990005  182.759995  ...          4.176106   2.063667e+07
2024-01-08  182.089996  185.600006  ...          4.737533   2.023129e+07
2024-01-09  183.919998  185.149994  ...          4.838504   1.981696e+07
2024-01-10  184.350006  186.399994  ...          4.937670   1.958434e+07
2024-01-11  186.539993  187.050003  ...          4.965347   1.927504e+07
2024-01-12  186.059998  186.740005  ...          5.007363   1.894797e+07
2024-01-16  182.160004  184.259995  ...          5.024347   1.879791e+07
2024-01-17  181.270004  182.929993  ...          5.130975   1.858476e+07
2024-01-18  186.089996  189.139999  ...          5.352101   1.855963e+07
2024-01-19  189.330002  191.949997  ...          5.360888   1.889967e+07
2024-01-22  192.300003  195.330002  ...          5.339440   1.894475e+07
2024-01-23  195.020004  195.750000  ...          5.363793   1.874560e+07
2024-01-24  195.419998  196.380005  ...          5.387038   1.885497e+07
2024-01-25  195.220001  196.270004  ...          5.353115   1.885359e+07
2024-01-26  194.270004  194.759995  ...          5.371852   1.883676e+07
2024-01-29  192.009995  192.199997  ...          5.329716   1.894765e+07
2024-01-30  190.940002  191.800003  ...          5.154580   1.881364e+07
2024-01-31  187.039993  187.100006  ...          4.963838   1.868661e+07
2024-02-01  183.990005  186.949997  ...          4.863956   1.254534e+07
2024-02-02  179.860001  187.330002  ...          4.744780   1.273782e+07
2024-02-05  188.149994  189.250000  ...          4.564591   1.543684e+07
2024-02-06  186.860001  189.309998  ...          4.435463   1.566329e+07
2024-02-07  190.639999  191.050003  ...          4.295297   1.573131e+07
2024-02-08  189.389999  189.539993  ...          4.194081   1.535934e+07
2024-02-09  188.649994  189.990005  ...          4.103504   1.477813e+07
2024-02-12  188.419998  188.669998  ...          4.003911   1.484483e+07
2024-02-13  185.770004  186.210007  ...          3.872706   1.451160e+07

[50 rows x 36 columns]