How to generate an exponentially weighted and then exponentially smoothed correlation matrix dataframe in pandas?

I have a time series DataFrame from which I want to generate a smoothed correlation matrix.

An example:

import numpy as np
import pandas as pd
import datetime as dt

np.random.seed(0)
df = pd.DataFrame(
    data=np.random.randn(100, 3),
    columns=['Apple', 'Banana', 'Orange'],
    index=pd.date_range(start=dt.datetime(2023, 1, 1), periods=100),
)

Then I generate a time series of correlation matrices (one per date), exponentially weighted with a span of 20:

ewm_corr = df.ewm(span=20).corr()
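
For reference, the result of `df.ewm(span=20).corr()` is a single DataFrame whose rows carry a two-level index (date, then column name), so each date contributes one 3x3 block. A minimal sketch to inspect that layout, reusing the example data above (the names `last_day` and `last_matrix` are only for illustration):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(
    np.random.randn(100, 3),
    columns=["Apple", "Banana", "Orange"],
    index=pd.date_range("2023-01-01", periods=100),
)

ewm_corr = df.ewm(span=20).corr()

# Rows are indexed by (date, column name): 100 dates x 3 names = 300 rows.
print(ewm_corr.shape)          # (300, 3)
print(ewm_corr.index.nlevels)  # 2

# Selecting on the first index level returns one date's 3x3 matrix.
last_day = df.index[-1]
last_matrix = ewm_corr.loc[last_day]
print(last_matrix)
```

This stacked layout matters for any later operation that runs down the rows, because consecutive rows belong to different variables rather than to consecutive dates.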

Then I want to smooth each element of the correlation matrix across time, i.e. apply a second exponential smoothing to each pairwise correlation series.

I expected that the following code would do that:

ewm_corr_smoothed = ewm_corr.ewm(span=20).mean()

However, it does not produce the data I expect. Here is what I expect for the Apple and Orange data points across time. First I extract the Apple and Orange correlation data points across time and then apply the smoothing:

ewm_corr.unstack()['Apple','Orange'].ewm(span=20).mean()

>>>

2023-01-01         NaN
2023-01-02         NaN
2023-01-03         NaN
2023-01-04         NaN
2023-01-05         NaN
                ...   
2023-04-06    0.017641
2023-04-07    0.025754
2023-04-08    0.037171
2023-04-09    0.047193
2023-04-10    0.058412
Freq: D, Name: (Apple, Orange), Length: 100, dtype: float64

If I then check the data from the first method, here is what I get:

ewm_corr_smoothed.unstack()['Apple','Orange']

>>>

2023-01-01         NaN
2023-01-02    0.265612
2023-01-03    0.396163
2023-01-04    0.360363
2023-01-05    0.348585
                ...   
2023-04-06    0.223486
2023-04-07    0.235869
2023-04-08    0.244261
2023-04-09    0.249865
2023-04-10    0.254088
Freq: D, Name: (Apple, Orange), Length: 100, dtype: float64

The values are significantly different, so I assume the first method performs a different calculation. I am trying to generate matrices of smoothed correlations across time that match the expected output above, i.e. where 2023-04-10 has the value 0.058412. I assume this must be possible.

I hope that is clear. Thanks!

Update:

It seems like this achieves the goal:

ewm_corr_smoothed = df.ewm(span=25).corr().unstack().ewm(span=25).mean().stack()

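When I check the Apple and Orange data now, the values are close to what I expected (this snippet uses span=25 rather than 20, which presumably accounts for the small differences from the figures above):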

ewm_corr_smoothed.unstack()['Apple','Orange']

>>>

2023-01-02   -1.000000
2023-01-03   -0.642511
2023-01-04   -0.659027
2023-01-05   -0.659796
2023-01-06   -0.599816
                ...   
2023-04-06    0.031091
2023-04-07    0.035798
2023-04-08    0.042619
2023-04-09    0.048757
2023-04-10    0.055900
Freq: D, Name: (Apple, Orange), Length: 99, dtype: float64
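
As a cross-check, the same unstack-then-smooth idea can be run with `span=20` in both steps and compared against the single-pair calculation shown earlier. This is a sketch under the assumption that the goal is to smooth each pairwise correlation series independently; `span`, `wide`, `smoothed_wide`, and `pair` are illustrative names.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(
    np.random.randn(100, 3),
    columns=["Apple", "Banana", "Orange"],
    index=pd.date_range("2023-01-01", periods=100),
)

span = 20
ewm_corr = df.ewm(span=span).corr()

# One column per matrix element, one row per date.
wide = ewm_corr.unstack()

# Each column is now a plain time series, so EWM runs along dates only.
smoothed_wide = wide.ewm(span=span).mean()

# Reference: smooth the single Apple/Orange series directly.
pair = wide[("Apple", "Orange")].ewm(span=span).mean()

same = np.allclose(smoothed_wide[("Apple", "Orange")], pair, equal_nan=True)
print(same)  # True
```

Calling `.stack()` on `smoothed_wide` restores the (date, name) row layout. In the output above it also appears to have dropped the first date, whose values are all NaN, which would explain the length of 99 rather than 100.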

However, I remain interested in an explanation as to what pandas is actually doing in the first method. Thanks!
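
One likely explanation, offered as a sketch rather than a definitive account of pandas internals: `ewm_corr` has one row per (date, name) pair, and `.ewm(...).mean()` on a DataFrame walks down the rows in order without regard to the index levels. Each column is therefore smoothed over the interleaved sequence Apple, Banana, Orange, Apple, ... across dates, which mixes different pairs together, including the diagonal values of 1.0. The check below compares the first method with the same call after the index has been discarded; `first_method` and `flat` are illustrative names.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(
    np.random.randn(100, 3),
    columns=["Apple", "Banana", "Orange"],
    index=pd.date_range("2023-01-01", periods=100),
)

ewm_corr = df.ewm(span=20).corr()

# First method from the question: EWM applied to the stacked frame.
first_method = ewm_corr.ewm(span=20).mean()

# Same data with the (date, name) index discarded: 300 anonymous rows.
flat = ewm_corr.reset_index(drop=True).ewm(span=20).mean()

# If the index levels influenced the result, these would differ.
matches = np.allclose(first_method.to_numpy(), flat.to_numpy(), equal_nan=True)
print(matches)  # True
```

Because the two results agree, the first method is smoothing across rows of different variables rather than across dates for a fixed pair, which is why unstacking first (so that each pair becomes its own column) gives the intended series.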
