Pandas Apply Rolling Window to Every Column Efficiently

Question

Pandas Apply Rolling Window to Every Column Efficiently

52 Views Asked by Charles0349 At 20 July 2023 at 16:44

I'm running a Monte Carlo simulation. Part of the calculation requires applying a function to a rolling window for each simulation. However, I don't know how to do that efficiently. I'm concerned that this may be a duplicate post, but I was unable to find another like this.

My minimal reproducible example is this:

import pandas as pd
import numpy as np
from scipy.stats import norm

# Number of simulations
trials = 10000

# Generate random variables
df1 = pd.DataFrame(norm.rvs(size = (500, trials)))

f = lambda x: np.sum(x > 0) > 20

# Make a deep copy of df1
df2 = df1.copy(deep = True)

for col in df2.columns:
    
    df2[col] = df2[col].rolling(window = 30).apply(f)

Is there a way to write this without a for-loop or list comprehension? Since the simulations are columns, df2 would ideally have at least 10,000 columns. Having a data frame that's the transpose of this would also be fine. Within my code this part takes about 100 times more than the next longest process in my simulation.

Original Q&A

There are 1 best solutions below

**rpanai** · Accepted Answer · 2023-07-20T22:12:54.120000

Your is not exactly a minimal example. So we first try to see what happen with a minimal example and then we test performaces on the full one.

Data

import pandas as pd
import numpy as np
from scipy.stats import norm

# Number of simulations
trials = 10000

# Generate random variables
df1 = pd.DataFrame(norm.rvs(size = (500, trials)))

Min example

here I both reduce the amount of data and change your function to work with less data

df_min = df1[range(3)][:10]
# backup
df_min_bk = df_min.copy()
f_min = lambda x: np.sum(x > 0) > 2

where df_min is

          0         1         2
0  0.407418  1.741455 -0.270929
1 -0.530294  1.248405  1.201781
2 -1.193793 -0.088235  0.991222
3 -0.941380  0.499053 -0.913778
4  0.951970 -2.073895 -1.179818
5 -1.666666  1.143326  1.266971
6  0.688032 -0.188798 -0.130474
7  0.618970 -0.595450  1.420563
8  1.370329 -0.904624  1.167164
9 -0.571588  0.547064 -1.169145

Run minimal example

Using apply

%%time
for col in df_min.columns:
    
    df_min[col] = df_min[col].rolling(window=3).apply(f_min)

CPU times: user 10.6 ms, sys: 693 µs, total: 11.3 ms
Wall time: 11 ms

and the output is

     0    1    2
0  NaN  NaN  NaN
1  NaN  NaN  NaN
2  0.0  0.0  0.0
3  0.0  0.0  0.0
4  0.0  0.0  0.0
5  0.0  0.0  0.0
6  0.0  0.0  0.0
7  0.0  0.0  0.0
8  1.0  0.0  0.0
9  0.0  0.0  0.0

Avoid apply

Set df_min = df_min_bk.copy() and using built-in function we can rewrite the same function as

for col in df_min.columns:
    
    df_min[col] = df_min[col].gt(0).rolling(window=3).sum().gt(2).astype(int)

CPU times: user 1.02 ms, sys: 3.21 ms, total: 4.24 ms
Wall time: 3.9 ms

Which is almost 3x the previous case and the output is still

which is ok if we remember that the first n-1 columns of a rolling windows should be NaN.

Avoid loop columns

Set again df_min = df_min_bk.copy() we can use the precious function without looping columns

%%time
df_min = df_min.gt(0).rolling(window=3).sum().gt(2).astype(int)

CPU times: user 2.21 ms, sys: 0 ns, total: 2.21 ms
Wall time: 2.22 ms

Which is almost 2x the precious case and 6X the apply one. The output is the same as the previous example.

Full Example

%%time
df2 = df2.gt(0).rolling(window=30).sum().gt(20).astype(int)

CPU times: user 607 ms, sys: 27 ms, total: 634 ms
Wall time: 633 ms

this takes less than a second. While the apply and the loop through columns is taking several minutes

Timing apply

CPU times: user 8min 40s, sys: 150 ms, total: 8min 40s
Wall time: 8min 40s

Compared to the previous method the speedup is 820x.

Conclusion

Play first with a small amount of data you can visualize, then eventually play with few full columns and then with all the data.

Pandas Apply Rolling Window to Every Column Efficiently

There are 1 best solutions below

Data

Min example

Run minimal example

Using apply

Avoid apply

Avoid loop columns

Full Example

Timing apply

Conclusion

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in CUSTOM-ROLLING-WINDOW

Trending Questions

Popular # Hahtags

Popular Questions