import numpy as np

def replace_inf(df):
    all_columns = list(df.columns)
    no_infs = ['some_col', 'some_col']
    inf_cols = [c for c in all_columns if c not in no_infs]
    replace = [np.nan, np.inf, -np.inf]
    for col in inf_cols:
        # replace takes a plain list of values here; regex=True is not needed
        df[col] = df[col].replace(replace, 0)
        df[col] = df[col].astype(np.float32)
Currently this takes about 3 s for a subset of my columns and many times that for all columns. I think map, apply, lambda, or np.vectorize could help, but I'm having trouble writing something that works.
You can try numba and parallelize the task:
Benchmark (using a DataFrame with 5000 columns/100 rows, with 10 NaNs/10 inf/10 -inf at random positions in each column):
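The exact generator isn't shown above; a sketch that matches the description (seed and column names are my own choices):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_cols = 100, 5000

data = rng.random((n_rows, n_cols)).astype(np.float32)
# inject 10 NaN, 10 inf, 10 -inf at distinct random rows in each column
for j in range(n_cols):
    idx = rng.choice(n_rows, size=30, replace=False)
    data[idx[:10], j] = np.nan
    data[idx[10:20], j] = np.inf
    data[idx[20:], j] = -np.inf

df = pd.DataFrame(data, columns=[f"col_{j}" for j in range(n_cols)])
```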
Prints on my computer (AMD 5700x):
With 50_000 columns/100 rows:
With 150_000 columns/100 rows: