Correlation Matrix with over 100 variables

I have 100 variables and I am trying to plot their correlation matrix. However, as you can see in the picture, there are too many variables for a readable plot. Is there a graphical presentation that shows only the relevant correlations, for example those above a threshold of 0.5?

This is the code I used:

import numpy as np  # Data manipulation
import matplotlib.pyplot as plt
import seaborn

c = df_train_new.corr()
plt.figure(figsize=(20,20))
seaborn.heatmap(c, cmap='RdYlGn_r', mask = (np.abs(c) >= 0.5))
plt.show()

[Image: Correlation Matrix]

1 Answer

Remove the duplication across the diagonal by masking the upper triangle. Then remove highly correlated pairs by dropping columns whose correlation with another column is above 0.95.

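import seaborn as sns  # np and plt as imported in the question

# corr is the correlation matrix, e.g. df_train_new.corr(); cmap is any matplotlib colormap such as 'RdYlGn_r'
mask = np.triu(np.ones_like(corr, dtype=bool))  # True hides the upper triangle and the diagonal
sns.heatmap(corr, cmap=cmap, center=0, linewidths=1, annot=True, fmt=".2f", mask=mask)
plt.show()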
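
To show only the relevant variables, as the question asks, one option is to shrink the matrix before plotting it. Below is a minimal sketch, not a definitive implementation: it keeps only the variables that have at least one off-diagonal correlation with absolute value at or above the threshold. The helper names `strong_submatrix` and `plot_lower_triangle` and the demo data are made up for illustration. Note that seaborn hides the cells where `mask` is True, so a threshold mask that is meant to show only strong correlations would be `np.abs(c) < 0.5`, not `>= 0.5`.

```python
import numpy as np
import pandas as pd


def strong_submatrix(corr: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Keep only variables with at least one off-diagonal |r| >= threshold."""
    off_diag = corr.abs().to_numpy().copy()
    np.fill_diagonal(off_diag, 0.0)  # ignore each variable's correlation with itself
    keep = corr.columns[(off_diag >= threshold).any(axis=0)]
    return corr.loc[keep, keep]


def plot_lower_triangle(corr: pd.DataFrame) -> None:
    """Draw the lower triangle of a correlation matrix as a heatmap."""
    # Imported here so the filtering above works without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    mask = np.triu(np.ones_like(corr, dtype=bool))  # True = hidden cell
    sns.heatmap(corr, mask=mask, cmap="RdYlGn_r", center=0, annot=True, fmt=".2f")
    plt.show()


# Hypothetical demo data: 'a' and 'b' are strongly related, 'c' is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
demo = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.1, size=500),
    "c": rng.normal(size=500),
})
small = strong_submatrix(demo.corr(), threshold=0.5)
# plot_lower_triangle(small)  # uncomment to draw the reduced heatmap
```

With the question's data this would be called as `strong_submatrix(df_train_new.corr(), 0.5)`. How much smaller the plot gets depends on how many of the 100 variables have a strong partner.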

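Removing highly correlated features

Correlation coefficients range from -1 to 1, and 0 means no linear relationship. Drop features whose correlation with another feature is close to 1 or -1.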

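corr_matrix = corr.abs()  # absolute values, so correlations close to -1 are caught too
tri_df = corr_matrix.mask(mask)  # keep the lower triangle; masked cells become NaN

to_drop = [c for c in tri_df.columns if any(tri_df[c] > 0.95)]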
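
Put together, the dropping step could look like the sketch below. It assumes the correlation matrix is taken in absolute value so that strong negative correlations count as well. The function name `columns_to_drop` and the demo columns are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def columns_to_drop(df: pd.DataFrame, threshold: float = 0.95) -> list:
    """Return one column from each pair whose |correlation| exceeds threshold."""
    corr_matrix = df.corr().abs()
    mask = np.triu(np.ones_like(corr_matrix, dtype=bool))  # upper triangle and diagonal
    tri_df = corr_matrix.mask(mask)  # masked cells become NaN, lower triangle remains
    return [c for c in tri_df.columns if (tri_df[c] > threshold).any()]


# Hypothetical demo data: three near-duplicates of x (one negated) and one independent column.
rng = np.random.default_rng(42)
x = rng.normal(size=1000)
demo = pd.DataFrame({
    "x": x,
    "x_scaled": 2 * x + rng.normal(scale=0.05, size=1000),
    "x_neg": -x + rng.normal(scale=0.05, size=1000),
    "z": rng.normal(size=1000),
})
to_drop = columns_to_drop(demo)
reduced = demo.drop(columns=to_drop)
```

Because only the lower triangle is inspected, the earlier column of each highly correlated pair is the one that gets dropped, and one representative of each group is kept. Plotting `reduced.corr()` afterwards gives a smaller heatmap.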