I am trying to create a correlation matrix to see which variables are useful from my dataset since there are over 600 variables.
I used df.corr() and received an error message that Python could not convert a string to a float. The cause was the date column, which is formatted as YYYYmM, e.g. 2019m5 (2019, month 5, May). Do I just need to change the format? If so, how would I do that for the matrix to work?
Correlation can only be mathematically calculated with numerical data.
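If you want to perform this calculation, first select only the columns with numerical data types, for example with df.select_dtypes(include="number"), and then call .corr() on the result.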
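A minimal sketch of that selection step, assuming pandas is available. The column names (date, price, units) and the values are made up for illustration and are not from your dataset:

```python
import pandas as pd

# Hypothetical toy data: one string date column and two numeric columns.
df = pd.DataFrame({
    "date": ["2019m5", "2019m6", "2019m7", "2019m8"],
    "price": [1.0, 2.0, 3.0, 4.0],
    "units": [8.0, 6.0, 4.0, 2.0],
})

# Keep only the numeric columns; the string date column is dropped.
numeric_df = df.select_dtypes(include="number")

# Pairwise Pearson correlation between the remaining columns.
corr_matrix = numeric_df.corr()
print(corr_matrix)
```

With 600+ variables, the same two lines should work unchanged, since select_dtypes filters by dtype rather than by column name.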
I recommend starting your data visualization with a scatterplot and a heatmap!
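On the date format itself: if you also want the date column as a real date (for sorting or grouping by month), one possible approach is to split the YYYYmM string into year and month and build a datetime from them. This is only a sketch; the column names and values are hypothetical, and it assumes every entry matches the 2019m5 pattern:

```python
import pandas as pd

# Hypothetical toy data using the YYYYmM format from the question.
monthly = pd.DataFrame({
    "date": ["2019m5", "2019m6", "2019m12"],
    "sales": [10.0, 12.5, 20.0],
})

# Pull out the year and month as integers (assumes every row matches).
parts = monthly["date"].str.extract(r"^(?P<year>\d{4})m(?P<month>\d{1,2})$").astype(int)

# Build a datetime set to the first day of each month.
monthly["date"] = pd.to_datetime(parts.assign(day=1))

# A datetime column is still not a numeric dtype, so it is
# excluded when selecting numeric columns for the correlation.
numeric_cols = monthly.select_dtypes(include="number").columns.tolist()
print(monthly.dtypes)
print(numeric_cols)
```

Note that changing the format alone does not make the date part of the correlation matrix; it just stops it from being a plain string.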