I have a pandas DataFrame with two columns: CATEGORY (1-400, discrete, categorical) and RESPONSE (0.0-1.0, continuous):
CATEGORY RESPONSE
33 0.000
5 0.005
101 0.125
102 0.423
3 0.003
6 0.75
... etc 55k rows
I first group the DataFrame by CATEGORY and get the array of RESPONSE values for each of the 400 categories.
I want to calculate the Pearson correlation coefficient between these arrays for all pairs of categories and show the result as, say, a heatmap, with CATEGORY on the horizontal and vertical axes and the Pearson value as color/intensity.
Alternatively, I would like to make a 2D histogram of RESPONSE vs. CATEGORY, binning RESPONSE into 10 bins of width 0.1, and recalculate the Pearson coefficients.
Googling, I cannot find how to go from a two-column pandas DataFrame to a 2D histogram that can be saved.
Pandas has a built-in function to calculate correlations, pandas.DataFrame.corr. It computes the pairwise correlation between the columns of a DataFrame, and Pearson is the default method.
The example from the documentation is similar to what you want to do:
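As a minimal sketch (this is not the documentation's example): since corr works column-wise, the two-column data first has to be reshaped so that each CATEGORY becomes its own column. The data below is randomly generated as a stand-in, and the helper column name obs is made up for illustration. Pairing observations by their position within each category is an assumption; Pearson needs paired values, so adapt the pairing to whatever actually links your rows.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): 5 categories instead of 400, random responses.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "CATEGORY": rng.integers(1, 6, size=200),
    "RESPONSE": rng.random(200),
})

# Number the observations within each category (0, 1, 2, ...).
# ASSUMPTION: the i-th value of one category pairs with the i-th of another.
df["obs"] = df.groupby("CATEGORY").cumcount()

# Reshape to wide form: one column per CATEGORY, one row per position.
wide = df.pivot(index="obs", columns="CATEGORY", values="RESPONSE")

# Pearson is the default method; missing values are excluded pairwise,
# so categories with different numbers of rows still work.
corr = wide.corr()
print(corr.round(3))
```

The result is a square DataFrame indexed by CATEGORY on both axes, which can be saved with corr.to_csv(...) or passed on to a plotting function.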
Turning the correlation matrix into a heatmap works very well with seaborn (see Stack Overflow). Alternatively, you can use the pandas styling options to color the cells of the DataFrame according to their values.
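A rough sketch of the heatmap step follows. It uses plain matplotlib rather than seaborn to keep the dependencies small, which is a substitution on my part. The correlation matrix here is built from random stand-in data, and the output file name is made up.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in correlation matrix (assumption): 5 categories, random data.
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.random((50, 5)), columns=[1, 2, 3, 4, 5])
corr = wide.corr()

fig, ax = plt.subplots(figsize=(5, 4))
# Fix the color scale to [-1, 1], the full range of Pearson's r.
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")

# Label both axes with the category values.
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
ax.set_xlabel("CATEGORY")
ax.set_ylabel("CATEGORY")
fig.colorbar(im, ax=ax, label="Pearson r")

# Hypothetical output path; the figure is written to the working directory.
out_path = Path("category_corr_heatmap.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
```

With seaborn installed, sns.heatmap(corr, vmin=-1, vmax=1) should give an equivalent figure in a single call. With 400 categories you will probably want to drop or thin out the tick labels.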