I have a set of points stored as (x, y) values. My goal is to map these onto a coordinate plane and create a continuous PDF distribution.
I would like to apply polygons to the figure and get the total summed probability under the shape and return that to the user. The shape would be stored as a series of coordinates, so [(0,0), (1,0), (1,1),(0,1),(0,0)] would represent a square.
So far, I have plotted the points using seaborn.kdeplot, and that generates a beautiful plot in which the probabilities over all points sum to around 100%.
However, I am struggling to apply shapes directly to the graph. I've tried a few online solutions, but have been unable to find any method to get the cumulative probability under a shape directly on the KDE plot. My code is below:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib.patches import PathPatch
from matplotlib.path import Path


def get_field_matrix(csv_name, coordinates):
    # ... Some code that loads in a csv and names it df_heatmap
    # df_heatmap has two columns, ActualX and ActualY, which are the series of points for the kde

    # Create a KDE-based heatmap using seaborn
    kde = sns.kdeplot(x='ActualX', y='ActualY', data=df_heatmap, cmap='viridis', fill=True, cbar=True,
                      weights=np.linalg.norm(df_heatmap, axis=1) ** .3)

    # Set labels and title
    plt.xlabel('X-axis')
    plt.ylabel('Y-axis')
    plt.title('KDE-based Heatmap of X, Y Coordinates')

    # coordinates is the list of tuples that make up the shape to be applied.
    # Any shape may be represented, but it will always be a shape drawn with a single line.
    # Create a matplotlib path out of the coordinates
    shape_path = Path(coordinates)
    shape_patch = PathPatch(shape_path, lw=0)
    plt.gca().add_patch(shape_patch)

    # "A" is a placeholder for the probability I am trying to compute
    print("Summed Probability over the shape:", "A")

    # Set aspect ratio to be equal
    plt.gca().set_aspect('equal', adjustable='box')

    # Show the plot
    plt.show()
    return kde
Is there some library function I am missing to apply the cdf?
The example below shows how you can place a patch over a KDE plot and integrate the density underneath it.
I don't think the KDE data from seaborn (left) is directly accessible, so I've run a separate KDE (right) that tries to match seaborn.
The green points represent the user-defined coordinates, and the hatched area is just to confirm that the patch has been correctly interpolated onto the same grid as the KDE data.
A KDE estimator is first fitted to the data, and then used to get a KDE estimate over a fine grid. The user-defined patch coordinates are used to build a triangulated surface, which is then interpolated onto the fine rectangular grid from before. The patch and KDE are now in the same space. They are finally multiplied and integrated over.
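The steps above can be sketched roughly as follows. This is a minimal illustration rather than the exact code behind the figure: it assumes scipy.stats.gaussian_kde as the separate estimator (with default settings, so it may not match seaborn's plot exactly), and it swaps the triangulated-surface interpolation for a simpler point-in-polygon mask built with matplotlib's Path.contains_points. The function name probability_in_polygon and its grid_size and weights parameters are hypothetical names chosen for this example.

```python
import numpy as np
from matplotlib.path import Path
from scipy.stats import gaussian_kde


def probability_in_polygon(x, y, coordinates, grid_size=200, weights=None):
    """Approximate the KDE probability mass inside a polygon (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    verts = np.asarray(coordinates, dtype=float)

    # Fit a 2D Gaussian KDE; gaussian_kde expects an array of shape (n_dims, n_points).
    kde = gaussian_kde(np.vstack([x, y]), weights=weights)

    # Build a fine grid of cell centres over the polygon's bounding box.
    x_min, y_min = verts.min(axis=0)
    x_max, y_max = verts.max(axis=0)
    dx = (x_max - x_min) / grid_size
    dy = (y_max - y_min) / grid_size
    gx = x_min + (np.arange(grid_size) + 0.5) * dx
    gy = y_min + (np.arange(grid_size) + 0.5) * dy
    xx, yy = np.meshgrid(gx, gy)
    centres = np.column_stack([xx.ravel(), yy.ravel()])

    # Evaluate the density at every cell centre.
    density = kde(centres.T)

    # Boolean mask: True where a cell centre lies inside the polygon.
    inside = Path(verts).contains_points(centres)

    # Midpoint-rule integral: sum of density inside the shape times the cell area.
    return float(np.sum(density[inside]) * dx * dy)


# Example with synthetic data (assumed here, not the asker's CSV).
rng = np.random.default_rng(0)
sample_x = rng.normal(size=500)
sample_y = rng.normal(size=500)
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print("Summed probability over the shape:", probability_in_polygon(sample_x, sample_y, square))
```

Restricting the grid to the polygon's bounding box keeps the resolution high for small shapes; a larger grid_size trades speed for accuracy along the polygon's edges. If the seaborn plot uses weights, the same array can be passed through the weights argument so that both estimates are based on the same inputs.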