In my scatter plot, I want to color and size each data point based on some criteria. Here is an example of what I am doing:
```python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline

# dataframe with two columns, serial number and the corresponding fractures
df1 = pd.DataFrame({'SN': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'],
                    'Fracture': [6, 90, 35, 60, 48, 22, 6]})
# dataframe with two columns, serial number and the corresponding force
df2 = pd.DataFrame({'SN': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'],
                    'Force': [200, 140, 170, 150, 160, 210, 190]})
df3 = pd.merge(df1, df2, on='SN')
df4 = df3.sort_values(by='Force')['Fracture']
# color by fracture: below 25 green, below 50 yellow, otherwise orange
colors = ['g' if int(i) < 25 else 'yellow' if int(i) < 50 else 'orange' for i in df4]
size_map = []
for i, c in enumerate(df4):
    if colors[i] == 'g':
        size_map.append(max(1 - (c / 25) * 0.99, 0.05) * 200)
    elif colors[i] == 'yellow':
        size_map.append(max(1 - (c / 50) * 0.99, 0.05) * 200)
    elif colors[i] == 'orange':
        size_map.append(max(1 - (c / 100) * 0.99, 0.05) * 200)
wp = [0.1, 0.25, 0.35, 0.45, 0.55, 0.72, 0.9]
Y = df2['Force'].sort_values()
fig, ax = plt.subplots()
sns.scatterplot(x=Y, y=wp, ax=ax)
ax.collections[0].set_color(colors)
ax.collections[0].set_sizes(size_map)
```
I have tried different logic for the size map, but none of it calculates the sizes correctly according to the criteria above. I appreciate any input.
The approach below suggests some changes.
Put all data into one dataframe instead of working with separate lists. This is especially important when sorting is involved: sorting a dataframe reorders all of its columns together, so the values belonging to one data point stay on the same row.
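Here is a minimal sketch of this step using the question's `df1` and `df2`. Merging them into a single dataframe and sorting it by `Force` keeps each serial number's `Fracture` and `Force` on the same row. Anything derived later from one column, such as colors or sizes, therefore cannot drift out of step with the other. The name `df` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'SN': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'],
                    'Fracture': [6, 90, 35, 60, 48, 22, 6]})
df2 = pd.DataFrame({'SN': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'],
                    'Force': [200, 140, 170, 150, 160, 210, 190]})

# one dataframe holding every value that belongs to a serial number
df = pd.merge(df1, df2, on='SN')
# sorting moves whole rows; ignore_index gives a fresh 0..n-1 index
df = df.sort_values(by='Force', ignore_index=True)
print(df)
```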
Make use of seaborn's way for coloring. Seaborn uses a column as `hue=`, together with a `palette` that maps each hue value to its corresponding color. Here, 0, 1 and 2 are used as the hue values of the 3 groups.

Make use of seaborn's way for setting the sizes. One column is used as `size=`, and its values are mapped linearly onto the dot sizes. The `sizes=(20, 200)` parameter makes sure the smallest `size` value is mapped to dot size 20 and the largest to dot size 200. These are marker areas in points², as with matplotlib's `s`. You can set it to `sizes=(200, 20)` to reverse the mapping (i.e., the smallest `size` value gets dot size 200).

Suppose there are 3 groups:
- 0 to 25 (green)
- 25 to 50 (yellow)
- 50 to 100 (orange)

A `size` column is created by subtracting the start value of each group from `Fracture` and dividing the last group by 2. That way all values of the `size` column are in the range 0 to 25.
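Here is a minimal sketch of the whole approach, reconstructed from the description above rather than copied from a published answer. It makes these assumptions:

- The group boundaries are the question's: below 25, 25 to 50, and 50 and above.
- The `wp` values pair with the rows in order of increasing `Force`, as in the question.
- The column names `group` and `size` are only illustrative.

`np.select` assigns the group number. `sizes=(200, 20)` reverses the size mapping so that lower fracture values within a group get larger dots, as in the question's formula.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df1 = pd.DataFrame({'SN': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'],
                    'Fracture': [6, 90, 35, 60, 48, 22, 6]})
df2 = pd.DataFrame({'SN': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'],
                    'Force': [200, 140, 170, 150, 160, 210, 190]})
df = pd.merge(df1, df2, on='SN').sort_values(by='Force', ignore_index=True)
# assumption: as in the question, wp holds the y values in order of increasing Force
df['wp'] = [0.1, 0.25, 0.35, 0.45, 0.55, 0.72, 0.9]

# group 0: Fracture < 25, group 1: 25 <= Fracture < 50, group 2: Fracture >= 50
df['group'] = np.select([df['Fracture'] < 25, df['Fracture'] < 50], [0, 1], default=2)

# subtract each group's start value, then halve the last group (50..100 -> 0..25)
start = df['group'].map({0: 0, 1: 25, 2: 50})
df['size'] = (df['Fracture'] - start) / np.where(df['group'] == 2, 2, 1)

fig, ax = plt.subplots()
sns.scatterplot(data=df, x='Force', y='wp',
                hue='group', palette={0: 'green', 1: 'yellow', 2: 'orange'},
                size='size', sizes=(200, 20),  # reversed: smallest size value -> biggest dot
                ax=ax)
plt.show()
```

Because `hue` and `size` both refer to columns of the same sorted dataframe, there is no need to patch `ax.collections[0]` afterwards, and seaborn builds the legend from those columns.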