I want to visualize a process as a heatmap so I can see how long each step took.
I created the process diagram with PyDot and a dataframe with the timestamps of the individual steps.
How can I create a heatmap for my process?
The calculation should also include the time from one step to the next.
The edge times can be calculated as e.g. task1_start - start / task2_start - task1_end,
and the node times as e.g. task1_end - task1_start / task2_end - task2_start.
My MVP only changes the color of the border, but I want to create a real heatmap.
Process
import pydot
from IPython.display import SVG

# Build a directed graph with two box-shaped task nodes.
graph = pydot.Dot(graph_type='digraph')
task_node1 = pydot.Node("Task1", shape="box")
task_node2 = pydot.Node("Task2", shape="box")
graph.add_node(task_node1)
graph.add_node(task_node2)

# Connect Task1 to Task2.
task1_to_task2_edge = pydot.Edge("Task1", "Task2")
graph.add_edge(task1_to_task2_edge)

# Render to SVG and display it in the notebook.
graph.write_svg("diagram.svg")
SVG(filename='diagram.svg')
Dataframe
   id         step   timestamp
0   1  task1_start  2023-01-01
1   1    task1_End  2023-01-05
2   1  task2_start  2023-01-10
3   1    task2_end  2023-01-12
4   2  task1_start  2023-01-01
5   2    task1_End  2023-01-05
6   2  task2_start  2023-01-10
7   2    task2_end  2023-01-12
MVP
import pandas as pd

d = {'id': [1, 1, 1, 1,
            2, 2, 2, 2],
     'step': ['task1_start', 'task1_End', 'task2_start', 'task2_end',
              'task1_start', 'task1_End', 'task2_start', 'task2_end'],
     'timestamp': ['2023-01-01', '2023-01-05', '2023-01-10', '2023-01-12',
                   '2023-01-01', '2023-01-05', '2023-01-10', '2023-01-12']}
df = pd.DataFrame(data=d)
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Duration since the previous event of the same id. Two events of the same
# task give a node duration ("task1"); events of different tasks give an
# edge duration ("task1_to_task2").
g = df.groupby('id')
out = (df
       .assign(duration=df['timestamp'].sub(g['timestamp'].shift()),
               step=lambda _: (df['step'] + '/' + g['step'].shift()).str.replace(
                   r'([^_]+)[^/]*/([^_]+)[^/]*',
                   lambda m: m.group(1) if m.group(1) == m.group(2)
                   else f"{m.group(2)}_to_{m.group(1)}",
                   regex=True)
               )
       [['id', 'step', 'duration']].dropna(subset=['duration'])
       )
df = out
import pandas as pd
import matplotlib.colors as mcolors

colors = mcolors.LinearSegmentedColormap.from_list(
    'LightBlueGreenYellowRed',
    ['#B0E0E6', '#87CEEB', '#00FF00', '#ADFF2F', '#FFFF00', '#FFD700', '#FFA500',
     '#FF4500', '#FF0000', '#FF6347', '#FF7F50', '#FFA07A', '#FFC0CB', '#FFB6C1',
     '#FF69B4', '#DB7093', '#FF1493', '#C71585', '#FF00FF']
)


def duration_to_hex(value, vmin, vmax):
    # Scale the duration to [0, 1] and look up its color in the colormap.
    norm = (value - vmin) / (vmax - vmin)
    return mcolors.to_hex(colors(norm))


vmin = df['duration'].min()
vmax = df['duration'].max()
df['color'] = df['duration'].apply(lambda x: duration_to_hex(x, vmin, vmax))


def get_color(step):
    # Color of the first row for this step; grey if the step is missing.
    match = df.loc[df['step'] == step, 'color']
    if match.empty or pd.isnull(match.values[0]):
        return '#808080'
    return match.values[0]
import pydot
from IPython.display import SVG

graph = pydot.Dot(graph_type='digraph')
# Only the border color of each node (and the edge color) is set here.
task_node1 = pydot.Node("Task1", shape="box", color=get_color('task1'))
task_node2 = pydot.Node("Task2", shape="box", color=get_color('task2'))
graph.add_node(task_node1)
graph.add_node(task_node2)
task1_to_task2_edge = pydot.Edge("Task1", "Task2", color=get_color('task1_to_task2'))
graph.add_edge(task1_to_task2_edge)
graph.write_svg("diagram.svg")
SVG(filename='diagram.svg')




To draw the heatmap, work on the SVG export: add class names to the nodes that mark how hot they are. You can then include the graph's SVG group twice and use a filter to get something like your heatmap: one copy as the background, filled with colours and blurred, and the normal black-and-white version on top as the foreground.
It would be nicer to reference the graph group twice with a <use> element rather than include it twice, but I couldn't get a CSS selector to style the group differently depending on whether it was referenced or inlined.
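A minimal sketch of that idea in Python, under a few assumptions: it post-processes the SVG text with the standard library instead of using CSS classes, it sets the fill colours inline, and it copies only the node and edge shapes into the background layer rather than the whole group. The names `add_heat_layer`, `heat_colors`, `heat-layer` and `heat-blur` are made up for this example, and it relies on the SVG having the usual Graphviz layout (one outer group with a `<title>` inside every node and edge group).

```python
import copy
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise without an "ns0:" prefix
ET.register_namespace("xlink", "http://www.w3.org/1999/xlink")


def q(tag):
    """Qualify a tag name with the SVG namespace."""
    return f"{{{SVG_NS}}}{tag}"


def add_heat_layer(svg_text, heat_colors, blur=6):
    """Return SVG text with a blurred, coloured copy of the shapes behind the graph.

    heat_colors maps the <title> of a node or edge group to a hex colour
    (hypothetical helper, not part of pydot or Graphviz).
    """
    root = ET.fromstring(svg_text)
    graph = root.find(q("g"))  # assumed: the whole drawing sits in one outer <g>

    # Gaussian blur filter for the background layer.
    defs = ET.Element(q("defs"))
    flt = ET.SubElement(defs, q("filter"), {
        "id": "heat-blur", "x": "-25%", "y": "-25%",
        "width": "150%", "height": "150%"})
    ET.SubElement(flt, q("feGaussianBlur"), {"stdDeviation": str(blur)})
    root.insert(0, defs)

    # Background layer: recoloured copies of the node and edge shapes.
    heat = ET.Element(q("g"), {"id": "heat-layer", "filter": "url(#heat-blur)"})
    groups = graph.findall(q("g"))
    for group in groups:
        color = heat_colors.get(group.findtext(q("title")))
        if color is None:
            continue  # no heat value for this node or edge
        clone = ET.SubElement(heat, q("g"))
        for shape in group:
            name = shape.tag.rsplit("}", 1)[-1]
            if name not in ("polygon", "ellipse", "path"):
                continue  # skip <title> and <text>
            painted = copy.deepcopy(shape)
            painted.attrib.pop("id", None)
            # Edge paths are open curves, so colour only their stroke.
            painted.set("fill", "none" if name == "path" else color)
            painted.set("stroke", color)
            painted.set("stroke-width", "6")
            clone.append(painted)

    # Put the heat layer in front of the page background polygon but behind
    # the original black-and-white node and edge groups.
    children = list(graph)
    index = children.index(groups[0]) if groups else len(children)
    graph.insert(index, heat)
    return ET.tostring(root, encoding="unicode")
```

With the question's files this might be called as `add_heat_layer(open("diagram.svg").read(), {"Task1": get_color("task1"), "Task2": get_color("task2"), "Task1->Task2": get_color("task1_to_task2")})`, assuming Graphviz titles the edge group `Task1->Task2`. The nodes should be left unfilled in the foreground, otherwise their fill hides the blurred colours behind them.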