I am attempting to run a process in parallel using dask.bag, but the process is taking longer than the task stream seems to suggest.
- I am on dask version 2023.9.3
- I am on a single machine
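import time

import numpy as np

# bag, shading_candidates_np and reference_gdf are defined earlier in my script.
start = time.time()


def combine_shader_polygons(i):
    """Return the combined shading polygon for candidate row i, or None."""
    shader_polygon = None
    shader_indices = np.flatnonzero(shading_candidates_np[i])
    if len(shader_indices) == 0:
        pass  # no shaders: leave the result as None
    elif len(shader_indices) == 1:
        shader_polygon = reference_gdf.loc[shader_indices].iloc[0]
    else:
        # several shaders: merge them into a single geometry
        polygons = reference_gdf.loc[shader_indices]
        shader_polygon = polygons.unary_union
    return shader_polygon


shader_polygons = bag.map(combine_shader_polygons).compute(scheduler='processes')
timer = round(time.time() - start, 2)
print(f'Checkpoint 1: {timer}s')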
As you can see in the task stream image below, the process takes around 350ms from start to finish, but the print statement reports 5.3s. Is there a way to see what is taking up the rest of the time?
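One thing I could try is timing individual phases separately, on the assumption that the task stream only covers time spent executing tasks and not, for example, serializing data that crosses process boundaries. Below is a minimal sketch of that idea using only the standard library. The names `phase`, `pickle_round_trip` and `stand_in` are hypothetical, plain `pickle` is used as a stand-in for whatever serializer dask actually uses, and the data is made up rather than my real inputs.

```python
import pickle
import time
from contextlib import contextmanager

timings = {}  # phase name -> elapsed wall-clock seconds


@contextmanager
def phase(name):
    """Record the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


def pickle_round_trip(obj):
    """Serialize and deserialize obj, timing each direction separately."""
    with phase("pickle.dumps"):
        payload = pickle.dumps(obj)
    with phase("pickle.loads"):
        restored = pickle.loads(payload)
    return restored, len(payload)


# Hypothetical stand-in for the real inputs
# (shading_candidates_np and reference_gdf in my script).
stand_in = {"rows": [list(range(50)) for _ in range(1000)]}

with phase("total"):
    restored, n_bytes = pickle_round_trip(stand_in)

print(f"payload size: {n_bytes} bytes")
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Wrapping the graph construction and the `.compute(...)` call in separate `phase(...)` blocks in the same way should at least show whether the missing time falls inside `compute` or before it.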
