Lets say I have a myfunc that takes in an input i and j, calculates their sum and populates a passed-in array with the answer. This is what the problem looks like:
import numpy as np
from functools import partial
from multiprocessing import Pool
def myfunc(i: int, j: int, some_array: np.ndarray):
ans = i+j
some_array[i,j] = ans
some_array = np.zeros(shape = (2,2))
execute = partial(myfunc, some_array = some_array)
for i in range(2):
for j in range(2):
execute(i,j)
print(some_array)
[[0. 1.]
[1. 2.]]
Now, lets imagine I would like to parallelize this code. I do so in the following way:
iter = [(i,j) for i in range(2) for j in range(2)]
with Pool() as p:
p.starmap(execute, iterable = iter)
This doesn't update the empty array everytime execute is called with different args. The final array is all zeros. This may be because p.starmap yields a list of all results at the end but given that execute is called for each iterable it should execute some_array[i,j] = ans in every call.
Any ideas/ help is much appreciated.
The biggest issue here is that separate processes have separate memory, so when
executeis called, it is with a different copy ofsome_array. The copy ofsome_arrayin the main process is then never updated, and the result is un-changed (all zeros). There are two ways around this: message passing, and shared memory. Most of themultiprocessing.Poolfunctions already have some sort of mechanism toreturna value from the target function which operates via pickling the results and sending them back to the main process with aQueue. The benefit of this is that it's quite flexible, and can handle many types of data (anything that can be pickled). The downside is that you then have to re-assemble the data in the main process, and there's quite a bit of overhead in the sending process (slow). The other solution is to ask the operating system to carve out a chunk of memory that both processes can access. This is quite fast, but it takes a little setup (and cleanup), and it is limited to data that can be represented by a binary buffer. Fortunatelynumpycan create an array from an existing binary buffer (our shared memory block). I have a helper class I have tweaked over time to help make the bookkeeping of the shared memory easier: