I have an image-processing service containing two methods, which I want to execute in parallel using the multiprocessing library in Python.
The first method makes an API call to fetch image metadata from an external service.
The second method uses an object of a class that performs complex operations, such as reading an image with the OpenCV library and running image classification with a sklearn model.
The first function looks like this -->
def function_1():
    # perform a long-running API call
and this is my second function -->
def function_2(image_proc_obj):
    predictions = image_proc_obj.predict()
On calling these two methods using multiprocessing.Process as shown below
image_proc_obj = ImageProcessingClass()
p1 = multiprocessing.Process(target=function_1)
p2 = multiprocessing.Process(target=function_2, args=(image_proc_obj,))
p1.start()
p2.start()
I get a ValueError: ctypes objects containing pointers cannot be pickled
I am passing image_proc_obj to the second function because the constructor of this class loads the model file, which I don't want to happen on every function call.
I also tried creating a class by subclassing multiprocessing.Process:
class ImageClassifier(multiprocessing.Process):
    def __init__(self, process_obj, **kwargs):
        super(ImageClassifier, self).__init__(**kwargs)  # forward name= etc.
        self.proc_obj = process_obj

    def run(self, image):
        predictions = self.proc_obj.predict(image)
But on running the commands as shown below:
image_proc_obj = ImageProcessingClass()
classifier = ImageClassifier(name="classifier process", process_obj=image_proc_obj)
classifier.start()
classifier.join()
I get the same error --> ValueError: ctypes objects containing pointers cannot be pickled
Looking forward to some help with this.
I have designed, and am still successfully using, this same process-to-process, low-latency-optimised communication, using the strategy described below, having a minimum latency & shortest possible TAT (turn-around time) in mind while doing remote sklearn .predict()-s, and it works for production-grade "remote" predictions on sub-[ms]-sampled p2p requests - this has worked like a charm for about six years.

A :
Easy:
due to a wish to pass an object instance into another, independent process, there is a need to prepare a so-called serialised representation of the original process's object.
Having created a SER-ialise / transfer / DES-erialise path, the original object's data may get into the "remote" process's hands.
The transfer tool may be a Python-native Queue (which uses the pickle tool for SER/DES) or any other tool (like nanomsg, ZeroMQ (pyzmq), pynng, or raw sockets' tools), yet with these one has to perform the SER/DES transformations programmatically.

Where pickle.dump() was common to fail, there might be a chance to use Mike McKerns' dill package: just import dill as pickle and you may still leave pickle.dump() in the source code.

Nota bene:
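A quick, stdlib-only way to see where plain pickle gives up (the dill swap is the one-line import change mentioned above):

```python
import pickle

square = lambda v: v * v          # an ad-hoc callable, not importable by name

try:
    pickle.dumps(square)          # pickle serialises functions by reference...
    pickled_ok = True
except (pickle.PicklingError, AttributeError):
    pickled_ok = False            # ...so an anonymous lambda fails here

print(pickled_ok)  # False; after `import dill as pickle` the same call succeeds
```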
once OpenCV payloads are under consideration, be warned that the Python interpreter just points (refers) to native OpenCV mat::-memory, so better use a numpy-"flattened" re-representation of the image data if trying to move it to the other, "remote", process.