Python Multiprocessing with pydantic.BaseModel objects


I'm trying to process pydantic objects with Python's multiprocessing module.

My task: I need to download many files. The URL of each file and some additional information, such as a boolean "file_downloaded", are stored in a data object that I created with pydantic. Now I want to download several files at once, so I want to build a list of these data objects and process them in a Pool with 5 processes, using the map function.

Here is a simple example (with errors):

import pydantic
from typing import Optional
import multiprocessing
from multiprocessing.managers import BaseManager


class data_object(pydantic.BaseModel):
    url: str
    downloaded: Optional[bool] = False


class CustomManager(BaseManager):
    pass


def downloader(single_data: data_object):
    single_data.downloaded = True


if __name__ == '__main__':
    # Simple single process test for data_object and worker (no errors)
    just_one_object = data_object(url='url1')
    print(just_one_object.downloaded)
    downloader(just_one_object)
    print(just_one_object.downloaded)

    # Multiprocesses with shared data_object
    CustomManager.register('data_object', data_object)
    CustomManager.register('list', list)
    with CustomManager() as manager:
        shared_single_object = manager.data_object(url='url2')  # Error occurs
        print(shared_single_object.downloaded)
        downloader(shared_single_object)
        print(shared_single_object.downloaded)

        managed_list = manager.list([manager.data_object(url='url'+str(v)) for v in range(5)])

        pool = multiprocessing.Pool(processes=5)
        pool.map(downloader, managed_list)
        pool.close()
        pool.join()
        print(managed_list)

When I run this example, I get the following error on the line where shared_single_object is defined:

AttributeError: '__signature__' attribute of 'data_object' is class-only

Unfortunately, I have no idea where to start to solve this error. In the code after that line, I create multiple instances of the data object with different URLs and put them in a managed list, after which they should be downloaded. There may be further problems in that part, since I was not able to run it.

I searched the internet for ways to use a multiprocessing manager with a pydantic object, but I found nothing. I tried to adapt an example of sharing a complex class through a manager to write the code above.

With plain dictionaries I was able to download multiple files at once, but I'd like to use pydantic.
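For comparison, a dictionary-based version could look roughly like the sketch below. It is an illustration, not the original code: the function download_all and the record layout are hypothetical, and the actual download is left out. The worker returns an updated copy instead of relying on shared state, so no manager is needed.

```python
import multiprocessing


def downloader(record: dict) -> dict:
    """Pretend to download record["url"] and return an updated copy."""
    # A real implementation would fetch the file here (omitted in this sketch).
    updated = dict(record)
    updated["downloaded"] = True
    return updated


def download_all(records: list, processes: int = 5) -> list:
    """Process all records in a pool and collect the returned copies."""
    with multiprocessing.Pool(processes=processes) as pool:
        # map() blocks until every record has been processed.
        return pool.map(downloader, records)


if __name__ == "__main__":
    records = [{"url": "url" + str(v), "downloaded": False} for v in range(5)]
    results = download_all(records)
    print(results)
```

pool.map pickles each argument and sends it to a worker process, so changes made in place inside the worker do not reach the objects in the parent process. Returning the updated record is what carries the result back. The same pattern may carry over to a pydantic model, provided the model can be pickled.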

Thanks in advance.
