Using ProcessPoolExecutor for functions with I/O


Lately I have been using ProcessPoolExecutor for accelerating the processing of some functions I wrote.

I have a question regarding one function I would like to accelerate.

This function

    def thefunction(input_file, output_file, somepar):
        ...

involves opening and reading the input file, processing it, and writing the results to an output file.

Right now I am doing

    lista=glob.glob(os.path.join(args.thefolders,'path/this.json'))

    for filen in lista:
        print("Processing ",filen)
        thefunction(filen,None,args.somepar)

I would like to do some multiprocess mapping like

    with ProcessPoolExecutor() as process_pool:
        work_done = list(process_pool.map(partial(thefunction, output_file=None, somepar=args.somepar), lista))

But I am a bit worried, since the function involves I/O.

Provided that the files accessed are different for every member of the list, is it safe to do this?

1 Answer

Bharel

If the files are different, I/O operations from different processes at once are completely reasonable.

If the files are the same, such an operation is unsafe and would require a synchronization primitive such as a lock, which would make the multiprocessing inefficient.
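A minimal sketch of the safe case (distinct input and output files per task). `transform` here stands in for the asker's `thefunction`, and the file contents and transformation are made up for illustration; the pattern of binding fixed arguments with `partial` and mapping over the file list is the point. Note the worker function must be defined at module level so it can be pickled, and the pool should be created under an `if __name__ == "__main__":` guard.

    import os
    import tempfile
    from concurrent.futures import ProcessPoolExecutor
    from functools import partial

    def transform(input_file, output_file, somepar):
        # Stand-in for `thefunction`: read the input, apply a trivial
        # transformation, and write the result to a separate output file.
        with open(input_file) as f:
            data = f.read()
        result = data.upper() if somepar else data
        if output_file is None:
            # Derive a per-input output path, so no two tasks share a file.
            output_file = input_file + ".out"
        with open(output_file, "w") as f:
            f.write(result)
        return output_file

    if __name__ == "__main__":
        # Create a few distinct input files (illustrative data only).
        tmpdir = tempfile.mkdtemp()
        lista = []
        for i in range(3):
            path = os.path.join(tmpdir, f"in{i}.txt")
            with open(path, "w") as f:
                f.write(f"file {i}")
            lista.append(path)

        # Each worker touches its own input/output pair, so no locking
        # is needed; `partial` binds the non-varying arguments.
        with ProcessPoolExecutor() as process_pool:
            work_done = list(process_pool.map(
                partial(transform, output_file=None, somepar=True), lista))

        for out in work_done:
            print(out)

Because `map` collects return values in input order, `work_done` lines up with `lista`, which is handy for reporting which outputs came from which inputs.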