Lately I have been using ProcessPoolExecutor to speed up some functions I wrote.
I have a question about one function I would like to accelerate.
This function,
def thefunction(input_file, output_file, somepar):
involves opening and reading the input file, processing it, and writing the results to an output file.
Right now I am doing

import glob
import os

lista = glob.glob(os.path.join(args.thefolders, 'path/this.json'))
for filen in lista:
    print("Processing", filen)
    thefunction(filen, None, args.somepar)
I would like to do a multiprocessing map instead, something like

from concurrent.futures import ProcessPoolExecutor
from functools import partial

with ProcessPoolExecutor() as process_pool:
    work_done = list(process_pool.map(partial(thefunction, output_file=None, somepar=args.somepar), lista))
But I am a bit worried, since the function involves I/O.
Provided that the files accessed are different for every member of the list, is it safe to use map this way?
If the files are different for every task, concurrent I/O from separate processes is completely safe: each process reads and writes its own files, so there is nothing to synchronize.
If the processes wrote to the same file, the operation would be unsafe and would require a synchronization primitive such as a lock, which would serialize the writes and largely defeat the point of multiprocessing.
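For the shared-file case, the usual pattern is to create the lock through a multiprocessing.Manager and hand it to every worker via the executor's initializer, since a plain multiprocessing.Lock cannot be passed through map arguments. Here is a minimal, self-contained sketch of that pattern, not your actual code: _init_lock, process_shared, and the file names are all hypothetical, and the processing step is a placeholder.

from concurrent.futures import ProcessPoolExecutor
from functools import partial
import multiprocessing as mp

_lock = None  # set once per worker process by the initializer

def _init_lock(lock):
    # Runs in each worker at startup and stores the shared lock globally.
    global _lock
    _lock = lock

def process_shared(input_file, output_file, somepar):
    # Hypothetical thefunction-like callable that appends to one shared output file.
    with open(input_file) as f:
        result = f.read().upper()  # placeholder for the real processing
    with _lock:  # only one worker may write to the shared file at a time
        with open(output_file, 'a') as f:
            f.write(result)

if __name__ == '__main__':
    files = ['a.txt', 'b.txt', 'c.txt']  # stand-in for the glob result
    for name in files:
        with open(name, 'w') as f:  # create sample inputs so the sketch runs
            f.write(name + ' contents\n')
    manager = mp.Manager()  # keep a reference so the manager (and its lock) stays alive
    with ProcessPoolExecutor(initializer=_init_lock, initargs=(manager.Lock(),)) as pool:
        list(pool.map(partial(process_shared, output_file='combined.out', somepar=None), files))

Every write then waits on the lock, which is exactly why keeping one file per task, as in your current code, is the efficient layout.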