Python multithreading I/O operation


I am trying to parallelize a simple application in Python. It consists of loading a large amount of data from an I/O-library format, processing it, and saving information extracted from it to CSV with the help of pandas. Here is symbolic code for it:

def read(time):
    # read the data for this time step via an I/O library
    ...

def treat(time):
    # perform scipy operations on the data
    ...

def write(time):
    # write the results with pandas
    df.to_csv(f'partial_{time}_data.csv')

def thread(time):
    read(time)
    treat(time)
    write(time)

if __name__ == "__main__":
    schedule = [ list of times to load, treat and write ]

    # single-thread version
    for time in schedule:
        thread(time)

    # pooling version
    from tqdm.contrib.concurrent import process_map
    process_map(thread, schedule, max_workers=16)


IIUC, process_map from tqdm uses concurrent.futures.ProcessPoolExecutor under the hood. I was surprised to gain only about 2x in observed execution time when running this on an 8-core Intel processor. Is there a more clever way to leverage multiprocessing resources here?

Thanks for the help
