I have an ML task with a CPU-bound preprocessing step that can take over an hour. Previously I used the pandarallel library to parallelize this work; its documentation says it uses all available CPUs. Now that I'm running with PyTorch DDP on SLURM, there are multiple (4) processes. I can either have a single process do the preprocessing as before, or split the CSV into 4 partitions and have each DDP process handle one partition in parallel.
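For reference, the splitting I have in mind looks roughly like this (just a sketch; I'm assuming the `RANK`/`WORLD_SIZE` environment variables that torchrun/srun set, and `data.csv` is a placeholder path):

```python
import os

import pandas as pd

# Each DDP process reads the full CSV but keeps only its own shard.
rank = int(os.environ["RANK"])              # set by the torchrun/SLURM launcher
world_size = int(os.environ["WORLD_SIZE"])  # 4 in my case

df = pd.read_csv("data.csv")                # placeholder path
shard = df.iloc[rank::world_size]           # round-robin split into 4 partitions
# ...the CPU-heavy preprocessing then runs on `shard` in each process...
```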
Is this second approach faster, or am I just redundantly parallelizing?
I know that DDP is multiprocess, but I'm not sure how pandarallel interacts with it. I do notice that I need to specify fewer pandarallel workers per DDP process when running under DDP, otherwise I get an OOM error.
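For context, this is roughly how I reduce the per-process worker count (a sketch; `preprocess_row` and `part_0.csv` are placeholders for my actual per-row work and this rank's shard):

```python
import os

import pandas as pd
from pandarallel import pandarallel

NUM_DDP_PROCS = 4  # number of DDP ranks on the node

def preprocess_row(row):
    # placeholder for the actual CPU-heavy per-row work
    return row

# Without DDP, pandarallel defaults to one worker per core. With 4 DDP
# processes on the same node, each would spawn that many workers, so I
# divide the cores among the ranks to avoid the OOM.
n_cores = os.cpu_count() or 1
pandarallel.initialize(nb_workers=max(1, n_cores // NUM_DDP_PROCS),
                       progress_bar=False)

df = pd.read_csv("part_0.csv")  # this rank's partition
out = df.parallel_apply(preprocess_row, axis=1)
```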