Does parallelization in R copy all data from the parent process?


I have a large bioinformatics project where I want to run a small function on about a million markers. The function takes a small tibble (22 rows, 2 columns) and an integer as input; the returned object is about 80 KB per marker, and no large amount of data is created inside the function, just some formatting and statistical testing. I've tried various approaches using the parallel, doParallel and doMC packages, all pretty canonical stuff (foreach, %dopar%, etc.), on a machine with 182 cores, of which I am using 60.
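For reference, a minimal sketch of the kind of canonical doParallel/foreach setup described above. All names here (`small_tbl`, `marker_ids`, `run_test`) are hypothetical stand-ins for the real inputs, and the core and marker counts are scaled down:

```r
library(doParallel)

# Hypothetical stand-ins for the real data described above
small_tbl  <- data.frame(id = 1:22, value = rnorm(22))
marker_ids <- 1:100                      # ~a million in the real run

run_test <- function(tbl, marker) {
  # Placeholder for the per-marker formatting + statistical test
  t.test(tbl$value)$p.value + 0 * marker
}

cl <- makeCluster(4)                     # 60 workers in the real run
registerDoParallel(cl)

# foreach auto-exports referenced globals (small_tbl, run_test)
# to each worker before iterating
results <- foreach(m = marker_ids, .combine = c) %dopar% {
  run_test(small_tbl, m)
}

stopCluster(cl)
length(results)                          # one result per marker
```

Note that with a PSOCK backend, foreach copies every global it detects in the loop body to every worker, which is one place where unexpected memory duplication can creep in.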

However, no matter what I do, the memory requirement quickly gets into the terabytes and crashes the machine. The parent process itself holds many gigabytes of data in memory, which makes me suspicious: does the entire memory content of the parent process get copied to each parallel worker, even when it is not needed? If so, how can I prevent this?
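For context, the answer depends on the backend, and a small sketch can illustrate the two memory models. Fork-based backends (parallel::mclapply, doMC; Unix only) share the parent's memory copy-on-write, so nothing is duplicated until a page is written to, while PSOCK clusters start workers with empty environments and copy only what is explicitly exported or captured. The variable names below are illustrative:

```r
library(parallel)

big <- rnorm(1e6)   # stands in for the gigabytes held by the parent

# Fork-based (Unix only): children see `big` via copy-on-write,
# so it is not duplicated unless a child modifies it.
res_fork <- mclapply(1:4, function(i) sum(big[i]), mc.cores = 2)

# PSOCK: workers know nothing about the parent's environment;
# `big` reaches them only via clusterExport() or if the supplied
# function's environment captures it.
cl <- makeCluster(2)
res_psock <- parLapply(cl, 1:4, function(i) i^2)  # `big` never sent
stopCluster(cl)
```

Even with copy-on-write forking, R's garbage collector touching pages in a child can trigger copies, so fork-based memory sharing is not always as free as it looks.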

Note: I'm not necessarily looking for a solution to my specific problem, hence no code example. What I'm having trouble with is understanding the details of how memory works in R parallelization.
