How can I control what is sent to workers when using foreach and doMC?


I do not understand what foreach is exporting when using the doMC backend.

Is there a way to control what is shipped to the workers?

The following example seems to suggest that everything in the global environment is available to the workers, which is causing me trouble when I use 100+ GB tables.

library("foreach")
library("doMC")
registerDoMC(3L)  # three worker processes
x <- 1
y <- 2
foo <- function(x) x+1
# Objects visible in the master process
cat(sprintf("hostname: %s, pid: %s, Objects are: %s\n", Sys.getenv("HOSTNAME"), Sys.getpid(), paste(ls(), collapse = ",")))

# .noexport = ls() is meant to keep every global object away from the workers
foreach(task = 1:3, .noexport = ls()) %dopar%
{
    # Objects visible from inside each task
    cat(sprintf("hostname: %s, pid: %s, Objects are: %s\n", Sys.getenv("HOSTNAME"), Sys.getpid(), paste(ls(), collapse = ",")))
    foo(task)
} -> res
print(res)

Result (how is foo known to the workers?):

hostname: nyzls604m, pid: 372957, Objects are: foo,res,x,y
hostname: nyzls604m, pid: 385492, Objects are: task
hostname: nyzls604m, pid: 385493, Objects are: task
hostname: nyzls604m, pid: 385494, Objects are: task
[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 4
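
One possible explanation, offered as a sketch rather than a confirmed answer: doMC is built on the fork-based functions of the parallel package, so nothing is serialized and shipped to the workers. Each worker is a forked copy of the master process and can read every object the master holds. Under that assumption, .noexport has nothing to act on, and ls() inside the loop body shows only task because it lists the loop's local environment, not the global environment where foo lives. The sketch below reproduces the effect with parallel::mclapply directly. The name big is a hypothetical stand-in for a large table, and the code assumes a Unix-like system where forking is available.

```r
library(parallel)

big <- 1:10                 # hypothetical stand-in for a large table
foo <- function(x) x + 1

res <- mclapply(1:3, function(task) {
    # ls() with no arguments lists only this function's local variables
    local_objs <- ls()
    list(
        local    = local_objs,
        sees_foo = exists("foo"),  # looked up through the enclosing environments
        sees_big = exists("big"),
        value    = foo(task)
    )
}, mc.cores = 2L)           # mc.cores > 1 requires fork support (not Windows)

# Inspect what each forked worker could see
str(res)
```

If this is what is happening, the large tables should not be duplicated in memory unless a worker writes to them, since forked processes share pages copy-on-write.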