I am using snowfall for parallel processing on my desktop machine without any problems. I want to use several spare machines in the office to speed up the computations by distributing the parallel processing over these remote machines.
I have it working, but distributing over the remote machine is much slower than either running locally or running the code manually (i.e. directly) on the remote machine.
My tests show that parallel processing over the remote cluster is about 14 times slower than running locally. I realize that distributing over the network adds some overhead, but I did not expect it to be this inefficient.
Example Speed Test
testFun <- function(i, n=1E4) rnorm(n)
nsim <- 2000
Local machine - without parallel processing
st <- Sys.time()
temp <- sapply(1:nsim, testFun)
Sys.time() - st
# about 1.5 seconds
Local machine - with parallel processing
library(snowfall)
snowfall::sfInit(parallel = TRUE, cpus = 8)
st <- Sys.time()
temp <- snowfall::sfSapply(1:nsim, testFun)
Sys.time() - st
# snowfall 1.84-6.1 initialized (using snow 0.4-2): parallel execution on 8 CPUs.
# Time difference of ~0.9 secs
snowfall::sfStop()  # stop the local workers once the test is done
Remote machine - run directly on remote machine
testFun <- function(i, n=1E4) rnorm(n)
nsim <- 2000
library(snowfall)
snowfall::sfInit(parallel = TRUE, cpus = 8)
st <- Sys.time()
temp <- snowfall::sfSapply(1:nsim, testFun)
Sys.time() - st
# snowfall 1.84-6.1 initialized (using snow 0.4-2): parallel execution on 8 CPUs.
# Time difference of ~ 1 sec
Local machine - distributed to remote
remote <- list(host = "14*.***.***.***",
               rscript = "Rscript",
               snowlib = "/usr/local/lib/R/site-library",
               rshcmd = "plink.exe -pw *********",
               user = "******",
               master = "14*.***.***.***")
sfInit(parallel = TRUE, cpus = 8, type = "SOCK",
       socketHosts = list(remote, remote, remote, remote,
                          remote, remote, remote, remote))
st <- Sys.time()
tt <- snowfall::sfSapply(1:nsim, testFun)
Sys.time() - st
# snowfall 1.84-6.1 initialized (using snow 0.4-2): parallel execution on 8 CPUs.
# Time difference of ~ 14 secs
My real function is of course more computationally intensive than this and makes multiple calls to sfSapply. I see about the same increase in run time when distributing to the remote machine: roughly 12 to 15 times longer than when it is run directly in R on the remote machine.
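To get a feel for how much data the speed test moves around, here is a rough sketch (not part of the timings above) that estimates the size of the results each task hands back to the master. It uses base R only. The variable names are made up for illustration, and the serialized size is only an approximation of what actually travels over the socket connection.

```r
# Same test function and task count as in the speed test
testFun <- function(i, n = 1E4) rnorm(n)
nsim <- 2000

one_result <- testFun(1)

# Size of one task's result as R holds it in memory (bytes)
one_result_bytes <- as.numeric(object.size(one_result))

# Size of one task's result after serialization (bytes);
# an approximation of the per-task payload, not an exact wire measurement
one_serialized_bytes <- length(serialize(one_result, NULL))

# Approximate total volume returned to the master over all tasks (MB)
total_mb <- one_serialized_bytes * nsim / 1e6

cat("per task, in memory (bytes): ", one_result_bytes, "\n")
cat("per task, serialized (bytes):", one_serialized_bytes, "\n")
cat("all tasks, serialized (MB):  ", total_mb, "\n")
```

Each call returns 1E4 doubles at 8 bytes each, so at least 80 kB per task and roughly 160 MB for all 2000 tasks. Whether that transfer accounts for the slowdown I see is something I have not verified.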
Questions
Am I doing something stupid? Is there a better way to do this?
Or is this an unavoidable cost of the network communication between the local and remote machines?
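One check that might help separate computation from data transfer is a variant of the test function that does the same random draws but returns a single number per task. The sketch below is an assumption-laden illustration: testFunSmall is a hypothetical name, and it runs serially with sapply so that it works without a cluster. On an initialized cluster, sapply would be swapped for snowfall::sfSapply.

```r
# Hypothetical variant of testFun: same amount of random number generation,
# but only the mean is returned, so each task sends back one double
testFunSmall <- function(i, n = 1E4) mean(rnorm(n))
nsim <- 2000

# Serial run so the sketch is self-contained; replace sapply() with
# snowfall::sfSapply() after sfInit() to time it on a cluster
timing <- system.time(res_small <- sapply(1:nsim, testFunSmall))

length(res_small)               # one value per task
print(object.size(res_small))   # total size of the collected results
timing[["elapsed"]]             # elapsed seconds for the serial run
```

If the distributed run of this variant were much closer to the local timings, that would point at the volume of returned data rather than the computation itself. I have not confirmed this on my setup.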
Local machine: Windows 10; R 3.4.3
Remote machine: Ubuntu 17.10; R 3.4.3