Parallel inner loop, sequential outer: can I make mcfork faster or keep sessions somehow?

92 Views Asked by At

When writing a nested loop where the outer loop needs to be sequential and the inner loop depends on the previous iteration via a very simple object, is there a way to reduce the overhead from communicating between sessions multiple times?

Here's an example using doMC and foreach:

library(doMC)
library(tictoc)

registerDoMC(3)

tic()
a <- runif(100)
for(i in seq_len(333)){
  a <- foreach(j = 1:100, .combine = c) %dopar% {
    sqrt(a[j])*sqrt(max(a))
  } 
}
toc()
#> 7.669 sec elapsed

tic()
a <- runif(100)
for(i in seq_len(333)){
  a <- foreach(j = 1:100, .combine = c) %do% {
    sqrt(a[j])*sqrt(max(a))
  } 
}
toc()
#> 5.224 sec elapsed

Created on 2019-07-22 by the reprex package (v0.3.0)

Because the calculations in the inner loop are very simple the overhead introduced from creating multiple sessions and transfering the data make the parallel version slower than the sequential one.

After profiling the code, as expected, the time difference comes mostly from mcfork, but since the calculation is so simple and differs between calls only due to the a vector, I was wondering if there is a way to make the sessions persist between foreach calls and send only the a vector between each call (I think this should be a lot faster than forking the entire session).

Is there a way of achieving that with the session forking structure of doMC? Is there another parallel backend capable of this kind solution (making sessions persist with small changes between tasks)?

1

There are 1 best solutions below

0
Cole On

I'm not sure parallel computing can be recursive. So if result n+1 relies on result n, there's no way to be parallel without restructuring your algorithm.

For your specific problem, it wasn't truly recursive when max(a) > 0. Where a[j] == max(a), your inner loop would evaluate to max(a). That is:

sqrt(a[which.max(a)]) * sqrt(max(a)) == max(a)

So for each loop, sqrt(max(a)) will always be the same when max(a) > 0. Furthermore, we can work on simplifying the other portion of the loop:

#i = 1
sqrt(a[1]) * sqrt(max(a))
#i = 2
sqrt(sqrt(a[1]) * sqrt(max(a))) * sqrt(max(a))
#simplifies to
a[1]^(1/4) * max(a) ^ (3/4)

This loop can be translated into:

a^(1 / 2^n) * max(a)^((2^n - 1) / (2^n))

I know this doesn't directly answer your question but it supports @F. Privé's comment: try to rethink the problem to see if it can be vectorized. For example, even your example loop could be vectorized:

for(i in seq_len(n)){
  a <- sqrt(a) * sqrt(max(a)) 
  }

For seq_len(5), here is the performance of the methods:

Unit: microseconds
             expr      min       lq      mean   median       uq      max neval
  do_par_original 125089.5 128144.6 128089.30 128462.3 129352.3 129397.8     5
  inner_as_vector   3095.4   3183.0   9754.70   3288.2   3307.9  35899.0     5
 fully_vectorized     17.7     24.8     25.76     28.1     28.1     30.1     5