Are conflicts possible when writing multiple separate files in parallel with foreach in R?


Since I cannot provide a reproducible example, I will give a description instead. I need to process a few thousand images with image editing software (RawTherapee). In essence, I am using foreach in R to parallelize RawTherapee: each worker created by R runs the command-line version of RawTherapee (rawtherapee-cli) by calling shell. The steps are as follows:

  1. I create a list with N elements in which I store the file paths of the images, where N is the number of cores I will use. The name of each file is modified by adding a tag that assigns it to a specific iteration of foreach.
  2. Each worker created by foreach receives the chunk of the file list that corresponds to its number, and builds the command to be passed to rawtherapee-cli.
  3. The worker calls shell and starts processing with rawtherapee-cli: the image is read, modified, and then saved.
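Steps 1 and 2 can be sketched roughly as follows. This is only an illustration under assumptions: the input paths, the output folder, and the helper `build_cmd` are hypothetical, and the `-o` (output) and `-c` (input, placed last) options should be checked against the help text of the installed rawtherapee-cli version. The tagging of file names is left out.

```r
# Hypothetical input paths; the real ones come from the image folder.
files <- sprintf("C:/images/img_%04d.raw", 1:10)
nc <- 3  # number of workers (assumed value)

# Step 1: split the paths into nc contiguous chunks of near-equal size.
chunk_id <- sort(rep_len(seq_len(nc), length(files)))
set_split <- split(files, chunk_id)

# Step 2: build one command string per image (hypothetical helper).
# shQuote(type = "cmd") wraps paths in double quotes for cmd.exe.
build_cmd <- function(input, out_dir) {
  sprintf("rawtherapee-cli -o %s -c %s",
          shQuote(out_dir, type = "cmd"),
          shQuote(input, type = "cmd"))
}

# One character vector of commands per chunk, i.e. per worker.
cmds <- lapply(set_split, function(chunk) {
  vapply(chunk, build_cmd, character(1),
         out_dir = "C:/out", USE.NAMES = FALSE)
})
```

Each element of `cmds` then holds the commands for one foreach iteration, and every input path appears in exactly one chunk.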

To give you an idea, my call to foreach is like this:

library(doParallel)  # also attaches foreach and parallel

nc = cpus  # number of workers
cl = makeCluster(nc, type = "PSOCK")
cl
registerDoParallel(cl)

foreach(i = seq_along(set_split), .combine = 'c',
        .inorder = FALSE, .errorhandling = 'remove') %dopar% {

  # script to generate the command for the cli (builds cmd_string)
  shell(cmd_string, intern = FALSE, wait = TRUE)

}
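One thing that might help narrow this down: with `intern = FALSE`, `shell` returns the exit status of the command, and `.errorhandling = 'remove'` silently drops any iteration that raises an R error, so failures can go unnoticed. Below is a minimal sketch of a worker body that records the status of every command instead. The function `run_chunk` and its `runner` argument are hypothetical names introduced for illustration. The default runner calls `shell`, which exists only on Windows.

```r
# Run every command of one chunk and report its exit status.
# `runner` is a hypothetical hook so the logic can be tried without
# RawTherapee; by default it calls shell() as in the loop above.
run_chunk <- function(cmds,
                      runner = function(cmd) {
                        shell(cmd, intern = FALSE, wait = TRUE)
                      }) {
  status <- vapply(cmds, function(cmd) {
    # An R-level error becomes NA instead of aborting the whole chunk.
    tryCatch(as.integer(runner(cmd)), error = function(e) NA_integer_)
  }, integer(1), USE.NAMES = FALSE)

  # One row per command: 0 means success, anything else needs a look.
  data.frame(cmd = cmds, status = status, stringsAsFactors = FALSE)
}
```

Inside the loop, the body could return `run_chunk(...)` with `.combine = rbind` (and `.errorhandling = 'stop'` while debugging), so that every non-zero or `NA` status is visible at the end of the job.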

I'm on a Windows 10 machine and shell will call cmd by default (as far as I understand).

As a result, every worker writes multiple files (one per image it processes) to the hard drive, each under its own name.

My code apparently works fine. However, I just realized that, at the end of the job, some images were missing. The number of missing files was roughly the same in each chunk, and they occurred at different positions in the sequence of paths, with no apparent pattern.
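To pin down exactly which images are missing and which chunk they belong to, a check along these lines could be run after the job. It is a sketch under an assumption: each output file is taken to have the same base name as its input, with a new extension, inside a single output folder. The function name `find_missing` and the `.jpg` default are hypothetical and would need adapting to the tagged names actually used.

```r
# List the inputs whose expected output file does not exist.
# Assumes output name = input base name + out_ext, inside out_dir.
find_missing <- function(set_split, out_dir, out_ext = ".jpg") {
  inputs <- unlist(set_split, use.names = FALSE)
  # Remember which chunk (foreach iteration) each input belongs to.
  chunk <- rep(seq_along(set_split), lengths(set_split))

  base <- tools::file_path_sans_ext(basename(inputs))
  expected <- file.path(out_dir, paste0(base, out_ext))

  missing <- !file.exists(expected)
  data.frame(chunk = chunk[missing],
             input = inputs[missing],
             expected = expected[missing],
             stringsAsFactors = FALSE)
}
```

Running only the inputs returned by this check again, sequentially, would show whether the same files fail outside the parallel run.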

I inspected and tested my code thoroughly and I can't see a bug.

My question is: if two workers try to write their own files at the same time, or almost the same time, is it possible that some sort of conflict prevents one of the two files from being written, even though the file names are different?

If yes, how can this problem be addressed?

Thanks.
