I've had intermittent problems with fwrite() inside a foreach %dopar% loop (doMC backend) on both a Debian 9 PC and a Red Hat 4.8.5 high-performance computing cluster. The behavior isn't fully consistent, but in short fwrite and foreach malfunction--even when I explicitly restrict fwrite to a single core (via setDTthreads() or the nThread argument to fwrite) to avoid clashes with the parallel workers. In the most recent cases, no files at all were written in one instance, and in a couple of other instances only one to four of the 12 files were written. What's more, foreach does not return correctly: fwrite is not the last line of the loop body, and the expression after it should be returned, but NULL comes back instead. Meanwhile, no errors are raised--the exit status on the cluster is 0, and no warnings are printed.
With the sequential %do%, everything works as expected (and fwrite can then use multiple threads). I'm not sure I could provide a reproducible example, as the results are not consistent. For the latest behavior I was using R 3.3.3, data.table 1.10.4-3, doMC 1.3.5, and foreach 1.4.4, and I saw the same issues with the development version of data.table (1.10.5) from GitHub. The problem appears to be related to file size: it tends to work correctly with relatively small files, but once the files approach roughly 0.3 GB and up it goes wrong. Indeed, the same code works when writing smaller files (subsets of the same data.tables).
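For reference, the structure of the loop is roughly the following (a simplified sketch; build_chunk() stands in for the real work that produces each large data.table, and the file names are placeholders):

    library(data.table)
    library(foreach)
    library(doMC)

    registerDoMC(cores = 4)
    setDTthreads(1)  # also tried passing nThread = 1 directly to fwrite()

    res <- foreach(i = 1:12) %dopar% {
        dt <- build_chunk(i)  # placeholder for the real work; each dt is ~0.3 GB+ when written
        fwrite(dt, paste0("out_", i, ".csv"), nThread = 1)
        i  # fwrite is not the last line; this value should be returned
    }
    # Expected: 12 CSVs on disk and res == as.list(1:12).
    # Observed with %dopar%: some or all files missing, and the corresponding res entries are NULL.
    # With %do% in place of %dopar%, everything behaves as expected.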
I don't know that this is really the best place to post this--I'm only asking here because the GitHub page asks that questions be raised here before filing an issue. So, has anyone had this issue, and does anyone know a way around it (i.e., how to preserve the multicore parallel loop whilst using single-threaded fwrite with large datasets)?
It appears this is due to a memory-handling issue, possibly specific to qsub batch job submissions. Specifically, if the amount of memory allocated via pmem in a qsub call is insufficient, things fail in unpredictable ways: in some more recent work, for example, some lines were written incorrectly, so the resulting CSVs could not be read back. However, if one requests the same total amount of memory but specifies it via mem (i.e., pmem * ppn), fwrite and foreach %dopar% work correctly.
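For concreteness, the difference was roughly the following (Torque/PBS-style resource requests; the node counts, memory values, and script name are only illustrative):

    # Intermittent failures: memory requested per process via pmem
    qsub -l nodes=1:ppn=12,pmem=4gb run_job.sh

    # Works: the same total memory requested as a single mem limit (pmem * ppn)
    qsub -l nodes=1:ppn=12,mem=48gb run_job.sh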