I am using rayon's .par_bridge().for_each(|r| { do_work(r) }) to run tasks in parallel for some iterator (specifically: Records from a BED file, but I don't think that matters). There could be up to ~700,000 tasks.
I want to print (to stdout or to a file) the result of every call to do_work(), but only in the order of the original iterator. I could sort all output after all parallel jobs have completed, but storing all results until the end would require much more memory. I could add .enumerate() to get an index for each item, print the first one when it is done, and store the rest until it is their turn, but I am not sure how best to implement such a system, or whether it is the best solution at all. What would you suggest?
As @ChayimFriedman mentioned, this isn't necessarily feasible because rayon likes to subdivide work starting in large chunks, so the order won't be friendly. However, because you are using .par_bridge(), Rayon must take items from the Iterator in order, so the order will be close to the original order. Therefore, it is feasible to recover the original order using a buffer and .enumerate(), without consuming large amounts of memory.

Here is a demonstration program.

The for_each_with() transfers items from Rayon control to the channel, and the recover_order() function consumes the channel to call emit() with the items in proper order.

The use of rayon::scope() and spawn() allows the for_each_with() parallel iteration to run “in the background” on the existing Rayon thread pool, so that the current thread can handle the receiving directly.
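
The reordering step can be sketched on its own using only the standard library. This is a minimal sketch, not the exact program referred to above: the body of recover_order() is one plausible way to write it, ordered_squares() is a hypothetical driver, and plain threads stand in for the Rayon workers so that the example has no dependencies. In the Rayon version, the send would happen inside for_each_with() on the enumerated, bridged iterator.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Drains `rx` and calls `emit` on the items in ascending index order.
/// Items that arrive early are parked in `pending` until their turn.
fn recover_order<T>(rx: Receiver<(usize, T)>, mut emit: impl FnMut(T)) {
    let mut next = 0usize; // index of the next item to emit
    let mut pending: BTreeMap<usize, T> = BTreeMap::new();
    for (index, item) in rx {
        pending.insert(index, item);
        // Flush every consecutive item that is now ready.
        while let Some(item) = pending.remove(&next) {
            emit(item);
            next += 1;
        }
    }
}

/// Hypothetical driver: `workers` threads (must be > 0) compute squares
/// of 0..n and send them tagged with their index; arrival order is
/// nondeterministic, but the returned Vec is always in index order.
fn ordered_squares(n: usize, workers: usize) -> Vec<usize> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for w in 0..workers {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // Each worker takes every `workers`-th index, so results interleave.
            for i in (w..n).step_by(workers) {
                tx.send((i, i * i)).unwrap();
            }
        }));
    }
    // Drop the original sender so the channel closes once the workers finish.
    drop(tx);

    let mut out = Vec::new();
    recover_order(rx, |value| out.push(value));
    for handle in handles {
        handle.join().unwrap();
    }
    out
}

fn main() {
    for value in ordered_squares(10, 3) {
        println!("{value}");
    }
}
```

The buffer only holds items that finished ahead of the next expected index, so its size depends on how far out of order the results arrive. With these stand-in threads nothing bounds that distance; the argument above for low memory use relies on .par_bridge() pulling items from the iterator in order.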