I am trying to parallelize the processing of problems contained in a vector. I would like to avoid std::thread at first, given that the standard library provides parallel algorithms which should cover my use case. My understanding of how to apply them is this:
std::vector<SolutionT> solution_vector;
std::mutex mut;
std::for_each(std::execution::par, problems.begin(), problems.end(), [&](const auto& problem) {
    auto solution = do_heavy_work(problem);
    // Only the shared vector needs the lock; the heavy work runs outside it.
    std::lock_guard guard(mut);
    solution_vector.emplace_back(solution);
});
In CMake I also link the threads library to the executable (although I am unsure whether this is necessary):
find_package(Threads REQUIRED)
target_link_libraries(
executable
PRIVATE
Threads::Threads
)
The solving time for a problem ranges from mere milliseconds to tens of seconds. Since I have a thousand problems in my vector and most are of the larger kind, I am expecting parallelism to speed up the processing time a fair bit.
Yet, when I run the code I do not observe any threads being spawned. I check the processes with htop and observe only a single core being pushed to 100% while the rest are idle.
Am I wrong in expecting threads to be spawned in combination with std::execution::par? Am I missing a step to get this right?
This was tested on an 18-core Intel x86_64 Ubuntu 22.04 platform with GCC 11.4 and C++20.
Edit: Here is a godbolt example that reflects my usage: https://godbolt.org/z/EEMEWr3Wa
According to execution_policy
See the generated code from your link (line 228 of the generated code and onward).
Instead of spawning threads, it does vectorization.
Why vectorization is applied with std::execution::par could be another good question. The recommended reading here might include this paper.