Deadlock between tbb::task_group and thread barrier


In my application I am using Intel oneAPI TBB for parallelism. I isolate work on NUMA nodes using task_arenas. I have a couple of parallel tasks that I need to execute on each NUMA node, for which I am using task_groups. For some of these tasks it is necessary to synchronize with the other NUMA nodes.

I am trying to do this with std::barrier. When I test my implementation, the execution sometimes deadlocks and sometimes it doesn't, and I don't really understand why.

Here is an example that reproduces the problem:

#include <oneapi/tbb/info.h>
#include <oneapi/tbb/task_arena.h>
#include <oneapi/tbb/task_group.h>

#include <barrier>
#include <vector>

using namespace oneapi;

int main(int argc, char* argv[]) {
    std::vector<tbb::numa_node_id> numa_indexes = tbb::info::numa_nodes();
    std::vector<tbb::task_arena> arenas(numa_indexes.size());
    std::vector<tbb::task_group> task_groups(numa_indexes.size());

    std::barrier barrier(numa_indexes.size());
    
    // Constrain each arena to its NUMA node.
    for (unsigned j = 0; j < numa_indexes.size(); j++) {
        arenas[j].initialize(tbb::task_arena::constraints(numa_indexes[j]));
    }

    for (int i = 0; i < 10000; ++i) {
        // Spawn one task per arena; each task blocks on the barrier.
        for (unsigned j = 0; j < numa_indexes.size(); j++) {
            arenas[j].execute([&task_groups, &barrier, j]() {
                task_groups[j].run([&barrier]() { barrier.arrive_and_wait(); });
            });
        }

        // Wait for this iteration's tasks to finish.
        for (unsigned j = 0; j < numa_indexes.size(); j++) {
            arenas[j].execute([&task_groups, j]() { task_groups[j].wait(); });
        }
    }
}

When I run the same code without the task_groups (calling barrier.arrive_and_wait() directly inside arenas[j].execute()), it works as expected. But without task_groups it is not possible to wait for the completion of specific tasks.
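
To illustrate, the task_group-free variant has roughly this shape (sketch only; here each arena is driven by its own std::thread so the blocking execute() calls can overlap, since execute() runs the functor on the calling thread):

    // Sketch: one plain driver thread per NUMA arena (needs #include <thread>).
    // Each thread blocks on the barrier directly inside execute(), no task_group.
    std::vector<std::thread> drivers;
    for (unsigned j = 0; j < numa_indexes.size(); j++) {
        drivers.emplace_back([&arenas, &barrier, j]() {
            for (int i = 0; i < 10000; ++i) {
                arenas[j].execute([&barrier]() { barrier.arrive_and_wait(); });
            }
        });
    }
    for (auto& t : drivers) {
        t.join();
    }

In this sketch no TBB worker thread ever blocks on the barrier, whereas in the task_group version the barrier tasks run on whatever threads the scheduler picks.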

I also tried other synchronization approaches, which led to the same result.

Can someone explain to me why this happens?

I am using oneTBB 2021.9.0 and gcc 13.1.0.
