OpenMP - Weird Result When Combining parallel and simd Directives


I have a C++ project that uses OpenMP, and in one place in the code a #pragma omp simd loop is nested inside a #pragma omp parallel region. The code crashed consistently, but only in multi-threaded runs of debug builds (never in release builds). I wrote a short reproducer that demonstrates the problem:

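#include <iostream>
#include <omp.h>

struct A {
    int z;
};

int main() {
    size_t size = 100;
    auto A_arr = new A*[size];

    // Allocate one A per slot, distributing the iterations dynamically across threads.
#pragma omp parallel
{
#pragma omp for schedule(dynamic)
    for (size_t x = 0; x < size; ++x) {
        A_arr[x] = new A{0};
    }
}

    // Every thread walks the whole array. begin is private to each thread
    // because it is declared inside the parallel region.
#pragma omp parallel
{
    A** begin = A_arr;

#pragma omp simd
    for (size_t x = 0 ; x < size ; ++x) {
        A* a = *begin;
        auto z = a->z;   // the loaded value is never used
        begin++;
    }
}

    // Free the objects (the reproducer originally leaked them), then the array.
    for (size_t x = 0; x < size; ++x) {
        delete A_arr[x];
    }
    delete[] A_arr;
    return 0;
}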

Compiled with icpc in debug mode, this runs fine. But if I change the SIMD loop to

#pragma omp simd
    for (size_t x = 0 ; x < size ; ++x) {
        A* a = begin[x];   // index into the array instead of incrementing begin
        auto z = a->z;
    }
}   // end of the parallel region

(which should be logically equivalent), the code suddenly crashes in debug builds, while release builds still work.

I did a lot of debugging to isolate the problematic part of the code, and I think the example above needs no further context.

I also tried gdb (at the crash it sometimes reports that a is NULL, and sometimes that it points to a memory location that cannot be read) and valgrind (under which the program ran successfully).

From searching online, I understand that SIMD vectorization doesn't happen at -O0, but apparently the simd directive still leads the compiler to make some assumptions about how the loop iterations are executed, even in a debug build, which may explain the different results in debug and release modes.

Of course, switching back to the first form of the loop avoids the crash, but I want to understand what happens here, and whether there is a "missing bug" that I have just hidden deeper.

Thanks in advance!
