I'm using OpenMP target offloading to offload some nested loops to the GPU. I'm using the `nowait` clause to run it asynchronously, which makes it a task. With the same input values, the result differs from the one I get without offloading (e.g. CPU: sum=0.99, offloading: sum=0.5).
When I remove the `nowait` clause, it works fine. So I think the issue is that it becomes an OpenMP task, and I'm struggling to get it right.
```c
#pragma omp target teams distribute parallel for reduction(+: sum) collapse(2) nowait depend(in: a, b) depend(out: sum)
for (int i = 1; i <= n; i++)
{
    for (int j = 1; j <= n; j++)
    {
        double c = 0;
        for (int k = 0; k < n; k++)
        {
            c += /* some computation */;
        }
        sum += fabs(c);
    }
}
```
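For comparison, a serial host version of the loop nest is a useful reference to check the offloaded result against. The sketch below is hypothetical: the elided `/* some computation */` is replaced by a made-up matrix-product term `a[i*n + k] * b[k*n + j]` on row-major n x n arrays, the function name `reference_sum` is invented, and `i` and `j` start at 0 so that every index stays in bounds.

```c
#include <math.h>

/* Serial host reference for the loop nest above.
 * ASSUMPTION: the elided computation is replaced by the hypothetical term
 * a[i*n + k] * b[k*n + j] (row-major n x n arrays), and i, j run over
 * 0..n-1 instead of 1..n so all accesses stay in bounds. */
double reference_sum(const double *a, const double *b, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double c = 0.0;
            for (int k = 0; k < n; k++)
                c += a[i * n + k] * b[k * n + j]; /* hypothetical body */
            sum += fabs(c);
        }
    }
    return sum;
}
```

With `a = {1, -2, 3, -4}` and `b` the 2x2 identity, each `c` equals the matching entry of `a`, so the reference sum is 10.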
The OpenMP 5.2 specification states:
This means that your code is executed in a task whose execution may be deferred (with `nowait`). Thus, in the worst case it can run as late as the end of the enclosing parallel region, but it always completes before any dependent task starts and before any `taskwait` directive waiting for the target task returns (the same holds for constructs with similar behaviour, like `taskgroup`). Because of that, you must not modify the working arrays (nor release them) during this time span. If you do, the behaviour is undefined. Likewise, the reduced value of `sum` is only guaranteed to be available on the host once the target task has completed, so reading it earlier is a data race. You should especially pay attention to the correctness of synchronization points and task dependencies in your code (it is impossible for us to check that with the incomplete code provided).
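A minimal sketch of the two synchronization options mentioned above, under hypothetical assumptions: a made-up kernel body `a[i*n + k] * b[k*n + j]`, 0-based loops, and invented function names `sum_with_taskwait` and `sum_with_depend`. The first waits for the deferred target task with `taskwait` before returning `sum`; the second keeps the work asynchronous and orders a consumer task after the target task through `depend(out: sum)` and `depend(in: sum)`.

```c
#include <math.h>

/* ASSUMPTION: a[i*n + k] * b[k*n + j] is a hypothetical stand-in for the
 * elided computation; a and b are row-major n x n arrays. */

/* Variant 1: wait for the deferred target task with taskwait. */
double sum_with_taskwait(const double *a, const double *b, int n)
{
    double sum = 0.0;
    /* OpenMP 5.0+ already implies map(tofrom: sum) for a reduction on a
     * combined target construct; it is spelled out for older compilers. */
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:n*n], b[0:n*n]) map(tofrom: sum) reduction(+: sum) nowait
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double c = 0.0;
            for (int k = 0; k < n; k++)
                c += a[i * n + k] * b[k * n + j];
            sum += fabs(c);
        }

    /* sum must not be read before the target task has completed. */
    #pragma omp taskwait
    return sum;
}

/* Variant 2: order a consumer task after the target task with depend. */
double sum_with_depend(const double *a, const double *b, int n)
{
    double sum = 0.0;
    double result = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp target teams distribute parallel for collapse(2) \
            map(to: a[0:n*n], b[0:n*n]) map(tofrom: sum) reduction(+: sum) \
            depend(out: sum) nowait
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double c = 0.0;
                for (int k = 0; k < n; k++)
                    c += a[i * n + k] * b[k * n + j];
                sum += fabs(c);
            }

        /* Other host work could run here while the target task is pending,
         * as long as it neither reads sum nor modifies a or b. */

        /* Starts only after the target task, including the copy-back of sum. */
        #pragma omp task depend(in: sum) shared(sum, result)
        result = sum;
    } /* the barrier at the end of single waits for both tasks */

    return result;
}
```

In both variants `sum` is read only after the target task has finished. Reading it directly after the `nowait` construct, without either synchronization, is the race described above. When compiled without OpenMP support the pragmas are ignored and both functions simply run serially.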