OpenCL dynamic parallelism enqueue_kernel() functionality

I am trying to use OpenCL 2.0's ability to launch kernels from within kernels (device-side enqueue), but I cannot get it to work.

For instance, I have these kernels:

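    // Child kernel, launched from test1 through enqueue_kernel()
    __kernel void test2(){
        printf("test2");
    }
    // Parent kernel: prints, then enqueues test2 on the default device queue
    __kernel void test1(){
        void (^my_func)(void) = ^{test2();};
        printf("test1");
        enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_NO_WAIT, ndrange_1D(1, 1), my_func);
    }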

Then I launch test1 from the host with:

    size_t globalWorkSize = 1;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalWorkSize, NULL, 0, NULL, NULL);

However, test1 now does not seem to execute at all: printf("test1"); prints nothing, whereas it does print if I comment out the enqueue_kernel() call. The kernel compiles without errors and I am building with the "-cl-std=CL2.0" flag, so this is very strange. What am I missing? Thanks in advance.

I am running the AMD 3.0 OpenCL SDK on Windows with a 7900XT.

If I declare an ndrange_t variable, for example, the kernel runs just fine, so I assume everything is set up correctly. Yet the enqueue_kernel call seemingly breaks the whole kernel, since nothing executes before or after the call. Printing the return value of enqueue_kernel doesn't work either, so I assume something goes wrong at runtime. This really baffles me.

Answer by Iordan Bogdan:

I found out what was wrong: apparently you also need a second queue, separate from the one you use to launch the kernel that spawns other kernels. This queue has to be created specifically with clCreateCommandQueueWithProperties, and it needs the flags that make it an on-device queue (CL_QUEUE_ON_DEVICE) and the default one (CL_QUEUE_ON_DEVICE_DEFAULT). For some reason, no error code signals the absence of this queue, which is why this was so difficult to debug. I guess the kernel crashes at runtime because there is no queue to enqueue onto.

For instance, adding the following lines fixed my issue:

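    // Create the default on-device queue that get_default_queue() returns inside kernels
    cl_queue_properties properties[] = { CL_QUEUE_PROPERTIES, (cl_command_queue_properties)(CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT), 0 };
    cl_command_queue device_queue = clCreateCommandQueueWithProperties(context, device, properties, &err);
    // err should be CL_SUCCESS here; check it before enqueueing test1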

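device_queue is never used explicitly to launch anything; it just has to exist. Inside a kernel, get_default_queue() returns this default device queue (if none has been created, it returns CLK_NULL_QUEUE), so the queue should be created before test1 is enqueued.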
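As a rough illustration of what "the proper flags" means, here is a small C sketch of the rules that the OpenCL 2.0 specification of clCreateCommandQueueWithProperties places on the flag combination used in the properties array above: CL_QUEUE_ON_DEVICE must be combined with CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, and CL_QUEUE_ON_DEVICE_DEFAULT is only valid together with CL_QUEUE_ON_DEVICE. The helper name `is_valid_default_device_queue` is hypothetical, and the bit values are mirrored from CL/cl.h behind `#ifndef` guards only so the sketch compiles without the SDK.

```c
#include <stdint.h>

/* Bit values mirrored from CL/cl.h (OpenCL 2.0) so this sketch builds
   without the SDK; if CL/cl.h is included first, its definitions win. */
#ifndef CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
#define CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE (1 << 0)
#endif
#ifndef CL_QUEUE_ON_DEVICE
#define CL_QUEUE_ON_DEVICE (1 << 2)
#endif
#ifndef CL_QUEUE_ON_DEVICE_DEFAULT
#define CL_QUEUE_ON_DEVICE_DEFAULT (1 << 3)
#endif

/* Hypothetical helper: returns 1 if `flags` describes a queue that can
   act as the default device queue returned by get_default_queue(),
   0 otherwise. */
int is_valid_default_device_queue(uint64_t flags)
{
    /* It must be marked as the default device queue... */
    if (!(flags & CL_QUEUE_ON_DEVICE_DEFAULT))
        return 0;
    /* ...which is only allowed on a device queue... */
    if (!(flags & CL_QUEUE_ON_DEVICE))
        return 0;
    /* ...and every device queue must be out-of-order. */
    if (!(flags & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE))
        return 0;
    return 1;
}
```

With the three flags from the answer it returns 1; dropping any one of them makes it return 0. It only checks the flag combination; whether the device actually supports device-side enqueue still has to be confirmed at runtime from the error code of clCreateCommandQueueWithProperties.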