Double Pointer in Metal


I'm trying to run a Metal compute shader on an array of arrays. I thought that because we use pointers to provide arrays to the shader, the logical solution would be to use a double pointer when working on a 2D array.

My kernel function is in the form:

kernel void foo(device float** array2D [[buffer(0)]], uint2 pid [[thread_position_in_grid]]) {
    ...
}

When I use device float** array2D [[buffer(0)]] as a parameter in the Metal shader function, I get an error:

Invalid address space qualification for buffer pointee type 'device float *'

I'm assuming this is because the compiler is interpreting device float * as the address space qualification (which is not valid). My question is: how do I tell the compiler that I'm using a double pointer? And if I can't use double pointers for some reason, what's a good workaround for working on a 2D array?

Sidenote: I know that it's possible to combine the elements in all the arrays into one array, do the computation on the single array and then divide the array back up into its smaller arrays. However, this is really inefficient and takes a long time. (I'm working with big arrays hence the use of Metal).

Answer by Chip Jarred

Updated answer

In comments (and revised question) you clarify that you can't combine them into a flat array. I don't have an answer that I know will work, but rather something to try.

Create an array of MTLCommandQueues. The idea is to create and commit separate MTLCommandBuffers on different MTLCommandQueues in parallel using DispatchQueue.async.

I'm sure there's some limit on the number of command queues you can create, so you could probably use each command buffer's completion handler to create and commit the next command buffer for an inner array that is still waiting to be processed.

Original answer

I think you need to put your arrays in one flat array.

Assuming the inner arrays are all the same length, and that pid holds the column (pid.x) and row (pid.y) of the value you want to do your computation for, you would pass in the row length as one of your parameters to the shader:

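kernel void foo(device float* array2D [[buffer(0)]], constant uint& rowLen [[buffer(1)]], uint2 pid [[thread_position_in_grid]]) {
    // A buffer-bound scalar must be a pointer or reference, hence constant uint&.
    // Row-major index of the element at row pid.y, column pid.x.
    doSomeComputation(array2D[rowLen * pid.y + pid.x]);
}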
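
For illustration, here is a minimal CPU-side sketch in plain C++ of the same row-major layout, assuming every inner array has exactly rowLen elements. The helper names flatIndex and flattenRows are hypothetical and not part of Metal; the sketch only shows how the flat buffer and the index used in the kernel relate to each other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flat index of (row, col) in a row-major buffer whose
// rows all hold rowLen elements. Mirrors rowLen * pid.y + pid.x in the kernel.
inline std::size_t flatIndex(std::size_t row, std::size_t col, std::size_t rowLen) {
    assert(col < rowLen);  // a column past the row end would alias the next row
    return rowLen * row + col;
}

// Hypothetical helper: copy equal-length rows into one contiguous buffer,
// which is the shape of data the kernel expects at buffer(0).
inline std::vector<float> flattenRows(const std::vector<std::vector<float>>& rows,
                                      std::size_t rowLen) {
    std::vector<float> flat;
    flat.reserve(rows.size() * rowLen);
    for (const auto& row : rows) {
        assert(row.size() == rowLen);  // assumption: all rows are the same length
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}
```

With two rows of three floats each, flattenRows yields six contiguous values, and flatIndex(1, 2, 3) addresses the last element of the second row.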

If the arrays have different lengths, you'll need to pass in an array of offsets giving the index in array2D at which each row starts:

kernel void foo(device float* array2D [[buffer(0)]], device uint* rowOffsets [[buffer(1)]], uint2 pid [[thread_position_in_grid]]) {
    doSomeComputation(array2D[rowOffsets[pid.y] + pid.x]); 
}
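
As a rough sketch of how the offsets could be built on the CPU before uploading, the plain C++ below packs rows of differing lengths into one buffer. The struct FlattenedRows and the functions flattenRagged and valueAt are hypothetical names, and the extra rowLengths array is an assumption that goes beyond the answer: if the grid is sized to the longest row, the kernel would need something like it to skip threads where pid.x runs past the end of a shorter row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the packed data and its per-row bookkeeping.
struct FlattenedRows {
    std::vector<float> values;              // all rows back to back
    std::vector<std::uint32_t> rowOffsets;  // start index of each row in values
    std::vector<std::uint32_t> rowLengths;  // element count of each row (assumed extra)
};

// Pack rows of differing lengths into one buffer, recording where each starts.
inline FlattenedRows flattenRagged(const std::vector<std::vector<float>>& rows) {
    FlattenedRows out;
    std::uint32_t offset = 0;
    for (const auto& row : rows) {
        const auto len = static_cast<std::uint32_t>(row.size());
        out.rowOffsets.push_back(offset);
        out.rowLengths.push_back(len);
        out.values.insert(out.values.end(), row.begin(), row.end());
        offset += len;
    }
    return out;
}

// Same lookup as array2D[rowOffsets[pid.y] + pid.x], with a bounds check.
inline float valueAt(const FlattenedRows& f, std::size_t row, std::size_t col) {
    assert(col < f.rowLengths[row]);
    return f.values[f.rowOffsets[row] + col];
}
```

The values and rowOffsets vectors correspond to the two buffers the kernel above takes; rowLengths would need a third buffer binding if the bounds check is wanted on the GPU side.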