Passing structure to raw kernel in cupy

1.6k Views Asked by At

I have CUDA kernels that take structures such as float3, int2, etc. as arguments. I seem to be unable to properly pass an argument to these kernels through the cupy rawkernel interface. I have tried passing a 1d cupy array of 3 floats for a float3 parameter and the argument was not interpreted correctly in the kernel. I have tried passing a ctypes structure, but got back an unsupported type error. Is it possible to to send a custom struct to a raw kernel in cupy? If so, how?

I tried using ctype structs as follows:

class float3(ctypes.Structure): 
    fields = [ ("X", c_float), ("Y", c_float), ("Z", c_float)] 

from cupy.cuda.function import CPointer 

class CFloat3(CPointer): 
    def __init__(self, v): super().__init__(ctypes.addressof(v)) 
        self.val = v 

val= float3(1.5, 3, 5) 
cval= CFloat3(val) 

This bypassed cupy's type checking, but still didn't correctly pass the values to the kernel. It seems like it should work if you check look at the function module in cupy's source code. It just passes on the pointer of the struct. I also tried id(v) and ctypes.POINTER(float3)(v) instead of ctypes.addressof to get the address of the struct, but that didn't work either.

I can work around this by writing kernel wrappers that accept arrays as inputs and then convert the arrays to structs to call my regular kernels. This is ugly to me though. If this can't be done, it seems like a big oversight to not provide the ability to pass structs to kernels.

2

There are 2 best solutions below

3
Robert Crovella On BEST ANSWER

I'm in agreement with the comment; I was not able to find a way to make this work in the general case.

A hacky method can perhaps be used to work with float2 or double2 by repurposing the np.complexXX datatypes. Here is an example:

$ cat t19.py
import numpy as np
import cupy
ddim = 64
bdim = 32
d = np.complex64(1+2j)
i = cupy.ones((ddim*3), dtype=cupy.float32).reshape(ddim, 3)
o = cupy.zeros((ddim*3), dtype = cupy.float32).reshape(ddim, 3)
my_test = cupy.RawKernel(r'''
  extern "C" __global__
  void my_test(const float2 d, const  float3 * __restrict__  i, float3 * __restrict__ o, int dim) {
  int x = blockDim.x * blockIdx.x + threadIdx.x;
  if (x < dim){
    float3 temp = i[x];
    temp.x += d.x;
    temp.y += d.y;
    temp.z += d.x;
    o[x] = temp;}
  }
 ''', 'my_test')
gdim = ddim//bdim + 1
my_test((gdim,1), (bdim,1), (d, i,o,ddim))  # grid, block and arguments
r_o = cupy.asnumpy(o)
print(r_o)
$ python t19.py
[[2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]
 [2. 3. 2.]]
$

I had no luck with numpy structured data types, which seems like it would have been the logical path for this.

0
emcastillo On

Thanks for your question.

A (hackish) way to solve the issue with float2, float3 types is to cast the cupy array pointer inside the kernel (However, this is not really recommended):

import cupy
add_kernel = cupy.RawKernel(r'''
    extern "C" __global__
    void my_add(const float* x1, float* y) {
        int tid = blockDim.x * blockIdx.x + threadIdx.x;
        float3* xf3 = (float3*) x1;
        y[tid] = xf3->x + xf3->y + xf3->z;
    }
    ''', 'my_add')
x1 = cupy.array([1, 2, 3], dtype='float32')
y = cupy.array([0], dtype='float32')
add_kernel((1,), (1,), (x1, y))

However, CuPy does not support structured arrays so it is not possible to map a cupy array to a user-defined structure in cuda kernels.