AMD OpenCL load program from SPIR-V

395 Views Asked by At

I trying to load OpenCL program from SPIR-V. Program code is trivial, i.e. just two kernels with naive matrix to matrix multiplication i.e. BLAS like gemm and matrix transpose from nvidia examples, it looks like following.

#define BLOCK_SIZE 16

__kernel void gemmf(
    __global float* restrict ret, 
    const __global float* lhs, 
    const __global float* rhs, 
    const unsigned int M,
    const unsigned int N,
    const unsigned int K
    )
{
    // Thread identifiers
    const size_t i = get_global_id(0); // Row ID of C (0..M)
    const size_t j = get_global_id(1); // Col ID of C (0..N)
 
    // Compute a single element (loop over K)
    float acc = 0.0f;
    for (int k=0; k<K; k++) {
      acc += lhs[i * M + k] * rhs[k * K + j];
    }
 
    // Store the result
    ret[i * M + j] = acc;
}

__kernel void transposef(__global float *dst, const __global float* restrict src, const unsigned int width, const unsigned height, local float* block)
{
  // read the matrix tile into shared memory
  unsigned int x_index = get_global_id(0);
  unsigned int y_index = get_global_id(1);

  if((x_index < width) && (y_index < height)) {
    unsigned int index_in = y_index * width + x_index;
    block[get_local_id(1)*(BLOCK_SIZE+1)+get_local_id(0)] = src[index_in];
  }

  barrier(CLK_LOCAL_MEM_FENCE);

  // write the transposed matrix tile to global memory
  x_index = get_group_id(1) * BLOCK_SIZE + get_local_id(0);
  y_index = get_group_id(0) * BLOCK_SIZE + get_local_id(1);
  if((x_index < height) && (y_index < width)) {
    size_t index_out = y_index * height + x_index;
    dst[index_out] = block[get_local_id(0)*(BLOCK_SIZE+1)+get_local_id(1)];
  }

}

When I loading this program from source with clCreateProgramWithSource both kernels works as expected. However when I compiling it to SPRIR-V as described at Khronos offline compilation-of Opencl kernels into SPIR-V using open source tooling with

clang -c -target spir64 -O0 -emit-llvm -cl-std=CL1.2 -o blas_level3.bc blas_level3.cl
llvm-spirv blas_level3.b -o blas_level3.spv

and load the same program with clCreateProgramWithIL function returns success, as well as clBuildProgram also returns CL_SUCCESS, but clCreateKernel returns CL_INVALID_KERNEL_NAME and clCreateKernelsInProgram returns 0 kernels.

clGetPlatformInfo with CL_PLATFORM_VERSION shows OpenCL 2.1 AMD-APP (3380.6), and clDeviceInfo with CL_DEVICE_IL_VERSION shows 1.4

My question is - how to correctly check that SPIR-V file contains correct export table? Can it be AMD OpenCL specific implementation issue?

P.S. AMD OpenCL implementation also failing with clGetProgramBuildInfo, when program compilation failed and I just asking for log size in bytes. Same works perfectly for Intel and NVIDIA. (OS Windows 10 64 bit, GPU Radeon RX 6600 XT)

0

There are 0 best solutions below