I trying to load OpenCL program from SPIR-V. Program code is trivial, i.e. just two kernels with naive matrix to matrix multiplication i.e. BLAS like gemm and matrix transpose from nvidia examples, it looks like following.
#define BLOCK_SIZE 16
__kernel void gemmf(
__global float* restrict ret,
const __global float* lhs,
const __global float* rhs,
const unsigned int M,
const unsigned int N,
const unsigned int K
)
{
// Thread identifiers
const size_t i = get_global_id(0); // Row ID of C (0..M)
const size_t j = get_global_id(1); // Col ID of C (0..N)
// Compute a single element (loop over K)
float acc = 0.0f;
for (int k=0; k<K; k++) {
acc += lhs[i * M + k] * rhs[k * K + j];
}
// Store the result
ret[i * M + j] = acc;
}
__kernel void transposef(__global float *dst, const __global float* restrict src, const unsigned int width, const unsigned height, local float* block)
{
// read the matrix tile into shared memory
unsigned int x_index = get_global_id(0);
unsigned int y_index = get_global_id(1);
if((x_index < width) && (y_index < height)) {
unsigned int index_in = y_index * width + x_index;
block[get_local_id(1)*(BLOCK_SIZE+1)+get_local_id(0)] = src[index_in];
}
barrier(CLK_LOCAL_MEM_FENCE);
// write the transposed matrix tile to global memory
x_index = get_group_id(1) * BLOCK_SIZE + get_local_id(0);
y_index = get_group_id(0) * BLOCK_SIZE + get_local_id(1);
if((x_index < height) && (y_index < width)) {
size_t index_out = y_index * height + x_index;
dst[index_out] = block[get_local_id(0)*(BLOCK_SIZE+1)+get_local_id(1)];
}
}
When I loading this program from source with clCreateProgramWithSource both kernels works as expected. However when I compiling it to SPRIR-V as described at Khronos offline compilation-of Opencl kernels into SPIR-V using open source tooling with
clang -c -target spir64 -O0 -emit-llvm -cl-std=CL1.2 -o blas_level3.bc blas_level3.cl
llvm-spirv blas_level3.b -o blas_level3.spv
and load the same program with clCreateProgramWithIL function returns success, as well as clBuildProgram also returns CL_SUCCESS, but clCreateKernel returns CL_INVALID_KERNEL_NAME and clCreateKernelsInProgram returns 0 kernels.
clGetPlatformInfo with CL_PLATFORM_VERSION shows OpenCL 2.1 AMD-APP (3380.6), and clDeviceInfo with CL_DEVICE_IL_VERSION shows 1.4
My question is - how to correctly check that SPIR-V file contains correct export table? Can it be AMD OpenCL specific implementation issue?
P.S. AMD OpenCL implementation also failing with clGetProgramBuildInfo, when program compilation failed and I just asking for log size in bytes. Same works perfectly for Intel and NVIDIA. (OS Windows 10 64 bit, GPU Radeon RX 6600 XT)