Reciprocal of fp16 in OpenCL

In my OpenCL kernel I use 16-bit floating-point values of type half from the cl_khr_fp16 extension.
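A minimal sketch of the kind of kernel described here. It assumes the device reports cl_khr_fp16, and the kernel name recip_half is hypothetical. The reciprocal is written as a plain half / half division. Whether the compiler lowers this to a native 16-bit instruction is exactly what the question is about, so the sketch does not claim that it does.

```opencl
// Assumes the device reports cl_khr_fp16; the pragma enables the half type.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// Hypothetical kernel: out[i] = 1 / in[i], with both operands of type half.
__kernel void recip_half(__global const half *in, __global half *out)
{
    const size_t i = get_global_id(0);
    const half one = (half)1.0f;  // half-precision constant 1
    out[i] = one / in[i];         // half / half division at the source level
}
```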

The code works well, but using AMD's Radeon Developer Tools I noticed that the reciprocal is computed in 32 bits (the GPU target is gfx1102, RDNA 3).

[Screenshot from Radeon GPU Analyzer]

So the value is first converted from half precision to single precision, then the reciprocal is computed, and then the result is converted back into half precision.
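For illustration only, the sequence described above can be spelled out at the source level as follows. This is not the compiler's actual output. It is a hypothetical kernel (recip_via_float) that makes the widen, reciprocal and narrow steps explicit, assuming cl_khr_fp16 is available.

```opencl
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// Hypothetical kernel: the three steps reported by the analyzer, written out.
__kernel void recip_via_float(__global const half *in, __global half *out)
{
    const size_t i = get_global_id(0);
    const float x = (float)in[i];  // 1. half -> float (exact: every half is representable as a float)
    const float r = 1.0f / x;      // 2. reciprocal in single precision
    out[i] = (half)r;              // 3. float -> half, rounded to nearest even by default
}
```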

This happens even though both the numerator and the denominator of the division are half-precision values.

I know that CUDA provides an intrinsic for this, hrcp(), so I also tried the OpenCL reciprocal functions half_recip() and native_recip(), with the same result.
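One detail that may be relevant: in the OpenCL C specification the half_ prefix denotes reduced accuracy (a minimum of 10 bits), not the half data type, and the core specification defines half_recip and native_recip for float arguments (native_recip has implementation-defined accuracy). The hypothetical kernel below (recip_builtins) writes the conversions around both built-ins explicitly. It does not rely on, or rule out, a compiler also accepting half arguments for them under cl_khr_fp16.

```opencl
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// Hypothetical kernel comparing the two built-ins tried in the question.
// Both are specified for float in core OpenCL C, so conversions are explicit.
__kernel void recip_builtins(__global const half *in,
                             __global half *out_half_recip,
                             __global half *out_native_recip)
{
    const size_t i = get_global_id(0);
    const float x = (float)in[i];                 // widen half -> float
    out_half_recip[i]   = (half)half_recip(x);    // reduced accuracy (at least 10 bits)
    out_native_recip[i] = (half)native_recip(x);  // implementation-defined accuracy
}
```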

Is there a way to force OpenCL to compute the reciprocal without first converting?
