Reciprocal of fp16 in OpenCL

61 Views Asked by Bram At 03 May 2023 at 23:30

In my OpenCL kernel I use 16-bit floating-point values of type half from the cl_khr_fp16 extension.

Although this gives me code that works well, I noticed with AMD's Radeon developer tools that the reciprocal is computed in 32 bits (the GPU target is gfx1102, RDNA3).

So the value is first converted from half precision to single precision, then the reciprocal is computed, and then the result is converted back into half precision.

This happens even though both the numerator and the denominator of the division are in half precision.

I know that CUDA provides a dedicated function for this, hrcp, so I also tried the OpenCL reciprocal functions half_recip() and native_recip(), with the same result.
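For reference, here is a minimal sketch of the three variants described above. The kernel name, argument names, and buffer layout are hypothetical and only for illustration; they are not taken from the actual code. It assumes the device reports cl_khr_fp16.

```c
// Illustrative sketch only: kernel and argument names are made up.
// Requires a device that supports the cl_khr_fp16 extension.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

__kernel void recip_variants(__global const half* in,
                             __global half* out_div,
                             __global half* out_half_recip,
                             __global half* out_native_recip)
{
    const size_t i = get_global_id(0);
    const half x = in[i];

    // Variant 1: plain division with a half literal (the 'h' suffix)
    // as numerator and a half value as denominator.
    out_div[i] = 1.0h / x;

    // Variant 2: half_recip(). The result is cast back to half explicitly
    // in case the built-in returns a wider type.
    out_half_recip[i] = (half)half_recip(x);

    // Variant 3: native_recip(), with the same explicit cast.
    out_native_recip[i] = (half)native_recip(x);
}
```

Comparing the generated ISA for the three output stores should show whether any variant avoids the half-to-float-to-half round trip on a given target.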

Is there a way to force OpenCL to compute the reciprocal directly in half precision, without first converting to single precision?
