std::bit_cast equivalent for CUDA device side code?


I have a couple of "magic"¹ floating point constants which I want to use bit-exactly in CUDA device-side computation, in the form of constexpr symbols. On the host side you'd use std::bit_cast<float>(0x........) for that. However, NVCC doesn't "like" std::bit_cast in device-side code.

In GLSL you'd use intBitsToFloat; however, I see no built-in function in the CUDA C++ language extensions that can do this.


1: well, they're not that "magic": they're the floating point equivalents of 0.999…·2⁻ⁿ, that is, all mantissa bits set to 1 and −(n+1) added to the biased exponent of "0" (0x7F), giving an exponent field of 0x7E − n.
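As a sanity check of that arithmetic, here is a small host-side snippet for the n = 0 case (assuming IEEE-754 binary32 and a C++20 host compiler; the check is mine, not part of the original question):

    #include <bit>

    // n = 0: sign 0, exponent field 0x7E, all 23 mantissa bits set -> 0x3F7FFFFF,
    // which is 1 - 2^-24, the largest float strictly below 1.0f.
    static_assert(std::bit_cast<float>(0x3F7FFFFFu) == 1.0f - 0x1.0p-24f);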


2 Answers

Answer from paleonix:

Given a host compiler that supports it, you can use std::bit_cast in CUDA C++20 device code (i.e. CUDA >= 12) to initialize a constexpr variable. You just need to tell nvcc to allow it by passing --expt-relaxed-constexpr.
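For example, a minimal sketch (the constant's name and value are placeholders, not taken from the question):

    // Compile with e.g.: nvcc -std=c++20 --expt-relaxed-constexpr example.cu
    #include <bit>

    // std::bit_cast is constexpr in C++20; with --expt-relaxed-constexpr its
    // result can initialize a constexpr variable that device code may use.
    constexpr float kAlmostOne = std::bit_cast<float>(0x3F7FFFFFu); // largest float < 1.0f

    __global__ void useConstant(float* out)
    {
        out[0] = kAlmostOne; // the exact bit pattern 0x3F7FFFFF, no run-time conversion
    }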

This flag is labeled as an "Experimental flag", but to me that sounds more like "this flag might be removed/renamed in a future release" than a "here be dragons" warning about its results. It is also already quite old, which gives me some confidence; see the CUDA 8.0 nvcc docs from 2016 (docs for even older versions are not available online as HTML, so I didn't check further back).

As constexpr code is evaluated by the compiler on the host, independent of the surrounding device context, I would not expect this flag to be brittle "black magic": nvcc just needs to pass the evaluation off to the host compiler and use the resulting value/object.

Given all this context, I would rather expect the --expt-relaxed-constexpr behavior to become the default in some future CUDA version than for it to vanish without a replacement.

For anyone who needs a non-constexpr version of bit_cast, see Safe equivalent of std::bit_cast in C++11 (just add __device__).
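For reference, a rough sketch of that memcpy-based fallback with the CUDA annotations added (my transcription of the linked answer's approach, not verbatim):

    #include <cstring>
    #include <type_traits>

    // Not constexpr, but usable on both host and device.
    template <class To, class From>
    __host__ __device__ To bit_cast(const From& src) noexcept
    {
        static_assert(sizeof(To) == sizeof(From), "sizes must match");
        static_assert(std::is_trivially_copyable<From>::value &&
                      std::is_trivially_copyable<To>::value,
                      "types must be trivially copyable");
        To dst;
        memcpy(&dst, &src, sizeof(To)); // compilers lower this to a plain register move
        return dst;
    }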

Answer from CygnusX1:

In CUDA __device__ code there are the intrinsics __int_as_float and __float_as_int, which do what you need. The only limitation is that they are not constexpr.

As a workaround, I believe you could keep the constexpr value as an integer and call __int_as_float only at the point where you transition into the run-time code that executes on the GPU, as sketched below.
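A sketch of that workaround (the constant's name and value are mine, for illustration):

    // Keep only the bit pattern constexpr...
    constexpr int kMagicBits = 0x3F7FFFFF; // largest float < 1.0f

    __global__ void useConstant(float* out)
    {
        // ...and reinterpret it with the device-only intrinsic at run time.
        out[0] = __int_as_float(kMagicBits);
    }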