I'm trying to create a code which divides a uint64_t by another uint64_t plus it applies rounding to the result. The code should be as fast as possible and work for all inputs (e.g. I would prefer it to now have and conditionals).
My current solution looks like this:
static inline uint64_t divide_with_rounding(uint64_t n, uint64_t d)
{
uint64_t a = n / d;
uint64_t r = n % d;
return a + (r >= d - (d / 2));
}
gcc optimizes the division+modulo quite nicely and as well the / 2. But I wonder if there's a shorter and nicer solution.
E.g. something like this:
static inline uint64_t divide_with_rounding(uint64_t n, uint64_t d)
{
return (n + d / 2) / d;
}
However that one has the disadvantage that divide_with_rounding(UINT64_MAX, 1000) produces 0.
The expression is round(x/d) = ⌊(x + d/2)/d⌋ mathematically. From the property of floor function ⌊x + n⌋ = ⌊x⌋ + n we can prove that in case d is even the result is
In case d is odd we can replace d = 2k + 1 and prove that the result is the same. Therefore you can just use
This will avoid the situation where
n + d/2overflowsHowever in case
dis not a compile-time constant then it might be faster to do a 128-by-64-bit division if the branch misprediction cost is high. In MSVC you can do like thisand in compilers with
__int128type like GCC, ICC, Clang... just use it directly