Similarly, value rounded down to power of two and finding the rightmost set bit is equivalent to log base 2 rounded down to integer. Is that correct?
I thought that that would work, for any number round up to a power of two and do what in C++ is countr_zero() or in MSVC is a bitscan function or on GCC is __builtin_ctz(). Would this work? And this must be way better than the log function which is (from my understanding) probably the slowest calculation a CPU can do.
Interestingly, doing it this way involves quite a few steps itself:
//Round up to power of 2
unsigned int v; // compute the next highest power of 2 of 32-bit v
v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
v++;
While countr_zero looks like:
usize countr_zero(u64 x) {
if (x == 0)
return 64;
#ifdef __GNUC__
return __builtin_ctzll(x);
#else
constexpr std::array<usize, 64> table = {
0, 1, 2, 7, 3, 13, 8, 27, 4, 33, 14, 36, 9, 49, 28, 19,
5, 25, 34, 17, 15, 53, 37, 55, 10, 46, 50, 39, 29, 42, 20, 57,
63, 6, 12, 26, 32, 35, 48, 18, 24, 16, 52, 54, 45, 38, 41, 56,
62, 11, 31, 47, 23, 51, 44, 40, 61, 30, 22, 43, 60, 21, 59, 58};
return table[(x & ~x + 1) * 0x218A7A392DD9ABF >> 58 & 0x3F];
#endif
}
But it still has to be better than calling log and converting to integer, right?
Yes,
These equivalences are only true if we ignore the edge case of
log2(0.1), wherefloor(log2(0.1))is not the same aslog2(0).Note that you can obtain the floored binary logarithm of any integer
xusingstd::bit_width(x) - 1. The ceiled logarithm is also obtainable this way; you just need to increment ifxis not a power of two.It's not obvious that it would be slower. Floating point operations may have hardware support, and can be faster in principle. Especially
sqrtis surprisingly fast. I am not aware of any architecture with a fast built-in logarithm instruction. The x87 instruction is relatively slow.When in doubt, you should profile.
See Also