I was playing with this code to check if an integer is a power of 4:
// C++ version
bool is_pow_4(unsigned a) {
return (std::popcount(a) == 1) && (std::countr_zero(a) % 2 == 0);
}
// C version
int is_pow_4(unsigned a) {
return (__builtin_popcount(a) == 1) && (__builtin_ctz(a) % 2 == 0);
}
Basically I check that there is just one bit set and it is on an odd position.
I was expecting a branchless code, however I see two jumps and a rep instruction. From what I recall a rep is basically a loop, but on assembly level. It seems that std::countr_zero/__builtin_ctz generates the rep instruction.
is_pow_4(unsigned int):
lea edx, [rdi-1]
mov ecx, edi
xor eax, eax
xor ecx, edx
cmp edx, ecx
jb .L7
.L1:
ret
.L7:
mov eax, 1
test edi, edi
je .L1
xor eax, eax
rep bsf eax, edi
not eax
and eax, 1
ret
C output is similar.
I understand that the loop is bound by the width of the integer (32), so I think the complexity of the code is O(1), but I was still surprised to find a loop.
Is my understanding correct? Is this code a loop on x86? Is this because while there is a x86 popcount instruction, there is no count leading/trailing zeros x86 instruction?