I'm trying to find the index of set bit k in a 256-bit integer, i.e. how many set bits lie below bit position k. Here is my attempt:
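```c
#include <stdint.h>

/* Counts the set bits of the 256-bit value A (A[0] = least significant word)
   that lie strictly below bit position k. */
uint16_t find_index(const uint64_t A[4], uint8_t k) {
    uint16_t res = 0;
    /* whole words below the word that contains bit k */
    for (int i = 0; i < k / 64; i++) {
        res += __builtin_popcountll(A[i]);
    }
    /* remaining bits of the word that contains bit k, one at a time */
    for (int i = 0; i < k % 64; i++) {
        res += (A[k / 64] >> i) & 0x01;
    }
    return res;
}
```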
For starters, I don't like the second loop. I'm pretty sure there is a way to do it with clz, bsf, etc., but I couldn't figure it out.
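One way to drop the second loop, sketched below as a hypothetical `find_index_masked`, needs no clz or bit scan at all: build a mask with the low `k % 64` bits set and take a single popcount of the masked word. The sketch assumes the same layout as the code above (A[0] holds bits 0..63).

```c
#include <stdint.h>

/* Hypothetical helper: number of set bits in A strictly below bit k.
   Words 0..k/64-1 are counted whole; in word k/64 only the low k%64 bits
   are kept by masking, which replaces the bit-by-bit loop. */
uint16_t find_index_masked(const uint64_t A[4], uint8_t k) {
    unsigned word = k / 64;   /* word that contains bit k */
    unsigned bit  = k % 64;   /* position of bit k inside that word */
    uint16_t res = 0;

    for (unsigned i = 0; i < word; i++)
        res += __builtin_popcountll(A[i]);

    /* (1ULL << bit) - 1 has exactly the low `bit` bits set; for bit == 0
       it is 0, and the shift count never reaches 64 (which would be UB). */
    uint64_t mask = (1ULL << bit) - 1;
    res += __builtin_popcountll(A[word] & mask);
    return res;
}
```

On CPUs with BMI2, `_bzhi_u64(A[word], bit)` from `<immintrin.h>` (compile with `-mbmi2`) zeroes the bits at and above position `bit` in one instruction and could stand in for the mask expression.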
Secondly, I'd like to make use of SSE/AVX instructions and autovectorization. I don't particularly care about portability, and I'm OK with using GCC/Clang/Linux-specific extensions.
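On vectorization: with only four 64-bit words there is likely little for SIMD to do, and AVX2 has no vector 64-bit popcount (`_mm256_popcnt_epi64` requires AVX-512 VPOPCNTDQ plus AVX-512VL). A hedged alternative is to remove the data-dependent loop bound instead. The hypothetical `find_index_branchless` below treats all four words identically with a per-word mask, so the loop has a fixed trip count of 4 that GCC/Clang can fully unroll, and the ternaries may compile to conditional moves. Whether the compiler also vectorizes it is up to its cost model, so check the generated assembly.

```c
#include <stdint.h>

/* Hypothetical branch-free variant: every word gets its own mask, so the
   loop always runs exactly 4 times with no dependence on k for its bound. */
uint16_t find_index_branchless(const uint64_t A[4], uint8_t k) {
    uint16_t res = 0;
    for (int i = 0; i < 4; i++) {
        int bits = (int)k - 64 * i;   /* bits of word i that lie below k */
        uint64_t mask = bits >= 64 ? ~0ULL               /* word fully below k */
                      : bits <= 0  ? 0ULL                /* word at or above k */
                      : (1ULL << bits) - 1;              /* partial word, 1..63 */
        res += __builtin_popcountll(A[i] & mask);
    }
    return res;
}
```

Compile with `-mpopcnt` (or `-march=native`) so that `__builtin_popcountll` emits the POPCNT instruction rather than a generic fallback.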