How to set to 1 efficiently with AVX2
- first N bits
- last N bits
of __m256i, setting the rest to 0?
These are two separate operations, for the tail and the head of a bit range, when the range may start and end in the middle of an __m256i value. The part of the range that occupies full __m256i values is processed with all-0 or all-1 masks.
The AVX2 shift instructions vpsllvd and vpsrlvd have the nice property that shift counts greater than or equal to 32 lead to zero integers within the ymm register. In other words: the shift counts are not masked, in contrast to the shift counts for the x86 scalar shift instructions. Therefore the code is fairly simple:
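A minimal sketch of this idea, under a few assumptions: the function names bit_mask_avx2_msb and bit_mask_avx2_lsb, the AVX2_FN macro and the *_mask_words helpers are made up for illustration, and the GCC/Clang target attribute is only there so the snippet builds without -mavx2. Each 32-bit element gets its own shift count, computed as a saturating difference between a per-element constant and n, and then all-ones is shifted by that count.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang. The attribute lets this build without -mavx2;
   define AVX2_FN as empty if you already compile with -mavx2. */
#define AVX2_FN __attribute__((target("avx2")))

/* Set the n most significant bits of the 256-bit value, clear the rest. */
AVX2_FN static inline __m256i bit_mask_avx2_msb(unsigned int n)
{
    __m256i ones       = _mm256_set1_epi32(-1);
    /* Highest element gets 32, lowest element gets 256. */
    __m256i cnst32_256 = _mm256_set_epi32(32, 64, 96, 128, 160, 192, 224, 256);
    __m256i shift      = _mm256_set1_epi32((int)n);
    /* Per element max(cnst - n, 0); only correct for n <= 65535. */
    shift = _mm256_subs_epu16(cnst32_256, shift);
    /* Counts >= 32 give 0, a count of 0 keeps all 32 bits set. */
    return _mm256_sllv_epi32(ones, shift);
}

/* Set the n least significant bits of the 256-bit value, clear the rest. */
AVX2_FN static inline __m256i bit_mask_avx2_lsb(unsigned int n)
{
    __m256i ones       = _mm256_set1_epi32(-1);
    /* Lowest element gets 32, highest element gets 256. */
    __m256i cnst32_256 = _mm256_set_epi32(256, 224, 192, 160, 128, 96, 64, 32);
    __m256i shift      = _mm256_set1_epi32((int)n);
    shift = _mm256_subs_epu16(cnst32_256, shift);
    return _mm256_srlv_epi32(ones, shift);
}

/* Hypothetical helpers for inspection: w[0] is the least significant word. */
AVX2_FN static void msb_mask_words(unsigned int n, uint32_t w[8])
{
    _mm256_storeu_si256((__m256i *)w, bit_mask_avx2_msb(n));
}

AVX2_FN static void lsb_mask_words(unsigned int n, uint32_t w[8])
{
    _mm256_storeu_si256((__m256i *)w, bit_mask_avx2_lsb(n));
}
```

For the head and tail of a bit range, the resulting mask would then typically be combined with the data using _mm256_and_si256() or _mm256_or_si256(), depending on whether bits are to be kept or set.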
The results are:
For a value n with 256 <= n <= 65535, all bits are set to one, as one might expect. The upper limit of 65535 is due to the 16-bit saturated arithmetic of _mm256_subs_epu16(). With n = 65536 the bitmask (the output value) is zero. It is possible to modify the code such that all bits are set to one for the range 256 <= n <= INT_MAX. This can be achieved by replacing shift = _mm256_subs_epu16(cnst32_256,shift); with three intrinsics that more or less emulate _mm256_subs_epu32(cnst32_256,shift), which doesn't exist.
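One possible three-intrinsic emulation is sketched below; it is not necessarily the exact sequence meant above. The names subs_epu32_sketch, bit_mask_avx2_lsb_wide, lsb_wide_mask_words and the AVX2_FN macro are hypothetical. The sketch assumes both operands are in the range 0..INT_MAX, so that the signed compare _mm256_cmpgt_epi32() orders them the same way an unsigned compare would.

```c
#include <immintrin.h>
#include <stdint.h>

/* Assumption: GCC/Clang. The attribute lets this build without -mavx2. */
#define AVX2_FN __attribute__((target("avx2")))

/* Saturating a - b on 32-bit elements, valid for inputs in 0..INT_MAX. */
AVX2_FN static inline __m256i subs_epu32_sketch(__m256i a, __m256i b)
{
    __m256i too_big = _mm256_cmpgt_epi32(b, a);   /* all-ones where b > a   */
    __m256i diff    = _mm256_sub_epi32(a, b);     /* wrapping a - b         */
    return _mm256_andnot_si256(too_big, diff);    /* zero where b > a       */
}

/* Set the n least significant bits; all bits set for 256 <= n <= INT_MAX. */
AVX2_FN static inline __m256i bit_mask_avx2_lsb_wide(unsigned int n)
{
    __m256i ones       = _mm256_set1_epi32(-1);
    __m256i cnst32_256 = _mm256_set_epi32(256, 224, 192, 160, 128, 96, 64, 32);
    __m256i shift      = _mm256_set1_epi32((int)n);
    shift = subs_epu32_sketch(cnst32_256, shift);
    return _mm256_srlv_epi32(ones, shift);
}

/* Hypothetical helper for inspection: w[0] is the least significant word. */
AVX2_FN static void lsb_wide_mask_words(unsigned int n, uint32_t w[8])
{
    _mm256_storeu_si256((__m256i *)w, bit_mask_avx2_lsb_wide(n));
}
```

If the full unsigned range mattered, clamping n with _mm256_min_epu32() before a plain _mm256_sub_epi32() would be another option, at the cost of departing from the saturating-subtract form described above.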