How to efficiently interleave bits from 8 __int16 numbers?

158 Views Asked by At

I am building Morton number for spatial indexing, I have 8 unsigned 16 bit numbers that will turn into __int128 number. The efficiency is crucial, so naive solution (loop over everything) or building separate 8 128bit numbers is too expensive.

I am using GCC, the target machine is 64 bits but without BMI2 support.

How can I speed up the computation?

1

There are 1 best solutions below

2
Mischa On BEST ANSWER

If your machine is x86 and supports SSE2, there is a clever answer using movmsk instructions. Google SSE2 bit matrix transpose for full code.