As in the title, I want to do as below:

__m128i_u* avxVar = (__m128i_u*)Var; // Var allocated with alloc
*avxVar = _mm_set_epi64(...); // is it OK to assign a __m128i to a __m128i_u?
Yes, but note that `__m128i_u` is not portable (e.g. to MSVC); it's what GCC/clang use internally to implement the unaligned loadu/storeu intrinsics. It's exactly equivalent to do it the normal way with `_mm_storeu_si128`, as sketched below (where `vec` is any `__m128i`; e.g. it could be `_mm_set_epi64x(...)` or a variable).
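For example (a minimal sketch, assuming `Var` points to at least 16 writable bytes; `store_it` is a made-up name):

#include <emmintrin.h>

void store_it(void *Var)
{
    __m128i vec = _mm_set_epi64x(1, 2);    // any __m128i value
    _mm_storeu_si128((__m128i*)Var, vec);  // unaligned store; no __m128i_u needed
}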
GCC 11's `emmintrin.h` implementation of `_mm_storeu_si128` is defined like this, taking a `__m128i_u*` pointer arg, so the dereference does an unaligned access (if not optimized away).
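It looks essentially like this (paraphrasing GCC 11's header; `__P` and `__B` are GCC's internal parameter names):

extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_storeu_si128 (__m128i_u *__P, __m128i __B)
{
  *__P = __B;  // a plain deref of a __m128i_u* is the unaligned store
}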
So yes, GCC's headers depend on `__m128i*` and `__m128i_u*` being compatible and implicitly convertible. As much as `_mm_storeu_si128` is an intrinsic for `movdqu`, so is a `__m128i_u*` dereference. But really these intrinsics just exist to communicate alignment information to the compiler, and it's up to the compiler to decide when to actually load and store, just like with a deref of `char*`.
(Fun fact: `__m128i*` is a `may_alias` type, like `char*`, so you can point it at anything without violating strict aliasing. See: Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior?)
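For example (a sketch; `load_ints` is a made-up name, and it assumes `arr` has at least four elements):

#include <emmintrin.h>

__m128i load_ints(const int *arr)
{
    // Safe under strict aliasing: __m128i is a may_alias type, so the
    // vector load may read memory that is "really" an array of int.
    return _mm_loadu_si128((const __m128i*)arr);
}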
Also note that `_mm_set_epi64` takes `__m64` args: it was for building an SSE2 vector from two MMX vectors, not from scalar `int64_t`. You probably want `_mm_set_epi64x`. They compile identically:
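Here is a sketch of the pair being compared (the exact code is at the Godbolt link below; `foo` and `bar` are placeholder names):

#include <emmintrin.h>

void foo(void *Var)   // GNU-only: deref an unaligned vector pointer
{
#ifdef __GNUC__
    __m128i_u* avxVar = (__m128i_u*)Var;
    *avxVar = _mm_set_epi64x(1, 2);
#endif
}

void bar(void *Var)   // portable: the same store via the intrinsic
{
    _mm_storeu_si128((__m128i*)Var, _mm_set_epi64x(1, 2));
}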
Both functions compile identically (and are semantically equivalent, so they will always be the same after inlining) across GCC/clang/MSVC. But only the second one compiles at all with MSVC, as you can see on the Godbolt compiler explorer: https://godbolt.org/z/Y8Wq96Pqs. If you disable the `#ifdef __GNUC__`, you get compiler errors on MSVC.
With more complex surrounding code, an `_mm_loadu_si128` unaligned load can fold into a memory source operand for an ALU instruction only with AVX (e.g. `vpaddb xmm0, xmm1, [rdi]`), but `_mm_load_si128` aligned loads can fold into SSE memory source operands like `paddb xmm0, [rdi]`.
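For example (a sketch; `add_bytes` is a made-up name):

#include <emmintrin.h>

__m128i add_bytes(__m128i v, const void *p)
{
    // With AVX enabled this can compile to  vpaddb xmm0, xmm1, [rdi];
    // without AVX the movdqu load stays a separate instruction.
    return _mm_add_epi8(v, _mm_loadu_si128((const __m128i*)p));
}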