_mm256_mul_ps is the Intel intrinsic for "Multiply packed single-precision (32-bit) floating-point elements". _mm256_mul_ph is the intrinsic for "Multiply packed half-precision (16-bit) floating-point elements ".
I can call _mm256_mul_ps from using using use std::arch::x86_64::*;, e.g.
#[inline]
fn mul(v: __m256, w: __m256) -> __m256 {
unsafe { _mm256_mul_ps(v, w) }
}
However, it appears to be difficult to call _mm256_mul_ph. Can one call _mm256_mul_ph in Rust?
What you are looking for requires the AVX-512 FP16 and AVX-512 VL instruction sets - the former of which does not appear to have any support within Rust at the moment.
You can potentially create your own intrinsics as-needed by using the
asm!macro. The assembly for_mm256_mul_phlooks like this:So the equivalent in Rust would be written like so:
To make your own intrinsics for other instructions, please ensure you follow the guidelines for Rust's inline assembly as well as careful understanding of what the instructions do. Inline assembly is
unsafeand can cause very weird behavior if specified improperly.Caveats:
This only works since the underlying machinery (LLVM 17 as of Rust 1.76) does support this instruction set. If you attempt to try this method on a brand new instruction set (or with an older toolchain) it may not work and fail to compile due to an "invalid instruction".
__m256iis used in lieu of a__m256htype since the latter does not exist. Currently__m256iis used as a "bag of bits" (documentation's words) so you must keep track yourself that it is holdingf16x16values.The
target_feature = "avx2"condition is woefully inadequate for properly limiting it to targets that can actually run this function. There is noavx512fp16target feature flag available within Rust (_mm256_mul_phin particular needs theavx512vltarget feature flag as well but that also is not supported).You will either need to be careful that you only compile and run this code for architectures that support it - currently Intel Sapphire Rapids CPUs as far as I can tell. Or it might be better to introduce your own compilation flag which, while not perfect, would hopefully limit the scope of things going wrong.
If compiled and executed on an unsupported architecture, you'll receive an "illegal instruction" error (in the best case).