What is the difference between, say, _mm512_mask_mov_epi64 and _mm512_mask_blend_epi64? Besides the order and names of the arguments, I cannot see any difference. The pseudo-code in Intel's intrinsics guide looks completely equivalent as well:
Blend, _mm512_mask_blend_epi64(__mmask8 k, __m512i a, __m512i b):

    FOR j := 0 to 7
        i := j*64
        IF k[j]
            dst[i+63:i] := b[i+63:i]
        ELSE
            dst[i+63:i] := a[i+63:i]
        FI
    ENDFOR
    dst[MAX:512] := 0

Mov, _mm512_mask_mov_epi64(__m512i src, __mmask8 k, __m512i a):

    FOR j := 0 to 7
        i := j*64
        IF k[j]
            dst[i+63:i] := a[i+63:i]
        ELSE
            dst[i+63:i] := src[i+63:i]
        FI
    ENDFOR
    dst[MAX:512] := 0
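A quick way to convince yourself that the two listings are equivalent is to model them in plain C++. This is only a sketch of Intel's pseudo-code, not the real intrinsics: a __m512i is modeled as eight uint64_t lanes, the mask as a uint8_t, and the names (Vec8x64, model_mask_blend, model_mask_mov, models_agree) are made up for illustration. It checks that blend(k, a, b) produces the same lanes as mask_mov(a, k, b) for all 256 masks, i.e. blend's a plays the role of mov's src, and blend's b plays the role of mov's a.

```cpp
#include <array>
#include <cstdint>

// Scalar model of the pseudo-code only (hypothetical names, not real intrinsics).
// One __m512i = 8 lanes of 64 bits; one __mmask8 = 8 mask bits.
using Vec8x64 = std::array<std::uint64_t, 8>;

// Models _mm512_mask_blend_epi64(k, a, b): lane j = k[j] ? b[j] : a[j]
Vec8x64 model_mask_blend(std::uint8_t k, const Vec8x64& a, const Vec8x64& b) {
    Vec8x64 dst{};
    for (int j = 0; j < 8; ++j) {
        dst[j] = ((k >> j) & 1u) ? b[j] : a[j];
    }
    return dst;
}

// Models _mm512_mask_mov_epi64(src, k, a): lane j = k[j] ? a[j] : src[j]
Vec8x64 model_mask_mov(const Vec8x64& src, std::uint8_t k, const Vec8x64& a) {
    Vec8x64 dst{};
    for (int j = 0; j < 8; ++j) {
        dst[j] = ((k >> j) & 1u) ? a[j] : src[j];
    }
    return dst;
}

// Exhaustively compares blend(k, a, b) with mov(a, k, b) for every mask value.
bool models_agree(const Vec8x64& a, const Vec8x64& b) {
    for (int k = 0; k < 256; ++k) {
        const auto m = static_cast<std::uint8_t>(k);
        if (model_mask_blend(m, a, b) != model_mask_mov(a, m, b)) {
            return false;
        }
    }
    return true;
}
```

With a mask of 0x0F, lanes 0 to 3 come from b and lanes 4 to 7 come from a in both models.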
I wrote two functions plus a main() function:

I compiled with:
I checked the compiled code (disassembly) of the two functions with
and it is exactly the same:
(I used g++, hence the C++ name mangling on the function names.)
So you are right, they are exactly the same thing (except for the position of the mask in the list of parameters).
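The same equivalence can be checked at run time with the real intrinsics. This is a sketch that assumes GCC or Clang on x86-64: it uses the target("avx512f") function attribute so the file builds without -mavx512f, and __builtin_cpu_supports to skip the check on CPUs without AVX-512F. The helper names (via_blend, via_mov, intrinsics_agree_avx512, intrinsics_agree) are hypothetical. Note that a and b keep the same order; only the mask moves from the first argument to the second.

```cpp
#include <immintrin.h>
#include <cstdint>

// Compiled for AVX-512F via a per-function target attribute (GCC/Clang
// extension); only call these after checking __builtin_cpu_supports("avx512f").
__attribute__((target("avx512f")))
static __m512i via_blend(__mmask8 k, __m512i a, __m512i b) {
    return _mm512_mask_blend_epi64(k, a, b);  // per lane: k[j] ? b : a
}

__attribute__((target("avx512f")))
static __m512i via_mov(__m512i src, __mmask8 k, __m512i a) {
    return _mm512_mask_mov_epi64(src, k, a);  // per lane: k[j] ? a : src
}

// Runs both intrinsics on the same data for all 256 masks and compares
// the results lane by lane; returns false on the first mismatch.
__attribute__((target("avx512f")))
static bool intrinsics_agree_avx512(const std::int64_t (&a)[8],
                                    const std::int64_t (&b)[8]) {
    const __m512i va = _mm512_loadu_si512(a);
    const __m512i vb = _mm512_loadu_si512(b);
    for (int k = 0; k < 256; ++k) {
        const __mmask8 m = static_cast<__mmask8>(k);
        const __m512i r1 = via_blend(m, va, vb);
        const __m512i r2 = via_mov(va, m, vb);  // same data, mask moved
        if (_mm512_cmpneq_epi64_mask(r1, r2) != 0) {
            return false;
        }
    }
    return true;
}

// Returns true if the intrinsics agree; also returns true (nothing tested)
// when the CPU does not support AVX-512F.
bool intrinsics_agree(const std::int64_t (&a)[8], const std::int64_t (&b)[8]) {
    if (!__builtin_cpu_supports("avx512f")) {
        return true;
    }
    return intrinsics_agree_avx512(a, b);
}
```

This only compares results; it says nothing about code generation, which is what the disassembly comparison covers.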
Since Peter Cordes mentioned icc (which is really icpx now), I thought I could give that a try. It's not even using VPBLENDMQ or some MOV instruction; it uses VSHUFI64X2 instead. That looks better optimized: two instructions instead of four (not counting the ENDBR64 and RET instructions).

Compiled with:
Output:
That said, once again the two functions compile to exactly the same code, bit for bit.