I am vectorizing a piece of code and at some point I have the following setup:
register m128 a = { 99,99,99,99,99,99,99,99 }
register m128 b = { 100,50,119,30,99,40,50,20 }
I am currently packing shorts in these registers, which is why I have 8 values per register. What I would like to do is subtract from the i'th element of b the corresponding value in a, but only if the i'th value of b is greater than or equal to the value in a (in this case, a is filled with the constant 99). To this end, I first use a greater-than-or-equal comparison between b and a, which yields, for this example:
register m128 c = { 1,0,1,0,1,0,0,0 }
To complete the operation, I'd like to use the multiply-and-subtract, i.e. to store in b the operation b -= a*c. The result would then be:
b = { 1,50,20,30,0,40,50,20 }
Is there any operation that does such a thing? What I found were fused operations for Haswell, but I am currently working on Sandy Bridge. Also, if someone has a better idea for how to do this, please let me know (e.g. I could do a logical subtract: if there is a 1 in c then I subtract, and do nothing otherwise).
You can copy b to c, subtract a from c, perform an arithmetic shift right by 15 positions in the 16-bit values, complement the value of c, mask c with a, and finally subtract c from b. I'm not familiar with the intrinsics syntax, but the steps are:
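As a rough sketch, the sequence described above could be written with SSE2 intrinsics as follows. The function names `cond_sub_shift` and `cond_sub_shift_i16` are made up for illustration, and the code assumes that `b[i] - a[i]` never overflows a signed 16-bit value (otherwise the sign bit no longer tells you whether `b < a`). The complement and mask steps are fused into a single `_mm_andnot_si128`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, available on Sandy Bridge */
#include <stdint.h>

/* For each of the eight signed 16-bit lanes: b -= a if b >= a.
   Assumption: b - a does not overflow int16_t. */
static __m128i cond_sub_shift(__m128i a, __m128i b)
{
    __m128i c = _mm_sub_epi16(b, a);  /* copy b to c and subtract a: c = b - a  */
    c = _mm_srai_epi16(c, 15);        /* sign fill: 0xFFFF where b < a, else 0  */
    c = _mm_andnot_si128(c, a);       /* (~c) & a: a where b >= a, else 0       */
    return _mm_sub_epi16(b, c);       /* subtract c from b                      */
}

/* Hypothetical convenience wrapper working on plain arrays (updates b in place). */
static void cond_sub_shift_i16(const int16_t a[8], int16_t b[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)b, cond_sub_shift(va, vb));
}
```

With the values from the question, `a` filled with 99 and `b = { 100,50,119,30,99,40,50,20 }`, this leaves `b = { 1,50,20,30,0,40,50,20 }`.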
Here is an alternative with fewer steps:
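One possible shorter sequence, sketched here with SSE2 intrinsics, uses the compare directly instead of subtract-and-shift. This is only an illustration (the names `cond_sub_cmp` and `cond_sub_cmp_i16` are invented), and it assumes signed 16-bit lanes. Note that the SSE compare instructions produce 0xFFFF rather than 1 in the lanes where the condition holds, so a bitwise AND does the job of the multiply from the question.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, available on Sandy Bridge */
#include <stdint.h>

/* For each of the eight signed 16-bit lanes: b -= a if b >= a. */
static __m128i cond_sub_cmp(__m128i a, __m128i b)
{
    __m128i lt = _mm_cmpgt_epi16(a, b);   /* 0xFFFF where a > b (b < a), else 0 */
    __m128i c  = _mm_andnot_si128(lt, a); /* (~lt) & a: a where b >= a, else 0  */
    return _mm_sub_epi16(b, c);           /* b - c                              */
}

/* Hypothetical convenience wrapper working on plain arrays (updates b in place). */
static void cond_sub_cmp_i16(const int16_t a[8], int16_t b[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)b, cond_sub_cmp(va, vb));
}
```

This is three instructions (pcmpgtw, pandn, psubw) and, unlike the shift-based version, it does not depend on `b - a` staying within the signed 16-bit range.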