In addition, I read these instructions:
ptrue p0.s
ptrue p0.d
ptrue p0.b, vl64
ptrue p0.b, vl32
So, what are their effects and differences?
I'm new to SVE so my answer may be wrong:
Some background
(Probably you already know that...)
The width of SVE registers differs from one CPU to another, so you might run into the following problem:

You write your program for a CPU that allows 3 numbers per register, load the values {10, 20, 30} into one register and {5, 10, 3} into another register, and perform element-wise division. You expect {10/5, 20/10, 30/3} = {2, 2, 10} as the result.

However, you are running your program on another CPU that allows 5 elements per register, so the second register contains {0, 0, 5, 10, 3}, and you would get a division by zero (because of the first two elements).

To avoid this situation, SVE uses special "predicate registers" (P0-P15) that contain a bit mask telling the CPU which elements in the register are valid and which are invalid. In the example above, the bit mask would be {invalid, invalid, valid, valid, valid}.

Your actual question
ptrue p0.s

This instruction sets the value of the register P0 in a way that a later 32-bit (.s) operation will process all fields in the SVE register.

"32-bit operation" means: an operation that interprets, for example, a 512-bit SVE register as 16 32-bit values. (SVE register widths are always a multiple of 128 bits.)

ptrue p0.d

This instruction sets the value of the register P0 in a way that a later 64-bit (.d) operation will process all fields in the SVE register.

ptrue p0.b, vl64
ptrue p0.b, vl32

These instructions set the value of the register P0 in a way that a later 8-bit (.b) operation will process the low 64 (vl64) or 32 (vl32) bytes of the SVE register.

On a CPU where the SVE registers are less than 512 (vl64) or 256 (vl32) bits wide, the corresponding instruction sets the value of P0 to "all elements are invalid" to ensure that nothing stupid happens.
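
To make the rules above a bit more concrete, here is a small plain-C model of what I understand ptrue to do. It is only a sketch: it is not real SVE code and says nothing about how the hardware is built. The names `predicate_model`, `model_ptrue` and `count_active` are made up for this illustration, and the model keeps one flag per element, whereas a real predicate register holds one bit per byte of the vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Largest SVE vector the architecture allows: 2048 bits = 256 bytes. */
#define MAX_VL_BYTES 256

/* Hypothetical, simplified model of a predicate register:
   one flag per element (real hardware keeps one bit per vector byte). */
typedef struct {
    size_t num_elements;          /* elements that fit in one vector */
    bool   active[MAX_VL_BYTES];  /* true if element i will be processed */
} predicate_model;

/* Models "ptrue pN.<T>" when pattern == 0 (all elements), and
   "ptrue pN.<T>, vl<pattern>" otherwise.
   vl_bits   : vector width of the CPU in bits (multiple of 128)
   elem_bits : 8 (.b), 16 (.h), 32 (.s) or 64 (.d)
   Note: real SVE only accepts certain counts (vl1..vl8, vl16, vl32,
   vl64, vl128, vl256); this sketch does not check that. */
predicate_model model_ptrue(size_t vl_bits, size_t elem_bits, size_t pattern)
{
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    assert(elem_bits == 8 || elem_bits == 16 ||
           elem_bits == 32 || elem_bits == 64);

    /* Designated initializer: all flags start out false. */
    predicate_model p = { .num_elements = vl_bits / elem_bits };
    size_t want = (pattern == 0) ? p.num_elements : pattern;

    /* A fixed count that does not fit in the vector gives an
       all-false predicate instead of a partially filled one. */
    if (want > p.num_elements)
        want = 0;

    for (size_t i = 0; i < p.num_elements; i++)
        p.active[i] = (i < want);
    return p;
}

/* Number of elements a later predicated operation would process. */
size_t count_active(const predicate_model *p)
{
    size_t n = 0;
    for (size_t i = 0; i < p->num_elements; i++)
        n += p->active[i];
    return n;
}
```

Under this model, `model_ptrue(512, 32, 0)` (that is, ptrue p0.s on a CPU with 512-bit registers) marks all 16 elements as active, while `model_ptrue(256, 8, 64)` (ptrue p0.b, vl64 on a CPU with 256-bit registers) marks none of them.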