Which instructions are included in the hardware event `INST_SIMD_ALU`?


Short question: which instructions, other than floating-point arithmetic instructions such as fmul, fadd, and fdiv, are counted under the hardware event INST_SIMD_ALU in Xcode Instruments? Alternatively, how can I count the number of floating-point operations in a program using CPU counters?

I want to measure or estimate the FLOP count of my program and thought that CPU counters might be a good tool for this. The closest hardware event mnemonic that I could find is INST_SIMD_ALU, whose description reads:

Retired non-load/store Advanced SIMD and FP unit instructions

So, as a sanity check, I wrote a tiny Swift snippet with an ostensibly predictable FLOP count.

let iterCount = 1_000_000_000
var x = 3.1415926
let a = 2.3e1
let ainv = 1 / a  // avoid inf
for _ in 1...iterCount {
    x *= a
    x += 1.0
    x -= 6.1
    x *= ainv
}

So I expect around 4 * iterCount = 4e9 FLOPs. But running this under CPU Counters with the event INST_SIMD_ALU gives a count of 5e9, one more counted instruction per loop iteration than expected. See the screenshot below; dumbLoop is the name of the function that I wrapped the code in.

INST_SIMD_ALU count for synthetic loop.
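The arithmetic behind that discrepancy can be written out as a short sketch. The figures are the ones quoted above; the constant names are only illustrative, and the observed value is the rounded reading from Instruments, not something this snippet measures.

```swift
// Expected vs. observed INST_SIMD_ALU count for the loop above.
// All figures come from the text; nothing here reads a hardware counter.
let iterCount = 1_000_000_000
let flopsPerIteration = 4                  // two multiplies, one add, one subtract
let expectedCount = flopsPerIteration * iterCount
let observedCount = 5_000_000_000          // rounded reading reported by Instruments

// How many counted instructions per iteration are unaccounted for.
let extraPerIteration = Double(observedCount - expectedCount) / Double(iterCount)
print("expected:", expectedCount)
print("extra per iteration:", extraPerIteration)
```

So exactly one counted instruction per iteration needs explaining, which is what the rest of the question tries to pin down.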

Here is the assembly for the loop

+0x3c   fmul                d0, d0, d1   <----------------------------------
+0x40   fadd                d0, d0, d2                                      |
+0x44   fmov                d4, x10                                         | 
+0x48   fadd                d0, d0, d4                                      |
+0x4c   fmul                d0, d0, d3                                      |
+0x50   subs                x9, x9, #0x1                                    |
+0x54   b.ne                "specialized dumbLoop(_:initialValue:)+0x3c" ---

Since the event is described as counting non-load/store Advanced SIMD and FP instructions, I assumed it would not count fmov and b.ne. That leaves subs, an integer subtraction instruction used to decrement the loop counter. So I ran two more "tests" to see whether the one extra count comes from subs.

Running it again under CPU Counters with the hardware event INST_INT_ALU, I found a count of one billion, which matches the number of loop decrements.

INST_INT_ALU

Just to be sure, I unrolled the loop by a factor of 4, so that the number of loop decrements becomes 250 million from one billion.

let iterCount = 1_000_000_000
var x = 3.1415926
let a = 2.3e1
let ainv = 1 / a  // avoid inf
let n = iterCount / 4  // four unrolled bodies per loop trip
for _ in 1...n {
    x *= a
    x += 1.0
    x -= 6.1
    x *= ainv
    x *= a
    x += 1.0
    x -= 6.1
    x *= ainv
    x *= a
    x += 1.0
    x -= 6.1
    x *= ainv
    x *= a
    x += 1.0
    x -= 6.1
    x *= ainv
}
print(x)

And it adds up: around 250 million integer ALU instructions, and a total ALU instruction count of 4.23 billion, somewhat short of the expected 4.25 billion.

Total ALU instructions count for unrolled variant
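For reference, here is how the expected 4.25 billion and the size of the shortfall work out. This is a sketch using only the numbers quoted above; the names are illustrative, and the observed total is the rounded figure from the screenshot.

```swift
// Expected total for the unrolled variant: FP operations plus loop decrements.
let iterCount = 1_000_000_000
let unrollFactor = 4
let loopTrips = iterCount / unrollFactor        // one subs per loop trip
let fpOps = 4 * iterCount                       // unrolling does not change the FP work
let expectedTotal = fpOps + loopTrips

let observedTotal = 4_230_000_000               // rounded reading from Instruments
let shortfall = expectedTotal - observedTotal
let relativeShortfall = Double(shortfall) / Double(expectedTotal)
print("expected total:", expectedTotal)
print("shortfall:", shortfall)
print("relative shortfall:", relativeShortfall)
```

The gap is about 20 million counts, under half a percent of the expected total.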

So, at the moment, if I want to count the FLOPs in my program, one estimate I can use is INST_SIMD_ALU - INST_INT_ALU. But is the event description complete, or are there any other instructions that I might spuriously count as floating-point operations? Is there a better way to count the number of FLOPs?
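To make the proposed estimate concrete, here is a minimal sketch. The function estimatedFLOPs is a hypothetical helper, not part of Instruments or any Apple API; it takes the two counter readings as plain integers copied from the CPU Counters results, and it assumes the subtraction heuristic described above actually holds.

```swift
// Hypothetical helper: applies the INST_SIMD_ALU - INST_INT_ALU heuristic
// to two counter readings copied by hand from Instruments.
func estimatedFLOPs(simdALU: Int, intALU: Int) -> Int {
    // Clamp at zero so noisy readings never yield a negative estimate.
    return max(0, simdALU - intALU)
}

// Rounded readings for the original (non-unrolled) loop, as quoted in the text.
let estimate = estimatedFLOPs(simdALU: 5_000_000_000, intALU: 1_000_000_000)
print("estimated FLOPs:", estimate)
```

For the original loop this gives back the expected 4e9, but that agreement is only as good as the assumption that nothing else in the program is counted under INST_SIMD_ALU.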
