This code...
bool condSet(int cond, int a, int b) {
    return cond ? a : b;
}
...generates this with gcc 6.3...
test edx, edx
setne al
test edi, edi
jne .L6
rep ret
.L6:
test esi, esi
setne al
ret
...this with icc 17...
test edi, edi
cmovne edx, esi
mov eax, 1
test edx, edx
cmove eax, edx
ret
...and this with clang 3.9:
test edi, edi
cmove esi, edx
test esi, esi
setne al
ret
Why do we have these differences for a code pattern that I'd expect to be common? They all rely on conditional instructions (setne, cmovne, cmove), but gcc uses a branch as well, and they all order the instructions and operands differently.
Which pass in the compiler is responsible for this code generation? Is the difference due to how register allocation is done, how the general dataflow analysis is done, or does the compiler pattern-match against this pattern when generating the code?
The code and the asm listings: https://godbolt.org/g/7heVGz
Changing the return type to int results in branchless code from all three compilers, using the test/cmov strategy.

I'd guess that gcc decides that booleanizing both sides of the conditional would be too much work, and decides to use a branch. Maybe it doesn't realize that it's the same work, and the expression can actually be done the other way (select the right input and then booleanize that).
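
To make the two orderings concrete, here is a minimal sketch that spells them out at the source level. The function names are hypothetical and only for illustration; this shows that both forms compute the same result, not what any particular compiler does internally.

```cpp
#include <cassert>

// Hypothetical names, for illustration only.

// Booleanize each input, then select between the two bools.
bool booleanizeBothSides(int cond, int a, int b) {
    return cond ? (a != 0) : (b != 0);
}

// Select the int first, then booleanize the single result.
bool selectThenBooleanize(int cond, int a, int b) {
    return (cond ? a : b) != 0;
}

// Checks that both forms agree on a small grid of inputs.
void checkAgreement() {
    const int conds[] = {0, 1, -5};
    const int vals[] = {-1, 0, 1, 7};
    for (int c : conds)
        for (int a : vals)
            for (int b : vals)
                assert(booleanizeBothSides(c, a, b) == selectThenBooleanize(c, a, b));
}
```

Since the two forms always agree, a compiler is free to pick the second one, which needs only one int-to-bool conversion.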

The code gcc makes does booleanize b, and only then tests the condition and booleanizes a. So when cond is true, it actually runs both test/setne pairs.

This smells like a missed-optimization bug. (Or an optimization-run-amok bug, where it shoots itself in the foot by applying the return type to both inputs of the ?: instead of only to the result.) Reported as GCC Bug 78947.

Until that's fixed, you can get gcc to make code like clang / icc by splitting it into two steps:
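
A minimal sketch of such a two-step split, assuming it means selecting into an int temporary first and converting to bool only on return. The name condSetTwoStep is hypothetical, and whether a given gcc version then emits the test/cmov sequence is worth checking on the compiler in question.

```cpp
#include <cassert>

// Hypothetical name; an assumed form of the two-step workaround.
bool condSetTwoStep(int cond, int a, int b) {
    int tmp = cond ? a : b;  // step 1: select the input as an int
    return tmp;              // step 2: implicit int -> bool conversion of the result
}

// Sanity check: the result matches the original one-line conditional.
void checkTwoStep() {
    const int vals[] = {-3, 0, 1, 42};
    for (int c : vals)
        for (int a : vals)
            for (int b : vals)
                assert(condSetTwoStep(c, a, b) == static_cast<bool>(c ? a : b));
}
```

The idea is that the temporary keeps the select in int, so the conversion to the return type applies only to the selected value.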