I am experimenting with the way parameters are passed to a function when compiling C++ code. I tried to compile the following C++ code using the x64 msvc 19.35/latest compiler to see the resulting assembly:
#include <cstdint>
void f(std::uint32_t, std::uint32_t, std::uint32_t, std::uint32_t);
void test()
{
    f(1, 2, 3, 4);
}
and got this result:
void test(void) PROC
mov edx, 2
lea r9d, QWORD PTR [rdx+2]
lea r8d, QWORD PTR [rdx+1]
lea ecx, QWORD PTR [rdx-1]
jmp void f(unsigned int,unsigned int,unsigned int,unsigned int)
void test(void) ENDP
What I do not understand is why the compiler chose to use lea instead of a simple mov for this example. I understand the mechanics of lea and how it produces the correct value in each register, but I would have expected something more straightforward, like:
void test(void) PROC
mov ecx, 1
mov edx, 2
mov r8d, 3
mov r9d, 4
jmp void f(unsigned int,unsigned int,unsigned int,unsigned int)
void test(void) ENDP
Moreover, from my limited understanding of how modern CPUs work, I have the feeling that the version using lea would be slower, since each lea instruction depends on the result of the mov edx, 2 instruction.
clang and gcc both give the result I expect, i.e., 4x mov.
MSVC's code is smaller than the naive mov approach. (But as you point out, because of the dependency, it may potentially be slower; you would have to test that.)
mov ecx, 1 is 5 bytes: one byte for the opcode B8-BF, which also encodes the register, and 4 bytes for the 32-bit immediate. In particular, unlike for some arithmetic instructions, there is no option for mov to encode a smaller immediate with fewer bytes using zero- or sign-extension.
lea ecx, [rdx-1] is 3 bytes: one byte for the opcode; one ModR/M byte, which encodes the destination register ecx and the base register rdx for the effective address of the memory operand; and (here is the key) one byte for an 8-bit sign-extended displacement.
The instructions using r8, r9 need one extra byte for a REX prefix; but that's true for both mov and lea, so it's a wash.
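To make the size difference concrete, here is a rough sketch that tabulates the machine-code bytes of both versions and adds them up. The byte values are hand-assembled from the encoding rules described above, so treat them as an assumption to check against a real disassembly rather than as compiler output. The names Insn, total_size, lea_version and mov_version are made up for this illustration, and the final jmp is left out because it is identical in both versions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One instruction: its assembly text and its (hand-assembled) encoding.
struct Insn {
    const char* text;
    std::vector<std::uint8_t> bytes;
};

// Sum of the encoded lengths of all instructions in a sequence.
inline std::size_t total_size(const std::vector<Insn>& code) {
    std::size_t n = 0;
    for (const Insn& i : code) n += i.bytes.size();
    return n;
}

// What MSVC emitted: one mov with a 32-bit immediate, then three lea
// instructions that each use an 8-bit displacement relative to rdx.
inline std::vector<Insn> lea_version() {
    return {
        {"mov edx, 2",       {0xBA, 0x02, 0x00, 0x00, 0x00}}, // opcode B8+reg, imm32
        {"lea r9d, [rdx+2]", {0x44, 0x8D, 0x4A, 0x02}},       // REX, opcode, ModR/M, disp8
        {"lea r8d, [rdx+1]", {0x44, 0x8D, 0x42, 0x01}},       // REX, opcode, ModR/M, disp8
        {"lea ecx, [rdx-1]", {0x8D, 0x4A, 0xFF}},             // opcode, ModR/M, disp8 = -1
    };
}

// The "expected" version: four mov instructions, each with a 32-bit immediate.
inline std::vector<Insn> mov_version() {
    return {
        {"mov ecx, 1", {0xB9, 0x01, 0x00, 0x00, 0x00}},
        {"mov edx, 2", {0xBA, 0x02, 0x00, 0x00, 0x00}},
        {"mov r8d, 3", {0x41, 0xB8, 0x03, 0x00, 0x00, 0x00}}, // REX prefix for r8d
        {"mov r9d, 4", {0x41, 0xB9, 0x04, 0x00, 0x00, 0x00}}, // REX prefix for r9d
    };
}
```

Under these assumed encodings, total_size(lea_version()) comes to 16 bytes and total_size(mov_version()) to 22 bytes, so the lea form saves 2 bytes on each of the three dependent instructions.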