https://godbolt.org/z/dK9v7En5v
For the following C++ code:
```cpp
#include <stdint.h>
#include <cstdlib>

void Send(uint32_t);

void SendBuffer(uint32_t* __restrict__ buff, size_t n)
{
    for (size_t i = 0; i < n; ++i)
    {
        Send(buff[0]);
        Send(buff[1]);
        for (size_t j = 0; j < i; ++j) {
            Send(buff[j]);
        }
    }
}
```
we get the following assembly listing:
```asm
SendBuffer(unsigned int*, unsigned long):
        test    rsi, rsi
        je      .L15
        push    r13
        mov     r13, rsi
        push    r12
        mov     r12, rdi
        push    rbp
        xor     ebp, ebp
        push    rbx
        sub     rsp, 8
.L5:
        mov     edi, DWORD PTR [r12]
        call    Send(unsigned int)
        mov     edi, DWORD PTR [r12+4]
        call    Send(unsigned int)
        test    rbp, rbp
        je      .L3
        xor     ebx, ebx
.L4:
        mov     edi, DWORD PTR [r12+rbx*4]
        add     rbx, 1
        call    Send(unsigned int)
        cmp     rbx, rbp
        jne     .L4
.L3:
        add     rbp, 1
        cmp     r13, rbp
        jne     .L5
        add     rsp, 8
        pop     rbx
        pop     rbp
        pop     r12
        pop     r13
        ret
.L15:
        ret
```
On each iteration of the outer loop, `buff[0]` and `buff[1]` are read from memory again, even though each value could be loaded into a register once.
It doesn't matter whether the inner loop is there or not: the compiler does not optimise this construction either way. I added the inner loop to demonstrate that the compiler cannot rely on the processor cache.
Is it valid, according to the C++ standard, for the compiler to load the value from memory into a register once, before the loop (with or without the `__restrict__` keyword)?
If it is valid, why doesn't the compiler perform that optimisation?
If it is not valid now, how can I tell the compiler that nobody will change that memory?
You can help the compiler by rearranging your code, so that you can see the impact of optimizing the memory accesses.
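A minimal sketch of one such rearrangement, assuming (as the question states) that nothing else modifies the buffer while `SendBuffer` runs: `buff[0]` and `buff[1]` are copied into local variables before the loop. The `Send` stub and the `g_sent` vector are hypothetical stand-ins so the example can run on its own; in the real code `Send` stays external. Note that if `Send` did change `buff[0]` or `buff[1]` through some other pointer, this version would keep sending the old values, so making the copy is effectively the programmer's promise that the memory does not change.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the question's external Send(): it only records
// the values so that this sketch is runnable on its own.
std::vector<std::uint32_t> g_sent;
void Send(std::uint32_t value) { g_sent.push_back(value); }

// Rearranged SendBuffer: buff[0] and buff[1] are copied into locals once.
// The locals' addresses never escape, so Send() cannot modify them, and the
// compiler does not have to reload them from memory after every call.
void SendBuffer(const std::uint32_t* buff, std::size_t n)
{
    if (n == 0) {
        return;  // the original reads nothing when n == 0, so neither do we
    }
    const std::uint32_t first = buff[0];
    const std::uint32_t second = buff[1];  // like the original, assumes buff[1] exists
    for (std::size_t i = 0; i < n; ++i) {
        Send(first);
        Send(second);
        for (std::size_t j = 0; j < i; ++j) {
            Send(buff[j]);  // depends on j, so this load is still needed
        }
    }
}
```

The sequence of values passed to `Send` is the same as in the question's version; only the repeated loads of the first two elements are gone.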
In the above code, the bottleneck is the call to `Send`. Accessing the `buff` array is much faster. Also, the branch evaluations in the loops take more time than accessing the array.

The true optimization here would be to modify `Send` so that it transfers blocks rather than words; most device communication interfaces have a block-transfer capability. Otherwise, you can try unrolling the loop (the compiler may perform loop unrolling at higher optimization levels).
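Both suggestions can be sketched as follows. `SendBlock(const uint32_t*, size_t)` is a hypothetical block-transfer function (the question only has `Send(uint32_t)`), and both stubs simply append to the hypothetical `g_wire` vector so the sketch is runnable. `SendBufferBlocks` produces the same word sequence as the original with at most two block transfers per outer iteration; `SendBufferUnrolled` keeps single-word `Send` calls but unrolls the inner loop by two, roughly halving the number of inner-loop branches.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical transport stubs: they only record what would be sent.
std::vector<std::uint32_t> g_wire;
void Send(std::uint32_t value) { g_wire.push_back(value); }
void SendBlock(const std::uint32_t* data, std::size_t count)
{
    g_wire.insert(g_wire.end(), data, data + count);
}

// Block transfer: buff[0], buff[1] and then buff[0] .. buff[i - 1],
// sent as (at most) two blocks instead of i + 2 single words.
void SendBufferBlocks(const std::uint32_t* buff, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        SendBlock(buff, 2);
        if (i > 0) {
            SendBlock(buff, i);
        }
    }
}

// Manual unrolling: the inner loop handles two words per iteration,
// and a single leftover word is sent when i is odd.
void SendBufferUnrolled(const std::uint32_t* buff, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        Send(buff[0]);
        Send(buff[1]);
        std::size_t j = 0;
        for (; j + 1 < i; j += 2) {
            Send(buff[j]);
            Send(buff[j + 1]);
        }
        if (j < i) {
            Send(buff[j]);
        }
    }
}
```

Whether either version is actually faster depends on the real `Send` implementation and the device, so it is worth measuring both.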
Examining the generated assembly should show better organized, more optimized code.
Edit 1: Included outer loop, corrected index variable usage.