Nested use of RET instruction in ARMv8


Let label1 and label2 be two sets of instructions, both ending with a RET instruction, such that label2 branches to label1 with a link. In other words, the code looks like this (I have numbered some positions for clarity in what follows):


label1:
    # Some operations...
    # (1)
    RET

label2:
    #Some operations
    BL label1
    #(2)
    RET

I want to call label2 from main:

main:
    # Some operations...
    BL label2
    #(3)

The behavior I want is that, after main branches to label2, execution continues from #(3). However, this is not what happens. When main executes BL label2, the link register holds the address of the desired return point #(3), and label2 runs. Inside label2, however, BL label1 overwrites the link register with the address of #(2). As a result, the RET after #(1) takes execution to #(2), and the RET after #(2) jumps to #(2) once more, so execution loops there forever.

High-level programming languages allow for nested use of functions. I can define a function f inside of which I call a function g, each with return statements. So the functionality I desire must be possible somehow. How can I make sequential or nested calls of BL some_label and RET that return to the first place where BL was called?

I'm very new to assembly so forgive me if the problem is somewhat trivial.

old_timer On

You basically tagged multiple architectures, so I will choose one (32-bit ARM).

The short answer is that you save the return address on the stack.

unsigned int more_fun ( unsigned int x );

unsigned int fun0 ( unsigned int x )
{
    return(x+1);
}

unsigned int fun1 ( unsigned int x )
{
    return(more_fun(x));
}

unsigned int fun2 ( unsigned int x )
{
    return(more_fun(x)+1);
}

Disassembly of section .text:

00000000 <fun0>:
   0:   e2800001    add r0, r0, #1
   4:   e12fff1e    bx  lr

00000008 <fun1>:
   8:   e92d4010    push    {r4, lr}
   c:   ebfffffe    bl  0 <more_fun>
  10:   e8bd4010    pop {r4, lr}
  14:   e12fff1e    bx  lr

00000018 <fun2>:
  18:   e92d4010    push    {r4, lr}
  1c:   ebfffffe    bl  0 <more_fun>
  20:   e8bd4010    pop {r4, lr}
  24:   e2800001    add r0, r0, #1
  28:   e12fff1e    bx  lr

If there is no nested call, you do not need to save the return address (shown as lr, the link register, in this assembly language). The extra register r4 is there not because r4 is special, but because the calling convention the compiler uses requires a 64-bit-aligned stack, so the compiler tosses in some other register, one that does not affect the code or the convention, to keep the stack aligned.

The second one calls a function, so one of two things can happen. In this case the compiler could have done a tail-call optimization and produced this:

fun1:
b more_fun

but it did not. Perhaps this is obvious only to a few, but I am using an older gcc (not cutting edge) and left the default target, which is armv4t. Perhaps for that reason the toolchain is not willing to deal with ARM/Thumb mode switching on a plain branch, whereas with bl the linker will put in a veneer/trampoline for you.

For the third one, I forced it to be unable to do a tail-call optimization, and the result resembles what you typically see and what most people would write by hand.

Actually, this is what you would expect the compiler to produce:

fun2:
push {r4, lr}
bl  more_fun
add r0, r0, #1
pop {r4, lr}
bx  lr

with this stack frame at the beginning and end of the function

fun2:
push {r4, lr}
...
pop {r4, lr}
bx  lr

then fill in the guts of the function.
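
To tie the push/pop pattern back to the question's main, label1 and label2, here is a small C sketch that models the link register and a stack with plain integers. It is a toy model, not real ARM code: the function name reaches_position_3, the step limit, and the enum names are all made up for illustration. With save_lr set to 0 the model gets stuck at position (2), as described in the question; with save_lr set to 1 it reaches position (3). On AArch64 (the question's ARMv8) the same idea is commonly written as stp x29, x30, [sp, #-16]! on entry and ldp x29, x30, [sp], #16 before ret, since x30 is the link register there.

```c
#include <assert.h>

/* Toy model only: "addresses" are enum values, not real ARM addresses. */
enum { AT_LABEL1, AT_LABEL2, AT_POS_2, AT_POS_3 };
enum { STEP_LIMIT = 32 };   /* arbitrary cap so a stuck model terminates */

/* Hypothetical helper: returns 1 if execution reaches position (3),
   0 if it is still looping when the step limit runs out. */
int reaches_position_3(int save_lr)
{
    int stack[4];           /* models the stack */
    int sp = 0;             /* number of stack slots in use */
    int lr = AT_POS_3;      /* main: BL label2 puts (3) in the link register */
    int pc = AT_LABEL2;     /* ... and branches to label2 */

    for (int step = 0; step < STEP_LIMIT; step++) {
        switch (pc) {
        case AT_LABEL2:
            if (save_lr) {              /* push {lr} on entry */
                assert(sp < 4);
                stack[sp++] = lr;
            }
            lr = AT_POS_2;              /* BL label1 overwrites lr with (2) */
            pc = AT_LABEL1;
            break;
        case AT_LABEL1:
            pc = lr;                    /* RET: jump to whatever lr holds */
            break;
        case AT_POS_2:
            if (save_lr) {              /* pop {lr} before returning */
                assert(sp > 0);
                lr = stack[--sp];
            }
            pc = lr;                    /* RET */
            break;
        case AT_POS_3:
            return 1;                   /* back in main, as desired */
        }
    }
    return 0;                           /* never left position (2) */
}
```

Calling reaches_position_3(0) models the code from the question, and reaches_position_3(1) models label2 with the push/pop frame around its body.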

Same gcc but a different ARM architecture; over time, more Thumb interwork support was added.

Disassembly of section .text:

00000000 <fun0>:
   0:   e2800001    add r0, r0, #1
   4:   e12fff1e    bx  lr

00000008 <fun1>:
   8:   eafffffe    b   0 <more_fun>

0000000c <fun2>:
   c:   e92d4010    push    {r4, lr}
  10:   ebfffffe    bl  0 <more_fun>
  14:   e2800001    add r0, r0, #1
  18:   e8bd8010    pop {r4, pc}

If writing by hand, one could do it without stack frames: just before the nested branch, preserve what you need to preserve, then just after it, restore what you need to restore. So instead of a stack frame at the edges of the function, you save and restore as you go through the code. There is nothing wrong with that; it is a very hand-assembly way of doing it versus a compiled high-level-language way of doing it.

Other architectures (not ARM) work differently; the call and return may, for example, use the stack rather than a return register. In that case, as far as the return address goes, you just keep making calls (8088/86, for example). You will eventually need to preserve other things so that nested calls do not trash anything besides the return address. This is where compiled languages choose a calling convention: a set of rules that allows a function to be constructed in a generic way so that you can nest or recurse indefinitely.
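
For contrast, the stack-based call/return style mentioned above can be modeled the same way. This is again only a toy C sketch with made-up names (run_stack_based_calls, the enums, the step limit), not a description of any particular instruction set: here the call itself pushes the return position and the return pops it, so the nested call from the question works without any explicit saving.

```c
#include <assert.h>

/* Toy model only: positions are enum values, not real addresses. */
enum { IN_LABEL1, IN_LABEL2, IN_POS_2, IN_POS_3 };
enum { MAX_STEPS = 32, STACK_WORDS = 4 };   /* arbitrary illustrative sizes */

/* Hypothetical helper: returns the deepest stack use (in slots) seen by the
   time position (3) is reached, or -1 if the step limit runs out first. */
int run_stack_based_calls(void)
{
    int stack[STACK_WORDS];
    int sp = 0;
    int deepest;
    int pc;

    stack[sp++] = IN_POS_3;     /* main: CALL label2 pushes (3) ... */
    deepest = sp;
    pc = IN_LABEL2;             /* ... and jumps to label2 */

    for (int step = 0; step < MAX_STEPS; step++) {
        switch (pc) {
        case IN_LABEL2:
            assert(sp < STACK_WORDS);
            stack[sp++] = IN_POS_2;         /* CALL label1 pushes (2) */
            if (sp > deepest)
                deepest = sp;
            pc = IN_LABEL1;
            break;
        case IN_LABEL1:                     /* RET in label1 */
        case IN_POS_2:                      /* RET in label2 */
            assert(sp > 0);
            pc = stack[--sp];               /* RET pops the return position */
            break;
        case IN_POS_3:
            return deepest;                 /* back in main */
        }
    }
    return -1;
}
```

In this model each nested call costs one more stack slot, which is why the return addresses never overwrite each other.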