How can I retrieve the sp on RISC-V 32 without UB?

My target is the riscv32-unknown-elf (piccolo32), so it has a stack. The compiler can and will therefore use it.

Code with volatile looks like this:

void* volatile sd = nullptr; // volatile to prevent optimization into register
void* volatile sp = &sd + REGISTER_SIZE; // volatile because sd is **MEHHH***

I don't like to use volatile here, since sp can and should be optimized into a register if possible.

So my question is: how can I retrieve the sp on RISC-V 32 without invoking UB?

I thought that just using

void* sd = nullptr; //may be UB
void* sp = &sd + REGISTER_SIZE;

is UB, since a pointer in C++ is not a pointer to a real hardware location, and the pointer itself could be optimized away or moved to any other location the compiler currently likes.

There are 1 best solutions below

Fabian Keßler On

The solution to this problem would be to use __builtin_stack_address(), as @harold suggested. But Clang does not support this intrinsic, and some GCC targets may not support it either.

The solution for the RISC-V piccolo32 with the target option riscv32-unknown-elf is to write your own version of the intrinsic with inline assembly:

#if !__has_builtin(__builtin_stack_address)
// Todo (if the compiler complains about "__"): wrap the true builtin into another function
static void* __builtin_stack_address() {
    void* res;
    // Copy the stack pointer register into a compiler-chosen register.
    asm("mv %0, sp" : "=r"(res));
    return res;
}
#endif
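
The Todo above about wrapping the builtin could look roughly like the following. This is only a sketch: current_stack_pointer and HAVE_BUILTIN_STACK_ADDRESS are hypothetical names introduced here, and the non-RISC-V fallback to __builtin_frame_address(0) is an assumption made so the snippet also builds on a host machine. The fallback returns the frame address, which is not the stack pointer.

```cpp
// Detect the real builtin without assuming __has_builtin itself exists.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_stack_address)
#    define HAVE_BUILTIN_STACK_ADDRESS 1  // hypothetical macro name
#  endif
#endif

// Hypothetical wrapper; its name avoids the reserved "__" prefix.
inline void* current_stack_pointer() noexcept {
#if defined(HAVE_BUILTIN_STACK_ADDRESS)
    // Prefer the compiler's own intrinsic when it is available.
    return __builtin_stack_address();
#elif defined(__riscv)
    void* sp_value;
    // Copy the sp register into a compiler-chosen register.
    // volatile keeps the compiler from merging or dropping repeated reads.
    asm volatile("mv %0, sp" : "=r"(sp_value));
    return sp_value;
#else
    // Assumption: host-only fallback. This is the frame address,
    // NOT the stack pointer, and exists only so the sketch compiles.
    return __builtin_frame_address(0);
#endif
}
```

Calling code then only ever uses current_stack_pointer(), so it keeps working if a later compiler version starts providing the real intrinsic.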

Pure C++ or C solutions exhibit undefined behavior, since they all return the address of local storage, or they have other issues.

The following, for example,

#include <bit>      // std::bit_cast
#include <cstdint>  // uintptr_t

constexpr auto RISC_V_STACK_ALIGNMENT = 16;
constexpr void* builtin_stack_address(void* const& local = nullptr) noexcept {
    // Relies on UB: check the binary output! Clang and GCC may differ.
    auto sp = std::bit_cast<uintptr_t>(&local);
    sp &= -RISC_V_STACK_ALIGNMENT;  // round down to the stack alignment
    return std::bit_cast<void*>(sp);
}

does not return the address of local storage, but two successive calls to builtin_stack_address may return different values, depending on the allocated stack size and the stores made between those calls:

#include <utility>  // std::pair

void fun();  // assumed to be an opaque function defined elsewhere

std::pair<void*,void*> test() noexcept {
    auto a = builtin_stack_address();
    volatile int i[8];
    i[0] = 1;
    i[1] = 1;
    i[2] = 1;
    i[3] = 1;
    i[4] = 1;
    fun();
    auto b = builtin_stack_address();
    return {a,b};
}

This produces the following asm:

test():                               # @test()
        addi    sp, sp, -32
        sw      ra, 28(sp)                      # 4-byte Folded Spill
        sw      s0, 24(sp)                      # 4-byte Folded Spill
        addi    s0, sp, 20                      #s0 = sp + 20
        andi    s0, s0, -16                     #s0 = sp + 16
        li      a0, 1
        sw      a0, 20(sp)
        sw      a0, 16(sp)
        sw      a0, 12(sp)
        sw      a0, 8(sp)
        sw      a0, 4(sp)
        call    fun
        mv      a1, sp                          # a1 = sp + 0
        andi    a1, a1, -16                     # a1 = sp + 0
        mv      a0, s0
        lw      ra, 28(sp)                      # 4-byte Folded Reload
        lw      s0, 24(sp)                      # 4-byte Folded Reload
        addi    sp, sp, 32
        ret

Here the stack pointer is evaluated once as sp + 16 and once as sp + 0. GCC will produce completely different results again due to reordering. Therefore, this approach is never guaranteed to return the actual sp.
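
The two values in the listing's comments follow directly from the alignment mask. Here is a small sketch of that arithmetic; align_down and example_sp are hypothetical names, and the concrete address is made up purely for illustration (any 16-byte-aligned value behaves the same way).

```cpp
#include <cstdint>

// Round an address down to a power-of-two alignment.
// For powers of two this is the same as `address & -alignment`,
// which is what `andi reg, reg, -16` does in the listing.
constexpr std::uintptr_t align_down(std::uintptr_t address,
                                    std::uintptr_t alignment) noexcept {
    return address & ~(alignment - 1);
}

// Hypothetical, 16-byte-aligned stack pointer value (illustration only).
constexpr std::uintptr_t example_sp = 0x80000FE0u;

// First call: the temporary lives at sp + 20, so the mask yields sp + 16.
static_assert(align_down(example_sp + 20, 16) == example_sp + 16);

// Second call: the temporary lives at sp + 0, so the mask yields sp + 0.
static_assert(align_down(example_sp + 0, 16) == example_sp);
```

Where the compiler happens to place the temporary decides the result, which is why the two calls disagree.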