I'm having trouble using register variables with clang-cl on Windows. I'm trying to declare a variable that simply reuses a (non-volatile) register that has been set up by a prior call. Here is a live demo of what I'm trying to achieve, and how it works on GCC but not on clang.
The problem
What I'm trying to do is basically the "global register variable" case. The function I'm trying to write is called from within my own script JIT runtime, and RBX contains a state object that I need for that specific call. GCC would allow me to declare a global register variable to achieve this:
// Case 1 - using global register variable (only supported by GCC)
register ExecutionStateJIT* pState asm("rbx");
void resumeMultiplexGlobal(void)
{
    pState->pData = nullptr;
}
This is exactly what I need (all the side-effects of using such a global variable in that particular translation unit are precisely what I need). But since clang doesn't support it - what are my alternatives? I tried a few different things:
// Case 2 - using unassigned local register variable (Clang refuses code-gen for this)
void resumeMultiplex(void)
{
    register ExecutionStateJIT* pState asm("rbx");
    pState->pData = nullptr;
}
Using a local register variable without an assignment also works in GCC, even though, if I understand the docs correctly, it's not actually supported. Clang, on the other hand, does not emit any code for this function, probably because it treats the variable as uninitialized and performs wonderful UB-"f*** your code" optimizations.
Workarounds
There are two workarounds that I found, both not quite optimal:
// Workaround 1 - binding as output to asm (generates unwanted preservation of rbx)
void resumeMultiplexWorkaround()
{
    register ExecutionStateJIT* pState asm("rbx");
    asm volatile(""
        : "=r" (pState)
        :
        : );
    pState->pData = nullptr;
}
Simply binding the unassigned variable as an output produces the same code on GCC. It also gets clang to finally emit some code, though it contains an absolutely unnecessary preservation of the RBX register, even though I purposefully tried to lie to the compiler about RBX being clobbered (but I assume an output to a register variable implicitly clobbers the register).
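To make the "binding as output" pattern concrete in isolation, here is a minimal standalone sketch. It is not my actual code: `readViaRegisterVariable` is a made-up name, and it assumes x86-64 with the default AT&T asm syntax of GCC/Clang (not the Intel syntax used in my other snippets). Unlike the workaround above, the asm here really does write the register, so the result is well-defined:

```cpp
// Hypothetical helper (not from my code base): shows how a local register
// variable used as an asm output pins that operand to RBX.
long long readViaRegisterVariable()
{
#if defined(__x86_64__)
    // Request RBX specifically; the binding matters for asm operands.
    register long long value asm("rbx");
    // AT&T syntax (assumed): store the constant 42 into operand %0, i.e. RBX.
    // Since RBX is written here and is callee-saved, the compiler is expected
    // to save and restore it in whichever function ends up containing this asm.
    asm volatile("movq $42, %0"
        : "=r" (value));
    return value;
#else
    // Fallback so the sketch still compiles on non-x86-64 targets.
    return 42;
#endif
}
```

In this sketch the save/restore of RBX is correct and necessary, because the asm genuinely modifies it. In my workaround the asm is empty, so the same save/restore is pure overhead, which is exactly what I'd like to get rid of.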
I could of course mov to another register:
// Workaround 2 - mov to another register
// (unnecessary mov; plus if we want to be call-clobber safe we'd need another non-volatile register that needs to be preserved)
void resumeMultiplexWorkaround2()
{
    register ExecutionStateJIT* pState asm("rcx");
    asm volatile("mov %0,rbx"
        : "=r" (pState)
        :
        : );
    pState->pData = nullptr;
}
This is technically fewer instructions, but the actual function in question needs a call-preserved register, as it may contain function calls.
Wrapping it up
So, is there any way to get optimal code-gen for this example on clang? The actual function in question is quite large:
void resumeMultiplex(void)
{
    register ExecutionStateJIT* pState asm("rbx");
    asm volatile(""
        : "=&r" (pState)
        :
        : );
    auto& state = *pState;
    AE_ASSERT(state.pMultiplex);
    auto& multiplex = *state.pMultiplex;
    multiplex.SetDt(state.dt);
    AE_ASSERT(multiplex.isFullyStarted);
    AE_ASSERT(!multiplex.vSubFrames.IsEmpty());
    AE_ASSERT(!sys::isValidId(multiplex.currentFrame));
    multiplex.currentFrame = 0;
    auto& frame = *multiplex.vSubFrames.Data();
    state.pFrame = frame.pFrame;
    state.pMultiplex = frame.pMultiplex;
    state.pThis = frame.pThis;
    copyMultiplexDataImpl(frame.pFrame + 32, frame.vData.Data(), frame.vData.Size());
    //! purposefully lying to compiler here about clobbered registers, as modifications to stack are removed with the jmp
    asm volatile("mov rsp,%0"
        :
        : "o" (frame.pFrame)
        :);
    asm volatile("mov rbp,%0"
        :
        : "o" (frame.pTop)
        : );
    //! similarly, we jmp in __asm-block here, as otherwise we would get unwanted epilogue in case of non-inlined functions
    __asm
    {
        jmp frame.pAddress;
    }
}
So I don't want to have to code all the accesses to "state" myself in assembly.