Intel CET (control-flow enforcement technology) consists of two pieces: SS (shadow stack) and IBT (indirect branch tracking). If you need to indirectly branch to somewhere that you can't put an endbr64 for some reason, you can suppress IBT for a single jmp or call instruction with notrack. Is there an equivalent way to suppress SS for a single ret instruction?
For context, I'm thinking about how this will interact with retpolines, whose key control flow goes more or less like push real_target; call retpoline; pop junk; ret. If there's no way to suppress SS for that ret, is there some other way for retpolines to work when CET is enabled? If not, what options will we have? Will we need to maintain two sets of binary packages for everything, one for old CPUs that need retpolines and one for new CPUs that support CET? And what if Intel turns out to be wrong, and we end up still needing retpolines on their new CPUs? Will we have to abandon CET to use them?
After playing with the assembly for a bit, I discovered that you can use retpolines with CET, but it's less than ideal. Here's how. For reference, consider this C code:
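A minimal stand-in for such a test case, assuming a hypothetical function pointer fp and function f (both names are illustrative). It contains one indirect call and one ordinary return, so both kinds of thunk show up in the generated code:

```c
/* Hypothetical example: one indirect call plus one function return. */
void (*fp)(void);          /* assumed name; assigned at run time */

int calls;                 /* helper state, only here so the example runs alone */
void note_call(void) { calls++; }

int f(void)
{
    fp();                  /* indirect call: routed through __x86_indirect_thunk_<reg> */
    return 0;              /* return: routed through __x86_return_thunk */
}
```

Because f does work after the indirect call (it returns 0), the call is not turned into a tail jump, which keeps the indirect-call thunk and the return thunk separate in the output.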
Compiling it with gcc -mindirect-branch=thunk -mfunction-return=thunk -O3 yields this:
It turns out you can make this work just by modifying the thunks to look like this:
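A sketch of what such CET-aware thunks could look like, written as file-scope assembly in a C file for x86-64 ELF targets. The symbol names cet_indirect_thunk_rax and cet_return_thunk and the scratch registers %r11 and %rcx are assumptions for illustration, not necessarily what was tested. To use thunks like these with GCC's -mindirect-branch=thunk-extern and -mfunction-return=thunk-extern, they would need the __x86_indirect_thunk_rax and __x86_return_thunk names instead.

```c
/* Sketch only: hypothetical CET-aware retpoline thunks (AT&T syntax).
 * Names and scratch registers are assumptions, not the original code.
 * Calling these without an active, writable shadow stack faults, because
 * wrssq and incsspq are undefined instructions there. */
void cet_indirect_thunk_rax(void);   /* reached by call/jmp, target in %rax */
void cet_return_thunk(void);         /* reached by jmp in place of a ret */

__asm__(
    ".pushsection .text\n"

    ".globl cet_indirect_thunk_rax\n"
    ".type cet_indirect_thunk_rax, @function\n"
    "cet_indirect_thunk_rax:\n"
    "    call 2f\n"                  /* pushes the address of 1: on both stacks */
    "1:  pause\n"                    /* speculation trap */
    "    lfence\n"
    "    jmp 1b\n"
    "2:  movq %rax, (%rsp)\n"        /* normal stack: return address = real target */
    "    rdsspq %r11\n"              /* r11 = shadow stack pointer (top entry) */
    "    wrssq %rax, (%r11)\n"       /* shadow stack: same overwrite */
    "    ret\n"                      /* both stacks agree, so no #CP */

    ".globl cet_return_thunk\n"
    ".type cet_return_thunk, @function\n"
    "cet_return_thunk:\n"
    "    call 2f\n"                  /* pushes the address of 1: on both stacks */
    "1:  pause\n"
    "    lfence\n"
    "    jmp 1b\n"
    "2:  leaq 8(%rsp), %rsp\n"       /* normal stack: drop the address of 1: */
    "    movl $1, %ecx\n"
    "    incsspq %rcx\n"             /* shadow stack: drop one 8-byte entry too */
    "    ret\n"                      /* returns to the original caller */

    ".popsection\n"
);
```

%r11 is caller-saved and not used for argument passing in the System V x86-64 ABI, and %rcx carries no return value, which is why they are picked as scratch here. On Linux, wrssq also has to be enabled for the process (the kernel's ARCH_SHSTK_WRSS feature) before the first thunk can run.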
By using the incsspq, rdsspq, and wrssq instructions, you can modify the shadow stack to match your changes to the real stack. I tested those modified thunks with Intel SDE, and they indeed made the control-flow errors go away.
That was the good news. Here's the bad news:
- Unlike endbr64, the CET instructions I used in the thunks aren't all NOPs on CPUs that don't support CET (incsspq and wrssq result in SIGILL). This means you'd need two different sets of thunks, and you'd need to use CPU dispatch to pick the right ones depending on whether CET is available.
- A retpoline reaches its target with a ret, which IBT doesn't check, so the target no longer has to start with endbr64. You could have __x86_indirect_thunk_rax check for the presence of the endbr64 instruction, but that's really inelegant and would probably be really slow.
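The CPU dispatch mentioned in the first item could be sketched roughly as follows. This assumes that rdsspq, which uses a hint-NOP encoding, leaves its operand untouched when no shadow stack is active, so a zero-initialized register stays zero. The thunk_set type and pick_thunk_set function are hypothetical names. Checking CPUID alone would not be enough, since the CPU can support CET without the OS having enabled a shadow stack for the process.

```c
#include <stdint.h>

/* Returns the current shadow stack pointer, or 0 when no user-mode
 * shadow stack is active (rdsspq then behaves as a NOP). */
static inline uint64_t current_ssp(void)
{
    uint64_t ssp = 0;
#if defined(__x86_64__)
    __asm__ volatile("rdsspq %0" : "+r"(ssp));
#endif
    return ssp;
}

/* Hypothetical selector for which set of thunks to install. */
typedef enum { THUNKS_PLAIN, THUNKS_CET } thunk_set;

thunk_set pick_thunk_set(void)
{
    return current_ssp() != 0 ? THUNKS_CET : THUNKS_PLAIN;
}
```

The result would typically be computed once at startup (for example in an IFUNC resolver or a constructor) rather than on every branch, so the probe stays off the hot path.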