IERAE CTF 2024 - Intel CET Bypass Challenge

IERAE CTF had one of the coolest pwn challenges I've done in a while. It was written by hugeh0ge.

Here's the full source:

// gcc chal.c -fno-stack-protector -static -o chal
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

void timedout(int) {
  puts("timedout");
  exit(0);
}

char g_buf[256];

int main() {
  char buf[16];
  long long int arg1 = 0;
  long long int arg2 = 0;
  void (*func)(long long int, long long int, long long int) = NULL;

  alarm(30);
  signal(SIGALRM, timedout);

  fgets(g_buf, 256, stdin); // My mercy
  fgets(buf, 256, stdin);
  if (func) func(arg1, arg2, 0);
}

The main feature here is a straightforward stack buffer overflow: the second fgets reads up to 255 bytes into the 16-byte buf, which sits next to a conveniently placed function pointer whose first two arguments we also control. Usually, this would be a trivial challenge: just write a ROP chain that pops a shell and prints the flag. But note that the binary is statically linked, so we can't just point func to system.
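To make the primitive concrete, here is a minimal sketch that simulates the overflow inside a struct instead of a real stack frame. The field order in fake_frame is an assumption for illustration (the real layout depends on the compiler), and fake_frame, record and simulate_overflow are hypothetical names that don't appear in the challenge.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*func3_t)(long long, long long, long long);

/* ASSUMED layout: the locals of main placed right behind buf.
   The compiler is free to order them differently. */
struct fake_frame {
    char buf[16];
    long long arg2;
    long long arg1;
    func3_t func;
};

static long long seen[3];

/* Benign stand-in for whatever function the attacker picks. */
static void record(long long a, long long b, long long c) {
    seen[0] = a;
    seen[1] = b;
    seen[2] = c;
}

/* Builds an oversized "input" and copies it over the frame starting at buf,
   then performs the same guarded call as main does. */
const long long *simulate_overflow(long long a1, long long a2) {
    struct fake_frame frame;
    unsigned char payload[sizeof frame];
    func3_t target = record;

    memset(&frame, 0, sizeof frame);
    memset(payload, 'A', sizeof payload);            /* filler for buf */
    memcpy(payload + offsetof(struct fake_frame, arg2), &a2, sizeof a2);
    memcpy(payload + offsetof(struct fake_frame, arg1), &a1, sizeof a1);
    memcpy(payload + offsetof(struct fake_frame, func), &target, sizeof target);

    memcpy(&frame, payload, sizeof payload);         /* the "overflow" */
    assert(frame.func == record);                    /* pointer is now ours */

    if (frame.func) frame.func(frame.arg1, frame.arg2, 0);
    return seen;
}
```

Calling simulate_overflow(11, 22) ends up invoking record(11, 22, 0): two attacker-chosen arguments and a fixed zero as the third.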

The twist of the challenge is that this is running with full CET enabled. Return addresses are protected by a shadow stack, and indirect calls are protected by IBT (Indirect Branch Tracking), which only allows indirect branches to land on an endbr64 instruction.

IBT support for userland hasn't landed in the Linux kernel yet, so the challenge author wrote a small patch to enable it:

diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index 19e4db582..d4387b68e 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -174,7 +174,7 @@ static int shstk_setup(void)
 
        fpregs_lock_and_load();
        wrmsrl(MSR_IA32_PL3_SSP, addr + size);
-       wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN);
+       wrmsrl(MSR_IA32_U_CET, CET_SHSTK_EN | CET_ENDBR_EN | CET_NO_TRACK_EN);
        fpregs_unlock();
 
        shstk->base = addr;

So how do we solve this?

Clearly, overwriting return addresses won't get us anywhere. We would need to corrupt the return addresses on the shadow stack as well, and the hardware enforces that only call instructions can write to it.

That leaves us with a controlled function pointer with two controlled args (and the third set to 0).

We need to figure out what we can actually call, since we're only allowed to target endbr64 instructions. Luckily, there are a lot in the binary:

% objdump -d chal | grep endbr64 | wc -l
956
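For reference, endbr64 is the four-byte sequence f3 0f 1e fa. The sketch below counts that byte pattern in a buffer. It's only an approximation of the objdump pipeline above: a raw byte scan can also match inside data or in the middle of another instruction, so the numbers don't have to agree. The names is_endbr64 and count_endbr64 are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encoding of the endbr64 instruction. */
static const unsigned char ENDBR64[4] = {0xf3, 0x0f, 0x1e, 0xfa};

/* Returns 1 if an indirect branch to p would find a landing pad there. */
int is_endbr64(const unsigned char *p, size_t remaining) {
    return remaining >= sizeof ENDBR64 &&
           memcmp(p, ENDBR64, sizeof ENDBR64) == 0;
}

/* Counts every offset in code[0..len) where the endbr64 bytes occur. */
size_t count_endbr64(const unsigned char *code, size_t len) {
    size_t count = 0;
    assert(code != NULL);
    for (size_t i = 0; i + sizeof ENDBR64 <= len; i++) {
        if (is_endbr64(code + i, len - i)) count++;
    }
    return count;
}
```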

There are some nice options in there, for example:

% objdump -d chal | grep -B1 endbr64 | grep '<' | grep executable
0000000000459200 <_dl_make_stacks_executable>:

I didn't need this in the end since I found an easier solution, but it turns out that the challenge author used this function in his solution (check out the writeup here). There's also open, read and write in the list of functions with endbr64, which is all we would need to read the flag. The main limiting factor seems to be that we can only do a single function call at this point.

One solution to this is to find a function that will call multiple function pointers that we control. I believe other players used the vtable of glibc's FILE structure for this.

The challenge author went a different route in the intended solution. The sigaction function has an endbr64 landing pad as well, so he installed a handler to catch SIGSEGV and redirect execution back to main to create a loop. You just need to set SA_NODEFER in the options so that the signal handler can get called recursively. He then used _dl_make_stacks_executable to get RWX memory and the challenge is pretty much over at that point.

An easier solution

On Saturday it was my turn to bring the kids to bed, so I didn't have time to do all this complicated setup with loops and stuff. :)

I also went for installing a signal handler. I don't need SA_NODEFER, so I was able to use signal instead. It also has an endbr64 and doesn't require the complicated sigaction struct setup. Just a simple call to signal(SIGSEGV, main).
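As a quick check of the "I don't need SA_NODEFER" part, the sketch below installs a handler with plain signal and then asks sigaction which flags the C library picked. It assumes glibc with its default (BSD) signal semantics and uses SIGUSR2 rather than SIGSEGV to stay harmless. noop_handler and flags_chosen_by_signal are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>

static void noop_handler(int sig) {
    (void)sig;
}

/* Installs a handler through signal() and returns the sa_flags that are
   in effect afterwards, or -1 on error. */
int flags_chosen_by_signal(void) {
    struct sigaction cur;
    void (*prev)(int) = signal(SIGUSR2, noop_handler);
    if (prev == SIG_ERR) return -1;

    /* Passing NULL as the new action only queries the current one. */
    if (sigaction(SIGUSR2, NULL, &cur) != 0) return -1;
    assert(cur.sa_handler == noop_handler);

    signal(SIGUSR2, prev);                      /* restore previous handler */
    return cur.sa_flags;
}
```

Under that assumption the returned flags don't contain SA_NODEFER, so the signal is blocked while its handler runs. That is fine for this approach because the handler only needs to run once.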

Now the bug gets triggered again and I can overwrite the stack with controlled data. The difference from before is that we're inside a signal handler: the kernel saved the state of all registers to the stack, and our return address points to __restore_rt.

That means we can now corrupt the saved register state with our overwrite! Just two things to avoid crashing:

There's still the function pointer func on the stack. Just set it to 0 so that it doesn't get executed.
The return address is still protected by the shadow stack. But we can simply overwrite it with the same value (__restore_rt) and the shadow stack will be happy.

Now, when the signal handler returns, the kernel will load rip and all registers from the stack. I simply pointed rip to a syscall instruction at this point and set the registers up to call execve(["/bin/cat", "/flag"]). You can find my hacky exploit here.
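The register setup can be sketched roughly as follows. This is not my actual exploit: it assumes the strings and the argv array were placed in g_buf by the first fgets, and every name and address here (forged_regs, plan_execve, the offsets, the base and gadget addresses) is hypothetical. The struct is a simplified stand-in and does not match the real sigframe layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>

/* Simplified stand-in for the registers that matter; NOT the real layout. */
struct forged_regs {
    uint64_t rax, rdi, rsi, rdx, rip;
};

enum { PATH_OFF = 0, ARG1_OFF = 16, ARGV_OFF = 32, IMAGE_SIZE = 256 };

/* Fills image with the data assumed to be sent as g_buf and returns the
   register values a forged frame would need for execve.
   base:         assumed address of g_buf in the target
   syscall_addr: assumed address of a syscall instruction in the target */
struct forged_regs plan_execve(unsigned char *image, uint64_t base,
                               uint64_t syscall_addr) {
    static const char path[] = "/bin/cat";
    static const char arg1[] = "/flag";
    uint64_t argv[3] = {base + PATH_OFF, base + ARG1_OFF, 0};
    struct forged_regs regs;

    assert(image != NULL);
    memset(image, 0, IMAGE_SIZE);
    memcpy(image + PATH_OFF, path, sizeof path);
    memcpy(image + ARG1_OFF, arg1, sizeof arg1);
    memcpy(image + ARGV_OFF, argv, sizeof argv);
    /* Caveat: fgets stops at a newline, so a real payload must not
       contain the byte 0x0a before its end. */

    regs.rax = SYS_execve;       /* syscall number */
    regs.rdi = base + PATH_OFF;  /* pathname */
    regs.rsi = base + ARGV_OFF;  /* argv */
    regs.rdx = 0;                /* envp = NULL */
    regs.rip = syscall_addr;     /* resume at a syscall instruction */
    return regs;
}
```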

Why did CET fail?

This challenge seems like the easiest possible setup for CET, so why were there multiple solutions that bypassed it?

The shadow stack definitely worked as intended: none of the players were able to corrupt return addresses.

IBT, however, didn't pose any difficulty. If IBT had been disabled, I would still have written pretty much the same exploit for this challenge. The good news is that these shortcomings can be fixed!

I see three main issues demonstrated in this challenge:

Missing IBT enforcement on signal handler entry
No signal frame protections
Unnecessary landing pads

A more fine-grained approach such as FineIBT could also have limited the exploitation options in this challenge.

1) Missing IBT enforcement on signal handler entry

The kernel in this case didn't enforce that the signal handler starts with an endbr64 instruction, which allowed jumping to arbitrary code.

Note again, though, that the challenge was based on a custom kernel patch. Upstream support hasn't landed yet, and I know that this issue has already been discussed.

Also, this wouldn't have helped in this challenge. The exploit registered main as a signal handler, which does come with an endbr64 instruction.

2) No signal frame protections

Indirect calls and return addresses are protected by hardware, but when executing a signal handler the kernel still stores and loads the instruction pointer on/from the writable stack.
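The following sketch shows this from the legitimate side: a handler edits the saved RIP in its signal frame, and the kernel resumes execution at the edited address. It assumes x86-64 Linux with a libc that exposes REG_RIP (glibc does with _GNU_SOURCE) and returns -1 elsewhere. skip_ud2 and resume_past_ud2 are hypothetical names, and this is a demonstration of the mechanism rather than an exploit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

#if defined(__linux__) && defined(__x86_64__) && defined(REG_RIP)
static volatile sig_atomic_t frame_patched;

/* The third argument points at the register state saved on the stack. */
static void skip_ud2(int sig, siginfo_t *info, void *ctx) {
    ucontext_t *uc = ctx;
    (void)sig;
    (void)info;
    uc->uc_mcontext.gregs[REG_RIP] += 2;  /* ud2 is two bytes long (0f 0b) */
    frame_patched = 1;
}

/* Executes an invalid instruction and survives because the handler moved
   the saved RIP past it. Returns 1 on success, -1 on error. */
int resume_past_ud2(void) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = skip_ud2;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0) return -1;

    frame_patched = 0;
    __asm__ volatile("ud2" ::: "memory");  /* raises SIGILL */

    /* Getting here at all means the kernel resumed at the patched RIP. */
    assert(frame_patched == 1);
    sigaction(SIGILL, &old, NULL);         /* restore previous handler */
    return frame_patched;
}
#else
/* Fallback for platforms this sketch does not cover. */
int resume_past_ud2(void) {
    return -1;
}
#endif
```

The handler gets a pointer to the saved state for free, but nothing stops any other write primitive, such as the overflow in this challenge, from editing the same bytes.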

A possible way to address this is by protecting (a subset of) the register state using data on the shadow stack, for example as Rick Edgecombe proposes here:

But besides that I've wondered if there could be a security benefit to adding some fields of the sigframe (RIP being the prime one) to the shadow stack, or a cryptographic hash of the sigframe.

One difficulty with this approach is that some programs want to modify the sigframe. However, this would still be possible using shadow stack writes (the wrssq instruction).

3) Unnecessary landing pads

The main issue in my opinion is the sheer number of endbr64 instructions in the binary. If landing pads are only required for indirect calls, why does signal have one in the first place? Does _dl_make_stacks_executable really need one? And what about mprotect? (I suspect that the challenge's function pointer call sets the third argument to 0 precisely to prevent mprotect(addr, size, PROT_RWX) calls :) )

The challenge binary was statically linked, so the toolchain should be able to know which of the functions actually need a landing pad. As I understand it, it still included endbr64 instructions everywhere because they're added conservatively to all globally visible functions, which might be address-taken in another translation unit, as described in LLVM issue 74400. It's only known at link time which ones are actually needed, so the linker could remove the unnecessary ones.

But how about dynamically linked code? You would usually find signal linked through libc. But signal is an exported symbol. The compiler and linker can't know if it will be used in a function pointer in the main binary or another library. Hence, all exported functions need the endbr64 instruction or else such uses will crash.

There are two solutions to this problem:

1) The runtime loader (ld.so) can infer which landing pads are actually required. It could be possible to patch the code to add/remove landing pads as needed at load time.
2) Create wrappers when exported functions are address-taken. If the main binary wants to use signal in a function pointer, it can create a small wrapper with endbr64; call signal and point the function pointer to this. This solution was proposed by Florian Weimer as an Alternative CET ABI.

In my opinion, option 2) sounds like an easy and promising way to address this.
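At the source level, the wrapper idea looks roughly like the sketch below. The real proposal works at the ABI and toolchain level and would generate such wrappers automatically, so treat this as an illustration only. signal_thunk, mark and install_through_thunk are hypothetical names.

```c
#include <assert.h>
#include <signal.h>

typedef void (*handler_t)(int);
typedef handler_t (*signal_fn)(int, handler_t);

/* Local wrapper: with this scheme, only this function would need a landing
   pad, because it is the only one whose address is taken. It reaches
   signal through a direct call. */
static handler_t signal_thunk(int sig, handler_t h) {
    return signal(sig, h);
}

static volatile sig_atomic_t thunk_handler_ran;

static void mark(int sig) {
    (void)sig;
    thunk_handler_ran = 1;
}

/* Installs a handler through the function pointer and checks that it runs.
   Returns 1 if the handler ran, -1 on error. */
int install_through_thunk(void) {
    signal_fn fp = signal_thunk;   /* points at the wrapper ... */
    handler_t prev;

    assert(fp != signal);          /* ... and not at signal itself */

    prev = fp(SIGUSR1, mark);
    if (prev == SIG_ERR) return -1;

    thunk_handler_ran = 0;
    raise(SIGUSR1);

    signal(SIGUSR1, prev);         /* restore previous handler */
    return thunk_handler_ran;
}
```

An attacker who controls a function pointer could still reach signal_thunk if the program creates one, but functions that are never address-taken would no longer be valid targets.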

Conclusion

The exploits for this challenge should have been easily prevented by IBT, but in its current form, IBT left exploit writers too many options to be useful. Fortunately, all of these can be addressed in software.

Since Linux userland IBT support hasn't landed yet, I believe it's a good opportunity to implement the "Alternative CET ABI" and landing pad removal in the linker. With those in place, this challenge should be impossible to solve (please prove me wrong if you can :)).

So to summarize, we can fix this in 3 simple steps:

Remove unneeded landing pads at link time
Implement the "Alternative CET ABI"
Protect sigframe data using the shadow stack

Thanks again to @hugeh0ge for this amazing challenge!

sroettger/ierae_cet.md