This is a walk-through of how I solved the KEYGENME reverse engineering challenge at the Google CTF 2018 qualifier.
I bet you can't reverse this algorithm!
The challenge contained an executable binary called main
and a server
endpoint that sent us 5 random bytes and requested an answer in return. It
used the binary to validate the answer. The challenge was to reverse engineer
it, and find out what answer would satisfy the server.
I didn't really want to fire up a disassembler/decompiler right away, so I started to analyze it dynamically. I supposed it probably needed input in some way, so I piped random data into it and also supplied some random arguments.
I ran the binary with strace to see the system calls it made.
$ cat /dev/urandom | strace ./main a b c d e f |& cut -d\( -f1
execve
mmap
mmap
ptrace
fork
mmap
wait4
ptrace
wait4
...
My first reaction was: "oh boy, fork and ptrace!". It looked like it created a child process and the parent process debugged the child with ptrace. Or the child the parent. Or they both ptrace'd each other.
I really hoped it was not the latter case. That would have made debugging even more miserable than it later turned out to be. The reason is that gdb also uses ptrace internally, and a process can be traced with only one ptrace-based debugger at a time.
Note that strace doesn't trace child processes by default, so these were only the syscalls of the parent.
Let's go through the full output of strace:
execve("./main", ["./main", "a", "b", "c", "d", "e", "f"], 0x7ffc790cd238 /* 42 vars */) = 0
Ok, that is how a normal process starts. Nothing interesting here... wait! Where is "/etc/ld.preload.so" and "/usr/lib/libc.so.6" and all the usual loader/libc stuff that happens on program startup? Looks like this program was compiled statically or didn't use libc.
mmap(NULL, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0x7f2b4c4a6000
mmap(NULL, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0x7f2b4c496000
OK, it mapped two RWX pages.
ptrace(PTRACE_DETACH, 0, NULL, SIG_0) = -1 ESRCH (No such process)
I had no idea why it did that. In CTFs it's usually a good idea to just move on when not understanding something at first. Maybe it wouldn't be important.
fork() = 1737
This is forking a child process, its pid was 1737.
mmap(NULL, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0x7f2b4c486000
wait4(1737, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], 0, NULL) = 1737
Mmap again. The wait4 means that the parent process waited for the child to hit a debugger trap instruction (int3).
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=1737, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} ---
The waiting was not in vain, here a signal notified the parent that the child had reached a debugger trap instruction and stopped. That was exactly what the parent was waiting for.
ptrace(PTRACE_CONT, 1737, NULL, SIG_0) = 0
wait4(1737, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], 0, NULL) = 1737
Here the parent allows the child to continue.
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=1737, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} ---
open("/proc/1737/maps", O_RDONLY) = 3
read(3, "560d1f5d9000-560", 16) = 16
close(3) = 0
Another trap instruction reached. This time the parent read the address of the child's lowest mapped region. This was probably the child's virtual base address.
ptrace(PTRACE_GETREGS, 1737, NULL, 0x7f2b4c486020) = 0
process_vm_readv(1737, [{iov_base="\314\353\376", iov_len=3}], 1, [{iov_base=0x560d1f5d97c5, iov_len=3}], 1, 0) = 3
process_vm_writev(1737, [{iov_base="^H\211", iov_len=3}], 1, [{iov_base=0x560d1f5d97c5, iov_len=3}], 1, 0) = 3
ptrace(PTRACE_SETREGS, 1737, NULL, 0x7f2b4c486020) = 0
Yay, a bunch of syscalls I never used before! I hit up the man pages of these syscalls. The ptrace output translated to English:
- Parent process reads the registers of the child process
- Parent reads 3 bytes ("\314\353\376" == "\xcc\xeb\xfe") from the child's memory
- Parent writes another 3 bytes ("^H\211" == "\x5e\x48\x89") to the same address
- Parent modifies the child's registers.
The address of the read and write (0x560d1f5d97c5) was very close to the child's base address (0x560d1f5d9000), so it probably modified an executable memory region in the child. Maybe this was where the child had been stopped due to an int3 instruction?
What were the data read? Maybe x64 code?
$ printf "\314\353\376" | ndisasm -b64 -
00000000 CC int3
00000001 EBFE jmp short 0x1
Yes, the expected CC (int3) instruction. This meant that the parent process live patched the child process when that hit an int3. The EBFE instruction is an infinite loop, probably just a placeholder here. It would get overwritten later.
What did it patch in?
$ printf "^H\211" | ndisasm -b64 -
00000000 5E pop rsi
00000001 48 rex.w
00000002 89 db 0x89
Total gibberish. Maybe part of a long, multi-byte instruction? Time would tell. I continued reading the strace output:
ptrace(PTRACE_CONT, 1737, NULL, SIG_0) = 0
This allowed the child to continue execution.
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=1737, si_uid=1000, si_status=SIGTRAP, si_utime=0, si_stime=0} ---
wait4(1737, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], 0, NULL) = 1737
Another wait, another signal.
ptrace(PTRACE_GETREGS, 1737, NULL, 0x7f2b4c486020) = 0
process_vm_writev(1737, [{iov_base="\314\353\376", iov_len=3}], 1, [{iov_base=0x560d1f5d97c5, iov_len=3}], 1, 0) = 3
process_vm_readv(1737, [{iov_base="\314\303", iov_len=2}], 1, [{iov_base=0x560d1f5d9845, iov_len=2}], 1, 0) = 2
process_vm_writev(1737, [{iov_base="H\301", iov_len=2}], 1, [{iov_base=0x560d1f5d9845, iov_len=2}], 1, 0) = 2
ptrace(PTRACE_SETREGS, 1737, NULL, 0x7f2b4c486020) = 0
ptrace(PTRACE_CONT, 1737, NULL, SIG_0) = 0
The first writev here undid the previous patch as it wrote the "CCEBEF" instructions back to 0x560d1f5d97c5. The second writev introduced another patch, but to a different address. It looked like the parent process hid its steps, it had been keeping only one patch visible in the child's memory at a time. Even if I had created a core dump of the child process, I could not have recovered all of its instructions. I would have only seen the placeholder values at certain addresses instead of the normal instructions.
The rest of the strace output was very similar.
I reran strace to dump strings in a sane hexadeceimal notation (-xx) and used awk to filter for the patches. (I'm quite fond of shell one-liners). I also grouped the patches by two. This way I could see that every address is patched exactly twice. First probably valid x64 code is written, then the second patch restores the original int3 value with a placeholder after it.
$ cat /dev/urandom | strace -i -xx ./main |& tr -s '=[{, ' ' ' | awk '/writev/{print $9, $4}' | paste -sd '\t\n'
0x5593b68dd7c5 "\x5e\x48\x89" 0x5593b68dd7c5 "\xcc\xeb\xfe"
0x5593b68dd845 "\x48\xc1" 0x5593b68dd845 "\xcc\xc3"
0x5593b68dec1b "\x5d\x41" 0x5593b68dec1b "\xcc\xc3"
0x5593b68dec1e "\x41\x5d" 0x5593b68dec1e "\xcc\xc3"
0x5593b68de744 "\x90\x5d\xc3" 0x5593b68de744 "\xcc\xeb\xfe"
0x5593b68ddb67 "\x48\x8b\x85" 0x5593b68ddb67 "\xcc\xeb\xfe"
0x5593b68ddb71 "\x48\x8d" 0x5593b68ddb71 "\xcc\xc3"
0x5593b68de8bd "\x8b\x00" 0x5593b68de8bd "\xcc\xc3"
0x5593b68de9e9 "\x48\x8b" 0x5593b68de9e9 "\xcc\xc3"
0x5593b68ddcc8 "\x89\x45\xd8" 0x5593b68ddcc8 "\xcc\xeb\xfe"
0x5593b68ddd25 "\x8b\x12\x01" 0x5593b68ddd25 "\xcc\xeb\xfe"
0x5593b68ddd56 "\x01\xca" 0x5593b68ddd56 "\xcc\xc3"
0x5593b68ddda8 "\x31\xd1\x48" 0x5593b68ddda8 "\xcc\xeb\xfe"
0x5593b68dde22 "\x8b\x45\xc4" 0x5593b68dde22 "\xcc\xeb\xfe"
0x5593b68dde32 "\x8b\x55" 0x5593b68dde32 "\xcc\xc3"
0x5593b68dde3b "\x48\x83\xc2" 0x5593b68dde3b "\xcc\xeb\xfe"
0x5593b68dde61 "\x8b\x55" 0x5593b68dde61 "\xcc\xc3"
0x5593b68ddf2e "\x01\xd0\x89" 0x5593b68ddf2e "\xcc\xeb\xfe"
0x5593b68ddfeb "\x8b\x45\xe0" 0x5593b68ddfeb "\xcc\xeb\xfe"
0x5593b68de186 "\x8b\x0a" 0x5593b68de186 "\xcc\xc3"
0x5593b68de2b6 "\x01\xd0" 0x5593b68de2b6 "\xcc\xc3"
0x5593b68de2f6 "\x8b\x45\xbc" 0x5593b68de2f6 "\xcc\xeb\xfe"
0x5593b68de3c5 "\x01\xd0" 0x5593b68de3c5 "\xcc\xc3"
0x5593b68de482 "\x8b\x55\xdc" 0x5593b68de482 "\xcc\xeb\xfe"
0x5593b68de540 "\x48\x83\xc2" 0x5593b68de540 "\xcc\xeb\xfe"
0x5593b68de549 "\x01\xca\x01" 0x5593b68de549 "\xcc\xeb\xfe"
0x5593b68de5f1 "\x8b\x4d\xc8" 0x5593b68de5f1 "\xcc\xeb\xfe"
0x5593b68deaed "\x48\x8b" 0x5593b68deaed "\xcc\xc3"
0x5593b68deb0f "\xc1\xe8\x08" 0x5593b68deb0f "\xcc\xeb\xfe"
0x5593b68dd934 "\x8b\x45\xf4"
I turned my attention to the child process. I used strace with -ff to trace child processes. Strace output can get confusing when the syscalls of two traced processes overlap, so I instructed strace to save its output into separate files per process with timestamps prepended. I later merged these files with awk.
$ cat /dev/urandom | strace -ff -tt -o /tmp/out ./main a b c d e
$ awk '{print $1, FILENAME, $0}' /tmp/out* | sort | cut -d ' ' -f2,4- | cut -d. -f2-
Let's see the output:
19174 execve("./main", ["./main", "a", "b", "c", "d", "e"], 0x7ffefd60ea00) = 0
OK, the pid of the parent was 19174, so the other pid (probably 19175) would be the child.
19174 mmap
19174 mmap
19174 ptrace(PTRACE_DETACH, 0, NULL, SIG_0) = -1 ESRCH (No such process)
19174 fork() = 19175
19174 mmap
These we saw before, 3 mmaps, a fork and the ptrace I had no idea about.
19175 mmap(NULL, 65536, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0) = 0x7f9bd7d6e000
The first syscall of the child process was an mmap.
19174 wait4(19175, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTRAP && WCOREDUMP(s)}], 0, NULL) = 19175
Parent waits for child. This was the last time the strace output contained the parent's syscalls. The PTRACE_TRACEME call of the child failed because it was already ptrace'd by strace, so the parent would never get SIGCHLD with SIGTRAP status, it would keep on waiting forever.
19175 read(0, "#\3327\334\210", 5) = 5
19175 read(0, "\207\214\0\35\335^\373\313a}\3027T\10"..., 32) = 32
The child read first 5 then 32 bytes from the standard input.
19175 prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) = 0
19175 seccomp(SECCOMP_SET_MODE_STRICT, 1, NULL) = -1 EINVAL (Invalid argument)
19175 seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=8, filter=0x7f9bd7d81e7e}) = 0
Here we can see the child setting up some kind of a sandbox. I did not understand the purpose of the first (failed) seccomp call. Had it been successful in setting up a STRICT seccomp context the second call would have surely caused a seccomp violation.
And the second seccomp call? I was like
Who cares, it's probably not important. ¯\(ツ)/¯
Looking back, this was a huuuuge mistake. Had I examined the seccomp filter, I would not have needed to resort to using ... well, what I used later to solve the challenge.
19175 ptrace(PTRACE_TRACEME) = -1 EPERM (Operation not permitted)
Yes it failed, because strace already ptrace'd it. Fortunately the child didn't check the return code for errors.
19175 memfd_create("hi", MFD_CLOEXEC) = 3
The child created an anonymous file. The file got file descriptor 3.
19175 write(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0"..., 14488) = 14488
The child wrote 14488 bytes to the anonymous file. It looks like the header of an ELF executable file, doesn't it? I guessed that the child would later call execve to execute this binary.
19175 fchmod(3, 0555) = 0
19175 execve("/proc/self/fd/3", ["#\3327\334\210", "\207\214"], NULL) = 0
Yes, exactly. The values it passed as command line arguments were familiar, these were the 5 and 32 byte strings it had read from the standard input before.
I rerun strace with a huge "maximum string size" value (-s 1000000). I could thus recover the initial (unpatched) binary the child dropped into fd3.
$ rm /tmp/out.*
$ cat /dev/urandom | strace -s 1000000 -xx -ff -tt -o /tmp/out ./main a b c d e
$ cat /tmp/out.* | grep 14488 | cut -d\" -f2 | tr -d '\\x' | xxd -p -r > dropped.elf
$ file dropped.elf
dropped.elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, missing section headers
I tried to run the dropped binary. Note that I used exec -a to pass the first string as the zeroth argument, just like in the execve call before.
$ sh -c 'exec -a AAAAA ./dropped.elf BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB'
Trace/breakpoint trap (core dumped)
Hmm, no surprise, it still lacked the binary patches.
To patch the binary it was important to know if the patches had depended on the input in some way. I ignored the PTRACE_SETREGS calls for a while.
To find that out I straced the parent multiple times with different inputs and compared the strace logs. I also disabled address space randomization.
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
$ cat /dev/urandom | strace ./main a b c d e > run.1
$ cat /dev/urandom | strace env -i ./main xx yyy zzz > run.2
$ vimdiff run.1 run.2
The traces were very similar, there were only small differences in the pid numbers and the first couple of bytes of memory locations. (I thought I disabled ASLR?). Fortunately the last 3 digits of the memory addresses were the same. That meant that the parent always applied the same patches between runs.
It was easy to apply the patches to the input file, I just had to take care not to apply the "undoing" patches. The following python program (using the pwntools library) does exactly that.
$ cat patch_dropped.py
from pwn import *
context.arch = "amd64"
e = ELF("dropped.elf")
base = 0x555555554000
done = set()
for line in open("run.1"):
for char in '"()=[]{},':
line = line.replace(char, " ")
line = line.split()
if len(line) > 2 and line[1] == "process_vm_writev":
data = unhex(line[4].replace("\\x", ""))
addr = int(line[9], 16) - base
if addr not in done:
e.write(addr, data)
done.add(addr)
e.save("dropped_patched.elf")
$ python patch_dropped.py
$ file dropped_patched.elf
dropped_patched.elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=1b6bb2a46a5e760d6de3fb5c4782e05c0a118f11, stripped
I tried running it:
$ chmod +x dropped_patched.elf
$ sh -c 'exec -a AAAAA ./dropped_patched.elf BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB'
NO!
Trace/breakpoint trap (core dumped)
OK, that "NO!" output meant that my python program did what I wanted.
Wait, what about the PTRACE_SETREGS? Was there any important stuff there? For that I had to dig into the parent process.
It was time to examine the main binary in my favorite disassembler. Brace yourself dear reader, because the binary was pretty obfuscated!
$ objdump -Mintel -d ./main
4000b0: 48 b8 88 77 66 55 44 movabs rax,0x1122334455667788
4000b7: 33 22 11
4000ba: 48 89 04 25 a6 46 60 mov QWORD PTR ds:0x6046a6,rax
4000c1: 00
Strange value at address 0x6046a6, I wondered what it did.
4000c2: e8 c2 00 00 00 call 0x400189
What was that 0x400189 function it called?
400189: b8 09 00 00 00 mov eax,0x9
40018e: 48 31 ff xor rdi,rdi
400191: be 00 00 01 00 mov esi,0x10000
400196: ba 07 00 00 00 mov edx,0x7
40019b: 41 ba 22 00 00 00 mov r10d,0x22
4001a1: 4d 31 c0 xor r8,r8
4001a4: 4d 31 c9 xor r9,r9
4001a7: 0f 05 syscall
4001a9: c3 ret
Syscall 0x9 is mmap. This was the mmap call we saw in the strace log. OK, so the function at 0x400189 was mmap(0x10000, RWX). I continued with the main:
4000c7: 48 89 c5 mov rbp,rax
4000ca: e8 ba 00 00 00 call 0x400189
This is another mmap call, at this point RBP contined the address of the first RWX page and RAX the second.
4000cf: 48 89 c7 mov rdi,rax
4000d2: 48 8d 34 25 bc 01 60 lea rsi,ds:0x6001bc
4000d9: 00
4000da: b9 a8 44 00 00 mov ecx,0x44a8
4000df: f3 a4 rep movs BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
Ok, so here it copied 0x44a8 bytes from 0x6001bc to the second RWX page.
4000e1: 48 8b 34 25 a6 46 60 mov rsi,QWORD PTR ds:0x6046a6
4000e8: 00
After this instruction the magic constant 0x1122334455667788 was in RSI.
4000e9: 48 89 c7 mov rdi,rax
4000ec: b9 a8 44 00 00 mov ecx,0x44a8
4000f1: 48 31 37 xor QWORD PTR [rdi],rsi <---.
4000f4: 48 83 c7 08 add rdi,0x8 |
4000f8: 48 83 e9 08 sub rcx,0x8 |
4000fc: 7f f3 jg 0x4000f1 ----------------'
It looks like the code at 0x6046a6 was encrypted with a simple XOR encryption, where the key was the 0x1122334455667788 constant. This was the decryption code.
4000fe: 50 push rax
4000ff: c3 ret
This is basically a jump to rax, that is to the decrypted code.
I patched the binary to remove the XOR encryption.
$ cat decrypt_main.py
from pwn import *
context.arch = "amd64"
main = ELF("./main")
# xor decrypt
enc = main.read(0x06001BC, 0x44a8)
key = p64(0x1122334455667788) * (len(enc)//8)
dec = xor(enc, key)
main.write(0x6001BC, dec)
# skip decrypting routine
main.write(0x4000E9, asm("push rax; ret"))
main.save("./main_decrypted.elf")
I won't go through everything in the main binary, but here I list some interesting obfuscation/shellcode techniques I found in it.
006001DE: 5B pop rbx
0x6001DF: 66 83 C3 0A add bx, 0Ah
Rbx contains the address of 0x6001E7 here.
0x6001E3: 80 33 82 xor byte ptr [rbx], 82h
Xor the 2nd byte of the next instruction. That instruction would move 0xe7 into eax. This is the syscall number of exit_group.
0x6001E6: B8 E7 00 00 00 mov eax, 0E7h
After the xor is executed this instruction is really mox eax, 0x65
, where
0x65 is the syscall number of ptrace. 0x11 is PTRACE_DETACH.
0x6001EB: BF 11 00 00 00 mov edi, 11h
0x6001F0: 0F 05 syscall
0x6001F2: C3 retn
So this is where our mysterious PTRACE_DETACH came from! By only skimming through the instructions one could think the binary exits here, but in reality it changes the exit syscall into -- the effectively no-op -- PTRACE_DETACH.
In normal binaries executable code and read-only data are usually in different section of the binary. This program mixes them in the XOR-ed page. The problem is that the address of the decrypted page changes every time the binary is run so it uses a nice trick to get data addresses in a position independent way.
See for example the "function" that loads the address of the seccomp filter into memory:
get_rip:
0x604033: pop rax
0x604034: retn
get_seccomp_filter_data_address:
0x604035: call 0x604033 ; get_rip
start_of_seccomp_filter_data:
0x60403A: sock_filter <20h, 0, 0, 4>
0x60403A: sock_filter <15h, 0, 5, 0C000003Eh>
0x60403A: sock_filter <20h, 0, 0, 0>
0x60403A: sock_filter <35h, 3, 0, 40000000h>
0x60403A: sock_filter <15h, 1, 0, 5Fh>
0x60403A: sock_filter <6, 0, 0, 7FFF0000h>
0x60403A: sock_filter <6, 0, 0, 50001h>
0x60403A: sock_filter <6, 0, 0, 0>
The entry point is get_seccomp_filter_data_address
. When a call to
get_seccomp_filter_data_address
is executed, a return address is pushed onto
the stack. When the second call is executed, the start_of_seccomp_filter_data
address is then pushed onto the stack, because this is where the second call
should return to. Execution then jumps to get_rip
, that pops the
start_of_seccomp_filter_data
address from the stack into RAX. It then
executes a RET, which pops off the original caller's address and jumps to it.
And we are done, start_of_seccomp_filter_data
is in RAX, and the caller
can continue execution normally.
I found many places in the binary where this trick was used.
A nice example example for this I found is at address 0x6042D9.
load_data_2_gadget:
0x6042D9: mov r8, rax
0x6042DC: lea r9, load_data_3_gadget
0x6042E3: push r9
0x6042E5: jmp get_data_2_address
load_data_3_gadget:
0x6042EA: mov r9, rax
0x6042ED: lea r10, load_data_4_gadget
0x6042F4: push r10
0x6042F6: jmp get_data_3_address
What happens here? The LEA and PUSH instruction pushes the address of the next gadget onto the stack. The jmp instructions call the get_data functions without pushing their return value. When the get_data functions return they will pop off the address of the next gadget jump to it.
Now that I knew how the main binary worked I could examine what the PTRACE_SETREGS calls did. I wrote a short gdb script for it. (This script doesn't work with main_patched.elf, only with the original binary.)
# stop right after XOR-ing the real code
break *0x04000FF
# run until the breakpoint
run <<< AAAAABBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
# this is where the real code of the program starts
set $base = *(unsigned long*)($rsp)
# break right after the getregs syscall and print the old reg values
break *($base + 15599 + 2)
commands
printf "XXX GETREGS registers\n"
x/27gx $r10
printf "XXX"
continue
end
# break right before the setregs syscall and print the new reg values
break *($base + 15717)
commands
printf "XXX SETREGS registers\n"
x/27gx $r10
printf "XXX"
continue
end
# continue execution
continue
I ran this with
$ gdb --batch -x main.gdb ./main &> REGISTERS.txt
I could see from the output that the SETREGS commands only decremented the child's RIP by one. Why? Because the parent applies a patch to the child when the child steps on an int3 instruction. When that instruction is exeucted the RIP will point to the next instruction. By decrementing the RIP after patching, the parent forces the child to continue execution from the start of the patch, the address of the instruction that was originally int3.
That didn't help much. I turned my attention towards the dropped binary.
$ file dropped_patched.elf
dropped_patched.elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=1b6bb2a46a5e760d6de3fb5c4782e05c0a118f11, stripped
$ strace sh -c 'exec -a AAAAA ./dropped_patched.elf BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB'
[..snip..]
umask(000) = 022
umask(000) = 000
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
[..snip..]
This was the (decompiled) main function:
void __usercall __noreturn main(char **args@<rsi>, char **a2@<rdx>, char **a3@<rax>)
{
*((_BYTE *)exit + 2) = 0xBAu;
*a3 = *a2;
maybe_atexit((__int64)int3);
exit(0);
}
Obfuscation again. The exit function contained only a jump to the exit PLT entry. By modifying a byte in the exit function the dropped binary redirected the exit function to another (hidden) PLT entry. That PLT entry contained the umask syscall. So I manually patched exit+2 to be 0xBA, and turned the patching instruction into NOPs. After that IDA could decompile the function:
void main(int argc, char *argv[])
{
size_t five; // rax
char hex_hash2_32[33]; // [rsp+20h] [rbp-110h]
char raw_hash1_16[16]; // [rsp+50h] [rbp-E0h]
char hex_hash1_32[33]; // [rsp+60h] [rbp-D0h]
char raw_hash1_160[160]; // [rsp+90h] [rbp-A0h]
maybe_atexit(int3);
exit_that_is_umask_really(0LL);
memset(hex_hash1_32, 0, 33);
five = strlen(argv[0]);
hash1_init(raw_hash1_160);
hash1_calc(raw_hash1_160, argv[0], five);
hash1_compress(raw_hash1_16, raw_hash1_160);
enhex_string(raw_hash1_16, hex_hash1_32);
memset(hex_hash2_32, 0, 33);
hash2(argv[1], hex_hash2_32);
if (!strcmp(hex_hash2_32, hex_hash1_32))
puts("OK!");
else
puts("NO!");
}
To sum it up, it takes 5 bytes from argv[0] and 32 bytes from argv[1], hashes them with two different hash functions and compares the result.
Hash1 is at 0xC46. There was some calculation there that looked like a hash function, so I named it hash1. I did not recognize the hash algorithm.
The hash function -- just like the main function -- starts with a call to exit_that_is_umask_really. After this it checks if the PWD environment variable exists and modifies a variable if it does. So the hash calculation depended on the PWD environment variable! I looked back to the strace of the child process to determine how it was run:
19175 execve("/proc/self/fd/3", ["#\3327\334\210", "\207\214"], NULL) = 0
With NULL environment, so no PWD variable here. I replicated this in my runner script (note the -c argument of exec):
$ exec -c -a AAAAA ./dropped_patched.elf BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
void hash2(char *hex_input, char *hex_output)
{
char v2; // al
char hashed[16]; // [rsp+10h] [rbp-30h]
char unhexed[24]; // [rsp+20h] [rbp-20h]
int j; // [rsp+38h] [rbp-8h]
int i; // [rsp+3Ch] [rbp-4h]
for ( i = 0; i <= 15; ++i )
{
v2 = unhex_byte(&hex_input[2 * i]);
unhexed[i] = v2;
}
for ( j = 0; j <= 15; ++j )
hashed[j] = j ^ (unhexed[j] & 0xF | unhexed[15 - j] & 0xF0) ^ 16 * j;
enhex_string(hashed, hex_output);
}
Well this didn't look like a military grade, secure hash function. The following python2 code calculates the inverse of hash2:
def inverse_hash2(enc):
unhexed = enc.decode('hex')
hashed = ""
for j in xrange(16):
k = 15 - j
b1 = ((ord(unhexed[j]) ^ j ^ (16 * j)) & 0xF0)
b2 = ((ord(unhexed[k]) ^ k ^ (16 * k)) & 0x0F)
hashed = chr(b1 | b2) + hashed
return hashed.encode('hex')
At this point it was obvious how to crack the challenge.
-
Let's say the server sent us the string "MaLAC".
-
We run dropped_patched.elf in gdb with NULL environment and argv[0] set to "MaLAC" and argv[1] set to some random 32 byte string.
-
We stop the execution right after hex_hash1_32 is calculated.
-
We observe that hex_hash1_32 = "784542fbc8f7f8c99701967786997220".
-
We can calculate inverse_hash2("784542fbc8f7f8c99701967786997220") and send the result to the server as the correct solution.
Step 3 is tricky, because gdb always sets PWD when it runs a binary. I used
the very useful set wrapper-script
gdb command to circumvent that. This was
my wrapper script:
#!/usr/bin/env bash
exec -c -a $TOKEN ./dropped_patched.elf BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
I did all 5 steps, but the server's answer was:
BAD!
I tried it multiple times, but BAD all the time. Even when I tried with the
original, unmodified main
binary locally, it said
NO!
Clearly I had missed something during reversing. ᕦ(ò_óˇ)ᕤ
Now that I'm writing this writeup it is fairly obvious what I missed: the seccomp filter, of course. I never examined it.
Turns out, it hijacked the umask system call to return a different value. The purpose of the umask call in hash1 was twofold: on one hand it was disguised as exit to fool the decompiler, but on the other hand it also modified RAX to the return value of the seccomp-hijacked syscall. RAX was later used in the hash calculation.
The problem was that I had been running the extracted binary without the tricky seccomp filter, because the seccomp syscall happened in the child process, before it execve'd the dropped_binary.
Nevertheless I managed to solve the challenge, but it took considerably longer than it would have taken had I seen this nice trick.
It became clear that I could not reproduce the original environment of dropped_patched.elf, so I turned to dumping the value of hex_hash1_32 from the original binary.
The problem was that it was ptrace'd by the parent, so I couldn't just connect with gdb to it.
I tried ltrace to trace the dynamic function calls.
$ ltrace ./main <<< MaLACBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
ltrace ./main <<< MaLACBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Couldn't find .dynsym or .dynstr in "/proc/25987/exe"
NO!
Unfortunately main
was compiled statically, without libc, so that didn't work. (Yes, I could have attached to the child process, but that didn't occur
to me back then.)
I tried to create core dumps of the child process by sending SIGQUIT with
kill -3
to it, but the resulting dumps did not contain the string I was
looking for. Maybe my timing was off. Also for some reason the linux system I
used didn't create core files directly, it sent them to some stupid systemd
module (coredumpctl) that I wasn't familiar with, so I gave up this approach
after a couple tries.
I tried to overwrite strcmp with LD_PRELOAD to print its arguments, but it failed too, because (remember?) the binary is execve'd with NULL environment pointer.
What I finally did was, that I looked for the execve call in main, and patched it so that it passed the environment pointer. For that I had to find the environment pointer. There is usually one on the stack, so I messed around in gdb at the execve call and I found that there was indeed one at RSP+0x48.
I patched main
with this script:
from pwn import *
import io, os
context.arch = "amd64"
main = ELF("./main")
# un-xor the instructions
enc = main.read(0x06001BC, 0x44a8)
key = p64(0x1122334455667788) * (len(enc)//8)
dec = io.BytesIO(xor(enc, key))
base = 0x6001BC
addr = 0x603DC7 - base # execve call
new = asm("mov rdx, rsp ; add rdx, 0x48 ; syscall ; nop ; nop ; nop ; nop")
dec.seek(addr)
old = dec.read(len(new))
assert old == unhex("4831d20f054c89c8e879080000")
dec.seek(addr)
dec.write(new)
# re-xor the instructions
enc = xor(key, dec.getvalue())
main.write(0x06001BC, enc)
main.save("./main_patched")
os.system("chmod +x ./main_patched")
I compiled a shared library to LD_PRELOAD:
$ cat preload.c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>
static int (*real_strcmp)(const char*, const char*) = NULL;
static void mtrace_init(void)
{
real_strcmp = dlsym(RTLD_NEXT, "strcmp");
real_strlen = dlsym(RTLD_NEXT, "strlen");
if (NULL == real_strcmp) {
fprintf(stderr, "Error in `dlsym`: %s\n", dlerror());
}
}
int strcmp(const char *s1, const char *s2)
{
if (real_strcmp == NULL) {
mtrace_init();
}
printf("YYY: strcmp(\"%s\", \"%s\")\n", s1, s2);
printf("XXX: %s\n", s2);
fflush(stdout);
return real_strcmp(s1, s2);
}
$ gcc -Wall -Wextra -shared -fPIC -o preload.so preload.c -ldl
And created a small bash script to run main_patched with only LD_PRELOAD in its environment (with no PWD).
$ cat get_this_shit.sh
#!/usr/bin/env bash
printf ${token}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB |
env -i LD_PRELOAD=$(pwd)/preload.so ./main_patched |
grep XXX: | cut -d: -f2 | tr -cd '[0-9a-fA-F]'
The full exploit script:
from pwn import *
import sys, os
def hash2(dec):
dec = bytearray(unhex(dec))
enc = [0] * 16
for j in range(16):
enc[j] = j ^ ((dec[j] & 0xF) | (dec[15 - j] & 0xf0)) ^ 16 * j
return enhex("".join(chr(i) for i in enc))
def inverse_hash2(enc):
assert len(enc) > 30
unhexed = enc.decode('hex')
hashed = ''
for j in xrange(16):
k = 15 - j
b1 = ((ord(unhexed[j]) ^ j ^ (16 * j)) & 0xF0)
b2 = ((ord(unhexed[k]) ^ k ^ (16 * k)) & 0x0F)
hashed = chr(b1 | b2) + hashed
return hashed.encode('hex')
def solve(token5):
print "TOKEN:", token5
p = process(["./get_this_shit.sh", token5])
p.shutdown()
hex_hash1_32 = p.recvall()
print "INNER32:", hex_hash1_32
solution = inverse_hash2(hex_hash1_32).strip()
assert hash2(solution).upper() == hex_hash1_32.upper()
print "SOLUTION:", solution, "\n\n\n"
return solution
r = remote("keygenme.ctfcompetition.com", 1337)
while True:
token5 = r.readline().strip()
r.sendline(solve(token5))
rsp = r.recvline()
print "RESPONSE", rsp
assert "OK" in rsp
After answering the server's requests like a dozen time, it got the flag:
CTF{g1mm3_A11_T3h_keyZ}
I liked this challenge because while it was not impossibly hard, its author employed many different techniques to hinder the reverse engineering effort.
It had self-modifying code, ropchain-like control flow, hidden payload (I still have no idea how the dropped binary is encoded in the main binary), ptraced child process, and live binary patching.
It forced me to be creative with debugging, I used gdb, strace, ltrace,
LD_PRELOAD, I binary patched a lot and learned about SIGQUIT
and
set wrapper-script
.
Also, KEYGENME had the exit trick that fooled IDA and the seccomp trick that fooled me ☺.
Had the child process also ptrace'd and live-patched its parent, it would have been the perfect CTF challenge I would have probably given up on halfway through.