ssbug.md

Lets take a look at the vulnerable code:

if (s->servername_done == 0) {
    switch (servname_type) {
        case TLSEXT_NAMETYPE_host_name:
            if (s->session->tlsext_hostname == NULL) {
                if (len > TLSEXT_MAXLEN_host_name ||
                    ((s->session->tlsext_hostname = OPENSSL_malloc(len + 1)) == NULL)) {
                    *al = TLS1_AD_UNRECOGNIZED_NAME;
                    return 0;
                }
                memcpy(s->session->tlsext_hostname, sdata, len);
                s->session->tlsext_hostname[len] = '\0';
            }
    }
}

Here if tlsext_hostname is set to NULL, it will allocate len+1 bytes, then memcpy into that buffer with a size of len. Nothing out of the ordinary, But there were some oversights here that take a little digging to see.

OpenSSL is not thread-safe, it wasn't designed that way. You can force openssl to be thread safe, but it's obvious that the code to do this was wedged in there at the last minute.

Here is how we initialize OpenSSL to be thread-safe:

pthread_mutex_t *thread_locks;

unsigned long 
openssl_thread_id(void) {
       return (unsigned long) pthread_self();
}

void 
openssl_thread_lock( int mode, int lock_id, const char* file, int line ) {
    if ( mode & CRYPTO_LOCK )
       pthread_mutex_lock( &thread_locks[lock_id] );
    else
       pthread_mutex_unlock( &thread_locks[lock_id] );
}

void init_ssl_lock(void) {
       int num_thread_locks = CRYPTO_num_locks();
       thread_locks = calloc( sizeof( pthread_mutex_t ), num_thread_locks );

       for (i = 0; i < num_thread_locks; i++)
       {
           if ( pthread_mutex_init( &thread_locks[i], NULL ) ) 
           {
               fprintf(stderr, "Unable to create mutex\n");
               exit(80);
           }
       }

       CRYPTO_set_locking_callback( &openssl_thread_lock ); 
       CRYPTO_set_id_callback( &openssl_thread_id );
}

We can usually assume this would cover any type of potential race-conditions, but in this case, one was overlooked.

There are several functions that add, fetch, and remove entries from the session cache: ssl_get_new_session(), ssl_get_prev_session(), SSL_CTX_add_session(), SSL_CTX_remove_session(). These all call CRYPTO_w_lock()/CRYPTO_w_unlock() for any type of data modifications. Yay!

Here is the catch: ssl->session is shared across multiple threads, modified by the functions above. But in the function ssl_parse_clienthello_tlsext(), the value of s->session->tlsext_hostname is checked for a NULL value without any locking.

There is a possibility that (if the value of tlsext_hostname is NULL) that two threads enter the same block of code at the same time, one overwiting the others pointer to tlsext_hostname. The outcome of the flow would look something like this:


[ Thread-A ]                                  [ Thread-B ]
  if (tlsext_hostname == NULL)         |           |
     |                                 |        if (tlsext_hostname == NULL)
    tlsext_hostname = malloc(256);     |           |
     |                                 |         tlsext_hostname = malloc(1);        
    memcpy(tlsext_hostname, buf, 255); |           |
                                       |         memcpy(tlsext_hostname, buf, 1);

What happens here is Thread-A sees that tlsext_hostname is NULL. Thread-A then allocates a user controlled length (note: max of 256), finally doing memcpy of the user controlled data with a size of 255 (leaving room for the trailing '\0'). But in this case, between the time Thread-A allocated the buffer and the memcpy, Thread-B had allocated a much smaller amount of memory into the tlsext_hostname variable.

If executed properly, Thread-A would overwrite the heap by 254 bytes (255 - 1)

So the fix was basically to check the value of s->hit before doing anything stupid. Why?

int ssl3_get_client_hello(SSL *s) {
      /* bunch of crap here */

      /* remember, ssl_get_prev_session properly locks s->session */
      i = ssl_get_prev_session(s, p, j, d + n);
      if (i == 1) { /* previous session */
            s->hit=1;
      }
}

SSL *s is allocated on the heap thus thread safe. If ssl_get_prev_session() (which does proper locking) returns 1, s->hit is set, and then used later on inside ssl_parse_clienthello_tlsext to determine if tlsext_hostname is already allocated.

During one of our load tests we found that in some cases, randomly but under high load, SSL sockets would just stop receiving data which would result in hanging connections. This condition was basically impossible to reproduce in a consistent manner as we had no idea how such a thing could be triggered.

It HAD to be OpenSSL, because, well, OpenSSL is to blame for all of the problems anyone has ever had. Take the TV show "The Jersey Shore" as an example; while you could argue that the success of the show was the result of the diminishing intelligence of American viewers, I place the full blame on OpenSSL. This I am sure of.

Because of this issue, I now know OpenSSL far to well, that is, as much as is humanly possible without wanting to jam nails in my eyes. Let me sum up OpenSSL in two words: callback hell. Oh, and also: developers nightmare. It is impossible to maneuver using only static analysis. All debugging had to be done at run-time. I remember hours of 'step, step, step, step, "ah man, I'm in memcpy", finish, step, step, step'.

I was at my wits end. I felt broken, and, perhaps I was. It seemed as if all of my work was for not, as if this one single issue could destroy any notion of my self-worth. I was seriously considering quitting my job to work for 711, where my skills may be better applied.

It was at this point when I decided I must explore higher-level API's for any issues that might be the cause. So I started with the abstraction around the SSL IO: Libevent. I had already been an active contributor to the project and knew the internals fairly well. The OpenSSL IO was wrapped around libevent's bufferevent API, which is in itself, an abstraction around an event API.

A bufferevent is a simple concept: All IO is abstracted into one structure where you deal only with input and output buffers.

On the other hand, OpenSSL is incredibly complex, especially when it comes down to non-blocking IO. A read or a write may return one of many error conditions which may be the opposite of the function you called. For example, SSL_read() may return a status that says that SSL_write() must be called first. It's this type of logic that makes life hard.

One thing I remembered, which I wish I had remembered a week beforehand, was the fact that the OpenSSL bufferevents would always segfault unless it was put in a "deferred" state. A deferred bufferevent means any IO is not immediately executed. Instead, the IO would be queued and ran in the next iteration of the event loop.

Looking back, I think we all took this for granted, a fact of life: OpenSSL bufferevents did not work unless they are deferred, end of story. I don't think a nyone ever dug in to figure out exactly why.

The only reason I started looking was because it was easier to debug events that were executed immediately. The path of execution was much better consumed by my brain in this state.

From here, things started to fall in place, I was able to fully visualize the flow. I ran the the application through high-load and waited for a client to hang. A client hanged. I attached my debugger and started stepping my way around the event loop. From what I could see, the OS never flagged the underlying socket as readable again. Knowing this, I was confident that all the data had been received from the socket. Yet the application thought otherwise.

So where did that data go? I came across an OpenSSL function which I had never seen before: SSL_pending(). The documentation for this function is cryptic; a single line description: "obtain number of readable bytes buffered in an SSL object".

So wait a minute, does this mean that in some cases, the data returned by SSL_read() was not everything? Was there a magical buffer where data is hidden until the next call to SSL_read() was called?

As it turns out, yes. There was a magical buffer which contained data that SSL_read() did not immediately give to you. The idea was the next SSL_read() would give you back the data both in this magical buffer, and the data on the socket. But in a non-blocking event handling world, a socket would only be read when the file descriptor has been set as readable. If by chance the last 10 bytes of a request was in this magical buffer, the socket would never become readable again because technically, all the data had been read. It just so happened to be somewhere it wasn't expected to be.

The solution was simple, SSL_read() as normal, then immediately call SSL_pending(), and if there is pending data, SSL_read() again.

Problem solved. I relaxed. And life was good.

A few days after the release with this patch (considered a critical bugfix), emails started to trickle in about TOR instabilities. For those who do not know, TOR's entire networking backend is Libevent. Users started seeing high-cpu utilization and eventual crashes of their TOR processes. The only change made in that time-frame was mine.

I broke TOR. Not many can say this, but I can. With two lines of code, I was able to do a lot of damage.

While I was not able to reproduce the problem, I quickly figured out what would cause the problem. If the network was under high load, and SSL_pending always returned true, your program would be in a constant state of SSL_read(). In other words, an infinite loop that is only processing one connection. So in came the bad bug, and the worse bug.

The bad bug: we could potentially cause resource unfairness by reading too much data from the underlying bufferevent; it can potentially cause read looping if the underlying bufferevent is not deferred. The worse bug: If we didn't do this, then we would potentially not read any more until more data arrived, which could lead to an infinite wait.

Another release was made, and life was once again good.

In the end, it wasn't a direct problem with OpenSSL, instead it was an issue of OpenSSL not being used correctly. Which in turn, was a direct result of OpenSSL not documenting what WAS correct.

errzey/ssbug.md