Beloved users, and whoever else may find it of interest,

We recently made the switch from Apache to nginx+uwsgi. Well, I say recently: as far as I can tell from the commit logs, we started work on it around July 10th, so that's fully six weeks ago. We just deployed it this week, and after a bumpy first few days it seems to be settling in well. We thought we'd share why we switched, and how it's going.

tl;dr:

We switched because:

we need to dynamically configure new virtualhosts for our users
Apache won't dynamically load config; it needs a graceful restart, which is still too disruptive
a single, generic Apache config that handles all users is getting unwieldy
nginx+uwsgi will make it much easier for us to do things like static files and per-user config tweaks
we're also hoping for some performance improvements

Our experience was:

sure enough, nginx+uwsgi is much simpler to configure and much more flexible. But:
Apache's model is to only start workers when they're needed. uwsgi starts them all up-front. That took some hacking, since we actually wanted the on-demand behaviour.
no perf. improvements out-of-the-box, but we think there's lots of potential for tuning
we did plenty of functionality testing, but not enough load testing, so our first deploy was a little rough.

On to the detail!

Apache graceful isn't

For those that don't know, PythonAnywhere allows users to develop and also host Python apps, either on our domain via a username.pythonanywhere.com subdomain, or from their own domain names. So we need to be able to dynamically add, remove and update virtualhost configs. One way or another, our web servers need to look at an incoming request and, based on the domain name, route it to the appropriate Python WSGI application.

With Apache, there are two ways of doing this: you can either write a config file for each user and domain, or you can try to write a single global config file that can handle any domain and point it to the right WSGI app. The problem is that the first approach requires you to bounce Apache for each config change. Not nice: even if a graceful restart doesn't disconnect workers until they've finished existing requests, it still means that new requests have to wait for the server to spin up again. We don't want to be taking down one of our web servers, even for just a few seconds, every time a new user signs up!

Meanwhile, the second approach is horrible to configure, and won't let us do everything we want anyway.

Yucky Apache config

Let's take a little peek at some of the contortions we were putting ourselves through in the old days:

LogFormat "%{cased_username}e|%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined-vhost

LoadModule vhost_dbd_module /usr/lib/apache2/modules/mod_vhost_dbd.so
DBDExptime 5

<VirtualHost *:80>
    ServerAlias *

    FcgidMaxRequestLen 2097152

    DBDriver mysql
    DBDParams host=mysql.server,user=django_anywhere,pass=MYSQL_PASSWORD,dbname=anywhere
    DBDocRoot "SELECT concat('/mnt/user_storage/webapps/', lower(username), '/escalate_privileges.py'), username as cased_username from auth_user, webapps_userdomain where auth_user.id = webapps_userdomain.user_id and webapps_userdomain.domain_name = %s" HOSTNAME

    CustomLog "|/usr/bin/python -u /home/anywhere/user_logs/splitter.py" combined-vhost

    <DirectoryMatch /mnt/user_storage/webapps/(.+)>
        Order allow,deny
        Allow from all
        Options +ExecCGI
        AddHandler fcgid-script .py
    </DirectoryMatch>
</VirtualHost>

Can you see what's going on in there?

Well, you might spot that we foolishly allowed mixed-case usernames, and that it's causing us no end of trouble. Sadly we're just going to have to suck it up - nginx+uwsgi isn't going to make that any easier for us.

Next you might see we've had to hand-roll a script called splitter.py that pipes user access logs to their own files instead of the default Apache ones. Not ideal perhaps, but it works.
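To give a flavour of what such a script has to do, here is a minimal sketch. It is not the real splitter.py: the function names and the one-file-per-user layout under log_dir are assumptions made for illustration. The only thing taken from the config above is the log format, which puts the username and a "|" in front of each access log entry.

```python
import os


def split_log_line(line):
    """Split 'username|access log entry' into (username, entry), or None."""
    username, separator, entry = line.partition("|")
    if not separator or not username or "/" in username:
        return None  # no username prefix, or a name that is unsafe as a filename
    return username, entry


def split_stream(lines, log_dir):
    """Append each entry to <log_dir>/<username>.access.log (assumed layout)."""
    for line in lines:
        parts = split_log_line(line)
        if parts is None:
            continue
        username, entry = parts
        log_path = os.path.join(log_dir, username + ".access.log")
        with open(log_path, "a") as log_file:
            log_file.write(entry)
```

Since Apache feeds a piped CustomLog program its log lines on standard input, an entry point for a script like this would boil down to calling split_stream(sys.stdin, log_dir) with whatever directory you keep user logs in.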

The real pain starts in those DBDocRoot params: we're using mod_vhost_dbd to get Apache to check with the database for each request in order to figure out which Python WSGI file to send it to. Ouch.

But that's not the end of it - escalate_privileges.py hints at a new world of ugly.

The escalate_privileges dance

#!/usr/bin/python
# Copyright (c) 2012 Resolver Systems Ltd.
# All Rights Reserved
#

import os
import pwd
import signal
import subprocess
import sys

from anywhere.jails.spawn import spawn_unpiped_chrooted_process


def get_username_from_filename():
    lowercase_username = os.path.basename(os.path.dirname(__file__))
    for pwd_rec in pwd.getpwall():
        if pwd_rec.pw_name.lower() == lowercase_username:
            return pwd_rec.pw_name
    raise Exception("No such user %r" % (lowercase_username,))


def get_signal_handler(pid):
    def handler(signum, _):
        subprocess.call(['/usr/bin/sudo', '/bin/kill', str(pid)])
    return handler


def main():
    username = get_username_from_filename()
    process = spawn_unpiped_chrooted_process(
        '/usr/bin/python /bin/serve_wsgi.py %s %s' % (username, " ".join(sys.argv[1:])),
        username
    )

    handler = get_signal_handler(process.pid)
    signal.signal(signal.SIGUSR1, handler)
    signal.signal(signal.SIGTERM, handler)
    process.wait()


if __name__ == '__main__':
    main()

Yep, that script's purpose is to spawn a process that chroots to the user's home directory (and note that the only way we can get the username is to read the current directory name - Apache point blank refuses to do anything helpful like set an environment variable for us). I'll spare you the additional fun of serve_wsgi.py, which is just for logging (the counterpart to splitter.py from earlier). We then hand off to the user's WSGI app.

Apache constrains us

Our Apache setup works (if it ain't broke...) for now, even if it does involve a database hit for every request, two hacks for access and error logs, and a strange dance involving passing parameters in the form of directory names to a custom chrooting script... But even discounting all that, it still limits us in terms of what we can do for the future.

For static file handling, we'd have to implement something a little like this:

RewriteCond /mnt/user_storage/homedirs/%{ENV:subdomain}/var/www/static%{REQUEST_URI} -f
RewriteRule ^/(.*) /mnt/user_storage/homedirs/%{ENV:subdomain}/var/www/static/$1 [L]
<DirectoryMatch /mnt/user_storage/homedirs/(.*)/var/www/static/>
    Order Allow,Deny
    Allow from All
    Options SymLinksIfOwnerMatch
</DirectoryMatch>
<Directory /mnt/user_storage/homedirs>
    Options SymLinksIfOwnerMatch
</Directory>

That tells Apache to check every request against a folder in the user's personal space, and uses mod_rewrite to return static files from that folder if they exist. We can even get symlinks working, at a pinch.
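In Python terms, the check those rewrite rules perform for each request looks roughly like the sketch below. The function name is hypothetical and the path-escape guard is an addition for safety in the sketch. Only the directory layout comes from the config above.

```python
import os


def find_static_file(subdomain, request_uri,
                     homedirs="/mnt/user_storage/homedirs"):
    """Return the on-disk path for request_uri if it is a regular file, else None."""
    static_root = os.path.normpath(
        os.path.join(homedirs, subdomain, "var/www/static"))
    candidate = os.path.normpath(static_root + request_uri)
    # Refuse anything that climbs out of the user's static folder via "..".
    if not candidate.startswith(static_root + os.sep):
        return None
    # Equivalent of the "-f" test in the RewriteCond: only regular files match.
    return candidate if os.path.isfile(candidate) else None
```

Anything that returns None here would fall through to the user's WSGI app, just as a request that fails the RewriteCond does.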

But we can't support a custom location for static files for each user.

Moreover, mod_vhost_dbd severely constrains what we can do. You might have spotted these two lines, which appear to show us using some arbitrary info retrieved from the database to process requests:

LogFormat "%{cased_username}e|%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined-vhost

DBDocRoot "SELECT concat('/mnt/user_storage/webapps/', lower(username), '/escalate_privileges.py'), username as cased_username from auth_user, webapps_userdomain where auth_user.id = webapps_userdomain.user_id and webapps_userdomain.domain_name = %s" HOSTNAME

Except that the %{cased_username} variable isn't actually available until about half-way through serving the request... by which time it's too late to use it to, say, specify a custom directory for the WSGI script, or static files...

So Apache makes it hard for us to do things like per-user configuration for where to serve static files from.

Night of the living Apache workers

Here's another bit of fun. Apache would often leave zombie processes lying around, which would start to eat up resources on the server. So, we had a daily "cleanup" job that looked like this:

# Kill all user-owned processes that are children of init (i.e. not a child of apache or its workers)
python -c "import psutil; [p.kill() for p in psutil.process_iter() if p.gids.real == 60000 and p.parent.pid==1]"
# Kill serve_wsgis that are owned by root and are not children of apache
python -c "import psutil; [p.kill() for p in psutil.process_iter() if 'serve_wsgi' in ' '.join(p.cmdline) and p.parent.pid==1]"

Lovely.
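A note for anyone tempted to borrow those one-liners: they use the attribute-style psutil API of the time. In psutil 2.0 and later, gids, cmdline and the parent lookup are methods. A rough modern equivalent, assuming the same gid of 60000 and treating the function names as hypothetical, might look like this:

```python
USER_GID = 60000  # real gid shared by user processes, as in the one-liners above


def should_kill(parent_pid, real_gid, cmdline):
    """Combine both one-liners: orphaned user processes and orphaned serve_wsgi."""
    if parent_pid != 1:
        return False  # still a child of Apache or one of its workers
    return real_gid == USER_GID or "serve_wsgi" in " ".join(cmdline)


def cleanup():
    # psutil is third-party; importing it here keeps should_kill dependency-free.
    import psutil

    for proc in psutil.process_iter():
        try:
            if should_kill(proc.ppid(), proc.gids().real, proc.cmdline()):
                proc.kill()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue  # the process exited or is off-limits; move on
```

Keeping the decision in a plain function makes it easy to check the rules without killing anything.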

So our Apache configuration is torturous and hard to understand. It needs a database hit for every request. It leaves lots of zombies lying around. And it's preventing us from doing some things we want to do. So, what is the new shiny that will solve ALL OUR PROBLEMS?

Nginx + uWSGI

Muchmuchmuch simpler config

Remember the Apache config file from earlier? Here's the equivalent for nginx+uwsgi:

Nginx config:

server {
    listen 80;
    server_name ~(?<domain>.+)$;
    location / {
        root /var/log/nginx/user_logs;
        access_log /var/log/nginx/user_logs/$domain/$domain.access.log;
        include uwsgi_params;
        uwsgi_pass unix:/var/www/socket/$domain/socket;
    }
}

uWSGI config:

[uwsgi]
uid = {{ uid }}
gid = {{ gid }}
chroot = /mnt/chroots/{{ user.username }}

touch-reload = /var/www/wsgi.py
socket = /var/www/socket/{{ host }}/socket
chdir = /var/www

file = /bin/user_wsgi_wrapper.py

Just a tad simpler, I'm sure you'll agree?
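If the nginx regex looks a bit cryptic, the routing it does can be sketched in a few lines of Python. This is only an illustration of the mapping, not something that runs in production. The function name is made up, and Python spells the named group (?P<domain>...) where nginx accepts (?<domain>...).

```python
import re

# Python spelling of the nginx pattern ~(?<domain>.+)$
SERVER_NAME = re.compile(r"(?P<domain>.+)$")


def socket_for(host):
    """Map a request's host name to the uWSGI socket nginx would pass it to."""
    match = SERVER_NAME.search(host)
    if match is None:
        return None  # an empty host name matches nothing
    return "/var/www/socket/%s/socket" % match.group("domain")
```

So a request for alice.pythonanywhere.com ends up at /var/www/socket/alice.pythonanywhere.com/socket, with no database lookup involved.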

Per-user config and graceful reloads

uWSGI's emperor+vassals setup means that any config files in a specified directory are dynamically loaded with no need for a restart. Adding a new site is as simple as dropping in a new file, and modifying an existing config takes effect as soon as the file is updated.

And, since it's no longer a single one-size-fits-all config file, each user / domain has their own config file so it's much easier to customise different settings for different users.

And all this with no database hit.
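As a rough sketch of what "dropping in a new file" can look like from Python: the snippet below renders a vassal config and moves it into place in one step. It uses string.Template as a stand-in for whatever template engine produces the {{ ... }} placeholders shown earlier, and the function name, the .ini naming scheme and the temporary-file suffix are all assumptions.

```python
import os
from string import Template

# Same settings as the vassal config above, with $name placeholders
# standing in for the {{ name }} ones.
VASSAL_TEMPLATE = Template("""\
[uwsgi]
uid = $uid
gid = $gid
chroot = /mnt/chroots/$username

touch-reload = /var/www/wsgi.py
socket = /var/www/socket/$host/socket
chdir = /var/www

file = /bin/user_wsgi_wrapper.py
""")


def write_vassal(vassals_dir, host, username, uid, gid):
    """Render the config for one host and move it into the vassals directory."""
    config = VASSAL_TEMPLATE.substitute(
        uid=uid, gid=gid, username=username, host=host)
    final_path = os.path.join(vassals_dir, host + ".ini")
    tmp_path = final_path + ".tmp"  # assumed not to be picked up as a config
    with open(tmp_path, "w") as tmp_file:
        tmp_file.write(config)
    # Rename into place so a half-written config is never visible under the final name.
    os.replace(tmp_path, final_path)
    return final_path
```

Writing to a temporary name first is a design choice in the sketch, not something uWSGI requires. It just avoids the emperor seeing a partially written file.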

Static files

In Apache, if we'd wanted static files, we'd have to look at something like this:

RewriteCond /mnt/user_storage/homedirs/%{ENV:subdomain}/var/www/static%{REQUEST_URI} -f
RewriteRule ^/(.*) /mnt/user_storage/homedirs/%{ENV:subdomain}/var/www/static/$1 [L]
<DirectoryMatch /mnt/user_storage/homedirs/(.*)/var/www/static/>
    Order Allow,Deny
    Allow from All
    Options SymLinksIfOwnerMatch
</DirectoryMatch>
<Directory /mnt/user_storage/homedirs>
    Options SymLinksIfOwnerMatch
</Directory>

In our new setup, it's a one-line addition to the uwsgi vassal config:

check-static=/var/www/static

The world of pain I've glossed over

If that all seemed too good to be true, it's because it is. Remember what I said about Apache loading workers dynamically, and uwsgi wanting to load them all up-front? That was a big no-no for us, since each web server may have hundreds of users on it, and loading a worker or multiple workers for each one would have made the uwsgi spin-up time way too slow. So, here's the hack:

server {

    listen      80;
    server_name  ~(?<domain>.+)$;
    location / {
        root /var/log/nginx/user_logs;
        access_log /var/log/nginx/user_logs/$domain/$domain.access.log;
        error_page 502 = @fallback;
        include uwsgi_params;
        uwsgi_pass unix:/var/www/socket/$domain/socket;

    }
    location @fallback {
        proxy_pass https://www.pythonanywhere.com/initialize_webapp/$scheme/$domain/$uri?$query_string;
    }
}

Essentially, if there's no uwsgi worker for a particular domain, nginx will consider that a "502 bad gateway" error, but our fallback tells it to go and call an initialize API on our web server.

The handler for the initialize_webapp call looks a bit like this (in pseudocode):

find user from domain
if domain not recognised: return 404
build chroot for user, including personal file storage
write user vassal file to /etc/uwsgi/vassals
wait for vassal worker to spin up
redirect back to user's webapp
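Fleshed out a little, those steps might look like the Python sketch below. It is an illustration, not the real handler: the four helpers are passed in as callables because their real implementations aren't shown here, the 503 on timeout is an assumption, and so is the idea that the worker is ready once its socket file exists.

```python
import os
import time


def wait_for_socket(socket_path, timeout=10.0, poll_interval=0.1):
    """Poll until the vassal's socket file appears; False if it never does."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(socket_path):
            return True
        time.sleep(poll_interval)
    return os.path.exists(socket_path)


def initialize_webapp(scheme, domain, path,
                      find_user, build_chroot, write_vassal, wait_for_worker):
    """Return a (status, headers) pair following the pseudocode steps."""
    user = find_user(domain)
    if user is None:
        return 404, {}  # domain not recognised
    build_chroot(user)  # includes the user's personal file storage
    write_vassal(user, domain)  # the emperor starts the worker from this file
    if not wait_for_worker(domain):
        return 503, {}  # assumption: give up if the worker never appears
    # Send the browser back to the URL it originally asked for.
    location = "%s://%s/%s" % (scheme, domain, path.lstrip("/"))
    return 302, {"Location": location}
```

In this sketch, wait_for_worker would typically wrap wait_for_socket with the socket path for the domain, for example /var/www/socket/<domain>/socket.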

Aside from that, logging is still a bit of a pain, but about half as much of a pain as it was under Apache. Nginx gives us access logs as a one-line config, which is definitely a win (we get rid of splitter.py). Error logs still need a custom WSGI wrapper, but that's no more complicated than the old serve_wsgi.py. For the curious, you can find it in your PythonAnywhere sandbox at /bin/user_wsgi_wrapper.py.

Preliminary benchmarking results

On Apache, we were looking at median times (in milliseconds) of:

pythonanywhere.com Connect: 391 Processing: 139 Waiting: 136 Total: 542

user webapp: Connect: 365 Processing: 532 Waiting: 427 Total: 921

For a standard static site hosted in the same Amazon zone, you'd expect about 100/100/100 for a total time of 200.

With nginx+uwsgi, we got

pythonanywhere.com Connect: 346 Processing: 227 Waiting: 226 Total: 580

user webapp: Connect: 366 Processing: 809 Waiting: 684 Total: 1175

So no wonderful news there - in fact, going by total time, it's roughly a 7% deterioration on the main site, and roughly a 28% deterioration for users' web apps. BUT - that's for the "median" request. What we did find was that it really has improved things for the slower requests - in other words, it's really improved the consistency of performance:

Apache - user webapp tests:

Time taken for tests:   191.538 seconds
Percentage of the requests served within a certain time (ms)
  50%    921
  66%   1799
  75%   1918
  80%   2746
  90%   3853
  95%   5950
  98%   9989
  99%  11979
 100%  17177 (longest request)

Nginx/uWSGI -

Time taken for tests:   61.882 seconds
Percentage of the requests served within a certain time (ms)
  50%    580
  66%    602
  75%    618
  80%    630
  90%    712
  95%    908
  98%   1067
  99%   1151
 100%   2473 (longest request)

So, under Apache, the slowest 20% of requests were taking between 3 and 10 times as long as the median request. Under nginx, the slowest 20% of requests are not that much slower than the median. So that's definitely a win - the site feels more reliable and consistent to our users.
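For anyone who wants to check the arithmetic, the relative changes in the medians and the "how much slower than the median" ratios can be recomputed from the figures quoted above. The helper names below are made up for this sketch. The numbers are copied straight from the tables.

```python
def percent_change(before, after):
    """Relative change from before to after, as a percentage."""
    return 100.0 * (after - before) / before


def ratios_to_median(percentiles):
    """Express each percentile as a multiple of the 50th percentile."""
    median = percentiles[50]
    return {p: value / median for p, value in sorted(percentiles.items())}


# Median total times in ms (Apache, nginx+uwsgi), as quoted above.
median_totals = {
    "pythonanywhere.com": (542, 580),
    "user webapp": (921, 1175),
}

# Percentile tables in ms, as quoted above.
apache_percentiles = {50: 921, 66: 1799, 75: 1918, 80: 2746, 90: 3853,
                      95: 5950, 98: 9989, 99: 11979, 100: 17177}
nginx_percentiles = {50: 580, 66: 602, 75: 618, 80: 630, 90: 712,
                     95: 908, 98: 1067, 99: 1151, 100: 2473}

for site, (apache_ms, nginx_ms) in median_totals.items():
    print("%s: %+.1f%% change in median total time"
          % (site, percent_change(apache_ms, nginx_ms)))

for name, table in (("Apache", apache_percentiles), ("nginx", nginx_percentiles)):
    for percentile, ratio in ratios_to_median(table).items():
        print("%s %3d%%: %.2fx the median" % (name, percentile, ratio))
```

Looking at ratios rather than raw milliseconds makes the consistency point independent of how fast the median request happens to be.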
