The introduction of our Unicorn management tool, Polycorn.
Photo by Protohiro from Flickr
At Fotopedia, we use Unicorn to serve our main Rails application. Every day, we restart our application several times, spawning and killing hundreds of Unicorns. Managing graceful restarts is a complex task that requires careful monitoring and control. This article introduces our tool Polycorn, a Unicorn management program.
Unicorn is a Ruby Rack HTTP server designed for fast clients and Unix. With simplicity in mind, it aims at providing a comfortable and powerful home for your application while remaining easy to integrate into your application stack.
At Fotopedia, we started using ModPassenger quite early in the initial development, but we had several problems with it. We regularly managed to crash or hang some workers, slowly but surely depleting the number of available workers and leaving the application unresponsive until we restarted the whole Apache. At that time, our code contained several memory leaks (gracefully provided by rmagick) and probably other issues, but the whole setup was very unstable and we stopped using ModPassenger around the 20th of June 2011.
At that time, Unicorn was quite young but already promising, and we quickly decided to use this piece of code in our infrastructure: it would fit nicely behind our Nginx and it was quite easy to add some code to the stack to kill long-running processes.
This way, if a worker had not taken any request in the last 60s, we would know for sure it was stuck and we could safely send it ad patres and the master process would start another worker quickly.
Deployments. Often.
Because the dev team at Fotopedia is very agile, anybody can work on almost any part of the code, commit a fix and ask for a deploy at any time (including Fridays at 5pm, yes, and even on weekends). As a result, we sometimes have to deploy things very quickly. Sometimes because we want to iterate on a new feature, other times (but this is much less frequent) because something we just deployed unleashed hell on us. As a result, we deploy between 0 and 20 times a day. There is no rule.
When you update the application code, there are many ways to notify Unicorn of that update. The simplest way of all is to just restart the daemon. Provided your stack is fault-tolerant, your new incoming requests will end up on another backend and will be served anyway (after a variable but incompressible delay if your stack is not actively removing your backend from the pool). This is a rather extreme way of life and we never actually did it that way.
Moreover, Unicorn supports various signals that indicate how it should behave, and we started by sending HUP after a deploy. HUP "reloads" the whole Unicorn, including its configuration, and restarts all the workers. At that time, we had a small number of backends to serve all the requests (well, this is Ruby) and restarting a whole Unicorn (which happened to run 8 workers) was cutting our capacity a lot. Of course, the restarts were spread out at 60s intervals to ensure the newly started Unicorn had picked up where the old one left off, but still, there was some lag and the whole stack was slower for a few minutes.
Not cool.
In the Unicorn documentation, there is a mention of the USR2 signal, which re-executes the running binary. This feature is very useful. It spawns a new Unicorn master, which in turn spawns children. The whole thing boots up your application and starts serving requests as soon as possible. At that point, your application is running two versions of the software at the same time and things are getting more interesting...
However, you must be careful: if your application is broken or some dependencies are missing, the workers are likely to die very quickly. So you have to pay attention to them, and ensure they are "stable" and not dying too much. This matters all the more because the Unicorn master process will try to keep n workers running at all times, so a quick glance at the process list will not immediately show you that something is wrong.
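One way to decide whether workers are "stable" is to sample the worker pids a few times and check that the same full set of workers survives from one sample to the next. The sketch below illustrates that idea only; it is not Polycorn's actual code, and the helper name `workers_stable?` and its parameters are made up for the example.

```ruby
# Hypothetical helper, not taken from Polycorn.
# samples: list of worker-pid lists, oldest first, taken at regular
#          intervals from the process table.
# expected_workers: the number of workers the master is configured to run.
# required_samples: how many consecutive identical samples we want to see.
def workers_stable?(samples, expected_workers:, required_samples: 3)
  # Sort each sample so that pid order does not matter when comparing.
  recent = samples.last(required_samples).map(&:sort)
  return false if recent.size < required_samples

  # Stable means: the full worker count is present, and the very same
  # pids show up in every recent sample (nobody died and got respawned).
  recent.all? { |pids| pids.size == expected_workers } && recent.uniq.size == 1
end
```

A respawned worker gets a new pid, so a master that keeps replacing dying workers never produces identical consecutive samples, even though the process list always shows n workers.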
Once the situation has stabilised, you can kill the old master: it will leave the socket open for the new master, and your new code is now online. Hurray!
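The USR2 hand-over can be sketched as a small decision function. It relies on one documented Unicorn behaviour: during a USR2 re-exec, the old master's pid file gets ".oldbin" appended to its path. Everything else here (the function names, the `new_master_stable` flag) is hypothetical and only illustrates the sequence; it is not how Polycorn is written.

```ruby
# Read a pid file, returning nil if it is missing or does not hold a number.
def read_pid(path)
  Integer(File.read(path).strip)
rescue Errno::ENOENT, ArgumentError
  nil
end

# Returns [signal_name, pid] for the next step of the hand-over,
# or nil when there is nothing to do right now.
# pid_file is assumed to be the path given to Unicorn's `pid` setting.
def next_handover_step(pid_file, new_master_stable:)
  old_pid = read_pid("#{pid_file}.oldbin")
  cur_pid = read_pid(pid_file)

  if old_pid
    # Two masters are alive: retire the old one only once the new one
    # has proven stable; otherwise keep waiting.
    new_master_stable ? ["QUIT", old_pid] : nil
  elsif cur_pid
    # A single master is running: ask it to re-execute itself.
    ["USR2", cur_pid]
  end
end
```

A caller would then do something like `signal, pid = next_handover_step(path, new_master_stable: ok)` followed by `Process.kill(signal, pid) if signal`. Keeping the decision separate from the actual `Process.kill` call makes the logic easy to exercise without touching real processes.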
Polycorn was born from our need to automate these restarts without giving up on the overall monitoring we want. Polycorn is a Unicorn manager (hence its very original name). It serves 4 goals:
- Start, Stop and Reload Unicorn gracefully
- Detect always-dying Unicorns
- Detect leaking Unicorns
- Notify the outside world about what is going on
Because we wanted a robust design without too much overhead, Polycorn dies a lot. Every time its internal state changes, it commits suicide and is expected to restart. We use runit to control Polycorn runs, and runit will always ensure that an instance of Polycorn is running.
As we work with Unicorn, we do not need to keep any internal state (except for short transient states):
- Unicorn maintains the pid of the main Unicorn master
- It also maintains the pid of the new Unicorn master, when transitioning via USR2
- The process table can be inspected for the Unicorn masters' children
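To illustrate the last point, here is a minimal sketch of finding a master's workers by parent pid. It uses a plain `Struct` as a stand-in for process table entries so that it runs on its own; with sys-proctable you would feed it the real table instead, assuming its entries expose `pid` and `ppid` on your platform. The names `ProcEntry` and `children_of` are made up for this example.

```ruby
# Stand-in for one process table entry (hypothetical, for illustration).
ProcEntry = Struct.new(:pid, :ppid)

# Returns the entries whose parent is the given master pid.
def children_of(master_pid, process_table)
  process_table.select { |entry| entry.ppid == master_pid }
end

# Example table: master 100 has two workers, master 200 has one.
table = [
  ProcEntry.new(100, 1),
  ProcEntry.new(101, 100),
  ProcEntry.new(102, 100),
  ProcEntry.new(200, 1),
  ProcEntry.new(201, 200)
]

old_master_workers = children_of(100, table).map(&:pid)
new_master_workers = children_of(200, table).map(&:pid)
```

Since the master pids come from Unicorn's own pid files and the workers come from the process table, nothing needs to survive a Polycorn restart.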
Get it from https://gist.github.com/octplane/7039960.
Polycorn has no dependency outside sys-proctable, a little library that does some process table introspection. The main script is the only thing you will need.
Polycorn invocation consists of 2 parameters:
polycorn /path/to/unicorn/pids/folder "unicorn path and options"
So a simple invocation will look like:
/usr/local/bin/polycorn /path/to/unicorn/pids/folder "unicorn -E production /ftn/apps/our/testing/current/config.ru -D"
While a more complex use case could be (this is more or less what we use at Fotopedia):
exec /usr/local/bin/polycorn /ftn/apps/our/shared/pids "export RBENV_ROOT=/opt/rbenv; export PATH=/opt/rbenv/shims:/opt/rbenv/bin:/usr/local/bin/bin:/usr/local/bin:/usr/bin:/bin; unset GEM_HOME; unset GEM_PATH; export RUBYOPT=W0; unset RBENV_DIR; unset BUNDLE_BIN_PATH; unset RBENV_HOOK_PATH; unset BUNDLE_GEMFILE; unset RBENV_VERSION; cd /tmp/; RBENV_VERSION=$(cat /ftn/apps/our/testing/current/.rbenv-version) BUNDLE_GEMFILE=/ftn/apps/our/testing/current/Gemfile chpst -u apps:apps bundle exec unicorn -E development -c /etc/our.unicorn.conf.rb /ftn/apps/our/testing/current/config.ru -D"
Depending on your needs, you might want to customize Polycorn a bit further. If you wish to, you can create a file /etc/polycorn.conf.rb and put some more configuration in it.
The default configuration is:
# Maximum time to wait before declaring an emergency in Polycorn state processing
@max_wait = 60
# Maximum RSS a Unicorn can use before being considered as too fat.
@max_rss = 1_400_000_000
# Called when something has to be told to the outer world
def alert(message)
end
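As an illustration of how the RSS limit and the alert hook could work together, here is a small sketch. It is not Polycorn's code: the `Worker` struct, the `report_fat_workers` helper and the example `alert` body are hypothetical, and the limit is assumed to be expressed in bytes.

```ruby
# Stand-in for a worker's process table entry (hypothetical).
Worker = Struct.new(:pid, :rss)

# Same value as the default @max_rss above; assumed to be bytes.
MAX_RSS = 1_400_000_000

# Example alert hook: here it only writes to standard error.
def alert(message)
  warn "[polycorn] #{message}"
end

# Alerts about every worker above the limit and returns their pids.
def report_fat_workers(workers, max_rss: MAX_RSS)
  fat = workers.select { |worker| worker.rss > max_rss }
  fat.each do |worker|
    alert("worker #{worker.pid} uses #{worker.rss} bytes of RSS, limit is #{max_rss}")
  end
  fat.map(&:pid)
end
```

For example, `report_fat_workers([Worker.new(101, 900_000_000), Worker.new(102, 1_600_000_000)])` alerts about worker 102 only. What to do with a worker that is too fat is a separate decision and is left out of this sketch.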
The configuration file can also be used if you use Polycorn in a Bundler
environment, or if you require some other library for your alert processing.
For example, our configuration file looks like this:
# Generated by Chef
ENV['BUNDLE_GEMFILE'] = '/etc/bundler.chef/Gemfile'
require 'rubygems'
require 'bundler/setup'
require 'fwissr'
GRID = Fwissr['/grid']
CHANNEL = case GRID
          when 'prod'
            "#unicorn"
          else # 'testing' and any other grid
            "#unicorn-testing"
          end
def alert(message)
  # log message
  irc_report(CHANNEL, message)
end
@max_wait = 60
@max_rss = 1_400_000_000
Polycorn was written by Oct, our Server Architect. It was written a long time ago in our Chef repository. It's a Ruby script.
Your unicorns must be super-fast! I had to introduce sleeps as a hacky way to get polycorn to allow my worker threads to start, otherwise on the second start (from runit) it said that "All unicorn have died, killing master." Am I doin' it wrong?