hussfelt/deploy.sh

Race-condition-free deployment with the "symlink replacement" trick

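On Unix, renaming with mv is an atomic operation, provided the source and destination are on the same filesystem (mv then uses the rename(2) system call). This enables a well-known "symlink replacement trick" for race-condition-free website deployment, among other things. Let's create a script that encapsulates the process for general-purpose use.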

Motivation

When deploying an update to a website, if we do something like

git pull within our deployment directory, or
rsync to our deployment directory, or even
run a script that quickly replaces the deployment directory with a new one,

... then there can be some number of milliseconds where the files we are trying to serve are nonexistent or in a state of change.

There is a filesystem-based mechanism that enables one to deploy updates with zero risk of this race condition occurring.¹ It relies on the fact that mv is an atomic operation and Unix supports symlinks. The basic idea is that we specify our document root as a symlink to a directory containing the current version.

In the following examples, our document root is www. First, we copy all the files for our website into www.A:

$ rsync -CrP 'remote:~/website/' 'www.A/'

Then, we create the symlink, pointing the document root to www.A.

$ ln -s www.A www

Here's the directory structure in its entirety so far:

$ tree -AF --noreport
.
├── www -> www.A/
└── www.A/
    └── index.html

When it's time to deploy an update, we prepare the next version in a different directory. We copy the updated files into www.B:

$ rsync -CrP 'remote:~/website/' 'www.B/'

Giving:

$ tree -AF --noreport
.
├── www -> www.A/
├── www.A/
│   └── index.html
└── www.B/
    ├── blah.html
    └── index.html

Then, to deploy, we replace the www symlink currently pointing to www.A with one pointing to the new version, www.B.

$ ln -s www.B www.new
$ mv -T www.new www
$ tree -AF --noreport
.
├── www -> www.B/
├── www.A/
│   └── index.html
└── www.B/
    ├── blah.html
    └── index.html
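The -T is what makes that mv replace the link. The following minimal sketch runs in a scratch directory and assumes GNU coreutils (-T, also spelled --no-target-directory, is a GNU option). It shows the failure -T prevents. Without -T, mv resolves www to the directory www.A and moves www.new inside it, so nothing is deployed.

```shell
#!/bin/sh
# Sketch: why the switch uses mv -T (GNU coreutils assumed).
set -eu
cd "$(mktemp -d)"
mkdir www.A www.B
ln -s www.A www

ln -s www.B www.new
mv www.new www          # no -T: www resolves to the directory www.A,
                        # so www.new is moved *into* it
ls www.A                # lists www.new, now a stray link inside www.A
readlink www            # still prints www.A: nothing was deployed

ln -s www.B www.new
mv -T www.new www       # with -T: www itself is replaced by the new link
readlink www            # prints www.B
```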

You can watch the directory with inotifywait (from inotify-tools) to see that the switch is a single rename:

$ inotifywait -m . &
[1] 14628
Setting up watches.
Watches established.
$ ln -s www.B www.new
./ CREATE www.new
$ mv -T www.new www
./ MOVED_FROM www.new
./ MOVED_TO www

By contrast, simply asking ln to overwrite the existing symlink is not atomic:

$ ln -sfn www.B www
./ DELETE www
./ CREATE www

There is an unlikely but possible moment between that DELETE and the subsequent CREATE where a webserver might attempt to serve a file and find that its directory is missing!
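That window can also be probed directly, roughly the way a webserver would hit it. One loop keeps switching the link with ln -s plus mv -T, while another keeps checking that www/index.html exists. This is a rough sketch that assumes GNU coreutils and a scratch directory from mktemp, and the 5000-iteration count is arbitrary. POSIX requires rename(2) to keep the name www visible throughout the replacement, so this version should never report a miss.

```shell
#!/bin/sh
# Sketch: probe the live path while another process keeps switching the
# symlink with ln -s + mv -T. GNU coreutils assumed; counts are arbitrary.
set -u
cd "$(mktemp -d)"
mkdir www.A www.B
echo A > www.A/index.html
echo B > www.B/index.html
ln -s www.A www

# Writer: flip www between www.A and www.B in the background.
(
    while :; do
        for v in www.A www.B; do
            ln -s "$v" www.new
            mv -T www.new www
        done
    done
) &
writer=$!

# Reader: stands in for a webserver looking up www/index.html.
misses=0
i=0
while [ "$i" -lt 5000 ]; do
    [ -e www/index.html ] || misses=$((misses + 1))
    i=$((i + 1))
done

kill "$writer"
wait "$writer" 2>/dev/null
echo "misses: $misses"
```

Replacing the writer's two commands with ln -sfn "$v" www gives a point of comparison. Whether and how often that variant reports misses depends on timing and on the ln implementation, so treat its numbers as anecdotal.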

Rollback and staging for free

One of the interesting things about this technique is that if you find shortly after deploying that your updates are broken, you can switch the symlink back to the previous version. With no extra effort, you've gained the ability to do deployment rollbacks.
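Because the previous version is still on disk, a rollback is just another switch. This minimal sketch assumes GNU coreutils and the www.A/www.B layout from above. The prev variable is introduced here for illustration and records where www pointed before the deploy.

```shell
#!/bin/sh
# Sketch: deploy www.B, then roll back to whatever www pointed at before.
set -eu
cd "$(mktemp -d)"
mkdir www.A www.B
echo First > www.A/index.html
echo Second > www.B/index.html
ln -s www.A www

prev=$(readlink www)        # remember the live version (www.A)
ln -s www.B www.new         # deploy: switch www to www.B atomically
mv -T www.new www

ln -s "$prev" www.new       # rollback: switch back, equally atomically
mv -T www.new www
cat www/index.html          # prints First
```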

Another is that if you instruct some webserver to use the "next" directory www.B as its document root, you gain a staging or preview site where you can inspect your changes before they "go live."

Preparing the stage

If we can assume that each new version will be mostly the same as the previous one, we can take advantage of a bandwidth-saving feature of rsync. Cloning the "live" site into the "stage" site before using rsync to transfer updated files reduces bandwidth, because rsync's delta-transfer algorithm sends only the parts of files that changed.

If we do this just before transferring files, we encounter a mildly complex multiple-connection process:

Create staging area with clone on host
Push all files up from development boxes
Perform symlink switch

However, if we are willing to let the 'stage' directory persist on disk, we can reuse it and prepare it ahead of time, after each deploy.

Perform symlink switch
Prepare next staging area
Development boxes copy files into staging area at their leisure
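In the second ordering, the stage for the next release is prepared right after the switch. This sketch of the server side assumes GNU coreutils and the www.A/www.B names from the earlier examples. The rsync step stays a comment because it needs the remote host. cp -a is used so the clone keeps the live files' permissions and timestamps.

```shell
#!/bin/sh
# Sketch: switch, then immediately prepare the next staging area.
set -eu
cd "$(mktemp -d)"
mkdir www.A www.B
echo v1 > www.A/index.html
echo v2 > www.B/index.html
ln -s www.A www

# 1. Perform symlink switch: www.B goes live.
ln -s www.B www.new
mv -T www.new www

# 2. Prepare next staging area: replace the retired www.A with a clone of
#    the new live version, so the next upload only has to carry changes.
rm -rf www.A
cp -a www.B www.A

# 3. Later, development boxes upload into the stage, for example:
#    rsync -CrP 'remote:~/website/' 'www.A/'
```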

An abstraction

Let's encapsulate this technique into a set of scripts so we don't have to remember, say, what the -T option to mv is and why it's needed.

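#!/bin/sh
# deploy.sh NAME  (run from the directory that contains NAME)
N="$(readlink "$1.prev")"   # oldest version; recycled as the next stage
cp -PT "$1" "$1.prev"       # NAME.prev now points at the current live version
mv -T "$1.stage" "$1"       # atomic switch: the stage goes live
ln -s "$N" "$1.stage"       # the recycled directory becomes the new stage,
rm -rf "$N"                 # emptied and refilled
cp -aH "$1" "$N"            # with a copy of the new live version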

Next we provide a script to perform rollbacks.

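#!/bin/sh
# rollback.sh NAME  (run from the directory that contains NAME)
[ ! -e "$1.prev" ] && echo "Can't roll back." && exit 1
N="$(readlink "$1.stage")"  # unreleased stage contents are discarded
cp -PT "$1" "$1.stage"      # the faulty live version becomes the stage
mv -T "$1.prev" "$1"        # atomic switch back to the previous version
ln -s "$N" "$1.prev"        # NAME.prev is left dangling, so a second
rm -rf "$N"                 # rollback is refused by the check above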

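Finally, we can also automate the initial setup of the directories and symlinks that this process expects to work with. Note that this initial setup step can't take advantage of the 'symlink replacement' trick. www starts out as a real directory, and rename(2) cannot replace a directory with a symlink. The directory therefore has to be moved aside first, leaving a brief moment when www does not exist.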

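#!/bin/sh
# initialize-deployable.sh NAME  (run from the directory that contains NAME)
mkdir "$1.d"
mv "$1" "$1.d/1"            # the live directory becomes version 1
ln -s "$1.d/1" "$1"         # NAME is now a symlink to it
cp -aH "$1" "$1.d/2"        # version 2 starts as a copy of live
ln -s "$1.d/2" "$1.stage"   # and serves as the stage
ln -s "$1.d/3" "$1.prev"    # dangling until the first deploy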

Demo

Let's create an example site.

$ mkdir www
$ echo First > www/index.html
$ tree -AF --noreport
.
└── www/
    └── index.html
$ cat www/index.html 
First

Now, let's set it up to be usable with our deployment script. Note that this step does not employ the symlink switch trick.

$ initialize-deployable.sh www
$ ls -F
www@  www.d/  www.prev@  www.stage@

The contents of the www directory remain the same:

$ cat www/index.html 
First

The contents of the stage are currently the same as the live:

$ cat www.stage/index.html 
First

And as expected there's nothing to roll back to:

$ rollback.sh www
Can't roll back.

Let's create a new version:

$ echo Second > www.stage/index.html
$ echo foo > www.stage/foo.html
$ ls -F www/
index.html
$ ls -F www.stage/
foo.html  index.html

Now, let's deploy it.

$ deploy.sh www

The code deployed as expected:

$ ls -F www/
foo.html  index.html
$ cat www/index.html 
Second

And the .prev link works now:

$ ls -F www.prev/
index.html
$ cat www.prev/index.html 
First

Let's do another!

$ echo Therd > www.stage/index.html 
$ deploy.sh www
$ cat www/index.html 
Therd

Oops! I did something wrong. Let's try a rollback:

$ rollback.sh www
$ cat www/index.html 
Second

Pfwhew! How about another? We shouldn't be able to.

$ rollback.sh www
Can't roll back.

Let's see the state of the three directories.

$ cat www.prev/index.html
cat: www.prev/index.html: No such file or directory
$ cat www/index.html
Second
$ cat www.stage/index.html
Therd

As you can see, the changes made in the stage are still there. Adjusting the script to reset the stage to the contents of the live site instead is trivial. Now let's fix the problem and deploy again.

$ echo Third > www.stage/index.html 
$ deploy.sh www
$ cat www/index.html 
Third

Finally, let's look at the entire directory structure as it exists now.

$ tree -AF --noreport
.
├── www -> www.d/3/
├── www.d/
│   ├── 1/
│   │   ├── foo.html
│   │   └── index.html
│   ├── 2/
│   │   ├── foo.html
│   │   └── index.html
│   └── 3/
│       ├── foo.html
│       └── index.html
├── www.prev -> www.d/2/
└── www.stage -> www.d/1/

The directory www.d contains three arbitrarily named directories that hold the previous, current, and next versions as needed. The working directory contains the three symlinks: www, www.prev, and www.stage.
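To watch the rotation without typing the demo by hand, the three scripts can be pasted into one file as shell functions and the demo replayed. This sketch is meant for experimenting in a scratch directory. The function names initialize, deploy and rollback are made up here, the function bodies are the scripts above, and GNU cp and mv are assumed for -T. It should print the same final mapping as the tree listing: www -> www.d/3, www.prev -> www.d/2, www.stage -> www.d/1.

```shell
#!/bin/sh
# Sketch: the demo above, replayed with the three scripts as functions.
# Function names are hypothetical; bodies mirror initialize-deployable.sh,
# deploy.sh and rollback.sh. GNU cp/mv assumed for -T.
set -u
cd "$(mktemp -d)"

initialize() {
    mkdir "$1.d"
    mv "$1" "$1.d/1"
    ln -s "$1.d/1" "$1"
    cp -aH "$1" "$1.d/2"
    ln -s "$1.d/2" "$1.stage"
    ln -s "$1.d/3" "$1.prev"
}

deploy() {
    N=$(readlink "$1.prev")
    cp -PT "$1" "$1.prev"
    mv -T "$1.stage" "$1"
    ln -s "$N" "$1.stage"
    rm -rf "$N"
    cp -aH "$1" "$N"
}

rollback() {
    [ ! -e "$1.prev" ] && echo "Can't roll back." && return 1
    N=$(readlink "$1.stage")
    cp -PT "$1" "$1.stage"
    mv -T "$1.prev" "$1"
    ln -s "$N" "$1.prev"
    rm -rf "$N"
}

mkdir www
echo First > www/index.html
initialize www
echo Second > www.stage/index.html
echo foo > www.stage/foo.html
deploy www
echo Therd > www.stage/index.html
deploy www
rollback www
echo Third > www.stage/index.html
deploy www

for link in www www.prev www.stage; do
    echo "$link -> $(readlink "$link")"
done
```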

With trivial changes to the management scripts, these names could be modified to taste; for instance, the three version directories need not live in their own containing directory. For example, changing '.d/1' to '.A', '.d/2' to '.B' and '.d/3' to '.C' in the initialization script (and dropping its now-unneeded mkdir line) is all that is needed to produce a filesystem layout like this instead:

$ tree -AF --noreport
.
├── www -> www.C/
├── www.A/
│   ├── foo.html
│   └── index.html
├── www.B/
│   ├── foo.html
│   └── index.html
├── www.C/
│   ├── foo.html
│   └── index.html
├── www.prev -> www.B/
└── www.stage -> www.A/
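Here is a sketch of that modified initialization script, wrapped in a function and run in a scratch directory so the resulting links can be inspected. The function name is hypothetical. The mkdir of a containing directory is dropped because there no longer is one. deploy.sh and rollback.sh need no changes, since they only follow whatever the links point to.

```shell
#!/bin/sh
# Sketch: initialize-deployable.sh for the flat www.A/www.B/www.C layout.
set -eu
cd "$(mktemp -d)"

initialize() {
    mv "$1" "$1.A"              # live content becomes version A
    ln -s "$1.A" "$1"
    cp -aH "$1" "$1.B"          # B starts as a copy of live: the stage
    ln -s "$1.B" "$1.stage"
    ln -s "$1.C" "$1.prev"      # dangling until the first deploy
}

mkdir www
echo First > www/index.html
initialize www
for link in www www.stage www.prev; do
    echo "$link -> $(readlink "$link")"
done
```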

A simplified abstraction

This version drops the rollback feature. Since I keep my files in version control, as should you, I can let this deployment mechanism stay ignorant of how a "rollback" differs from just another deploy. Dropping the staging directory as well would make the script more complex, so we'll keep that.

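#!/bin/sh
# deploy.sh NAME  (no rollback; run from the directory that contains NAME)
N="$(readlink "$1")"        # the current live version
mv -T "$1.stage" "$1"       # atomic switch: the stage goes live
ln -s "$N" "$1.stage"       # the old live directory becomes the stage,
rm -rf "$N"                 # emptied and refilled
cp -aH "$1" "$N"            # with a copy of the new live version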

And the setup script:

#!/bin/sh
# initialize-deployable.sh
mkdir "$1.d"
mv "$1" "$1.d/1"
ln -s "$1.d/1" "$1"
cp -aH "$1" "$1.d/2"
ln -s "$1.d/2" "$1.stage"
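With only two directories, each deploy simply swaps the roles of live and stage. This sketch replays two deploys with the simplified scripts written as functions (the names are hypothetical, and GNU coreutils is assumed). It shows www alternating between www.d/2 and www.d/1.

```shell
#!/bin/sh
# Sketch: two deploys with the simplified scripts, as functions.
set -u
cd "$(mktemp -d)"

initialize() {
    mkdir "$1.d"
    mv "$1" "$1.d/1"
    ln -s "$1.d/1" "$1"
    cp -aH "$1" "$1.d/2"
    ln -s "$1.d/2" "$1.stage"
}

deploy() {
    N=$(readlink "$1")          # current live directory
    mv -T "$1.stage" "$1"       # atomic switch
    ln -s "$N" "$1.stage"       # old live becomes the stage...
    rm -rf "$N"
    cp -aH "$1" "$N"            # ...refreshed from the new live version
}

mkdir www
echo First > www/index.html
initialize www
echo Second > www.stage/index.html
deploy www
readlink www                    # www.d/2
echo Third > www.stage/index.html
deploy www
readlink www                    # www.d/1
cat www/index.html              # Third
```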

Demo

Let's create an example site.

$ mkdir www
$ echo First > www/index.html
$ tree -AF --noreport
.
└── www/
    └── index.html
$ cat www/index.html 
First

Now, just like last time, let's set it up to be usable with our deployment script. Note that we have no www.prev symlink.

$ initialize-deployable.sh www
$ ls -F
www@  www.d/  www.stage@

The contents of the www directory remain the same:

$ cat www/index.html 
First

The contents of the stage are currently the same as the live:

$ cat www.stage/index.html 
First

Let's create a new version:

$ echo Second > www.stage/index.html
$ echo foo > www.stage/foo.html
$ ls -F www/
index.html
$ ls -F www.stage/
foo.html  index.html

Now, let's deploy it.

$ deploy.sh www

The code deployed as expected:

$ ls -F www/
foo.html  index.html
$ cat www/index.html 
Second

Finally, let's look at the entire directory structure as it exists now.

$ tree -AF --noreport
.
├── www -> www.d/2/
├── www.d/
│   ├── 1/
│   │   ├── foo.html
│   │   └── index.html
│   └── 2/
│       ├── foo.html
│       └── index.html
└── www.stage -> www.d/1/

A generalized abstraction

We have seen that this technique needs at least two directories, and that with three directories we gain one level of rollback.

With four directories, could we have two levels of rollback? Can we write a script that works for N directories and achieves N-2 levels of rollback?

This is an interesting question from an abstraction and reduction perspective, and one I may explore at some point, but I don't have a practical use for more than one rollback directory. (In fact, I prefer none for my own work.)

A unified tool

Rather than have multiple scripts on our $PATH, let's bake them together and add some sanity-checking, argument parsing, and error handling for a more robust tool.

FIXME TODO: ...

¹ There are other means of avoiding this problem. Reconfiguring your webserver to serve from the new directory, and then sending it a signal to begin using the updated configuration, would also work. In this document we only consider the case where we are deploying filesystem updates and can't restart our webserver, as when deploying a static website on a cheap commodity web host.

	#!/bin/sh
	# rollback.sh
	[ ! -e "$1.prev" ] && echo "Can't roll back." && exit 1
	N="`readlink \"$1.stage\"`"
	cp -PT "$1" "$1.stage"
	mv -T "$1.prev" "$1"
	ln -s "$N" "$1.prev"
	rm -rf "$N"
	# To reset the stage to a copy of the restored live version instead of
	# keeping the rolled-back-from version, uncomment:
	# S="`readlink \"$1.stage\"`"
	# rm -rf "$S"
	# cp -aH "$1" "$S"
