It's relatively easy to scale out stateless web applications; often a reverse proxy is all you need. Stateful web applications, however, especially ones that embed websocket services, are always a pain to distribute across a cluster. The traditional approach is to introduce an external service such as Redis to handle pub/sub, but that usually means changing your code. Can Erlang/Elixir, the "concurrency oriented programming languages", beat other languages in this use case? Does the Phoenix framework already ship with a solution for horizontally scaling websockets? I'll run an experiment to prove (or disprove) that.
You can download the source code from https://gitee.com/aetherus/gossipy.
Distribute a Phoenix chatroom application (websocket only) across multiple nodes in a cluster, without introducing extra services (e.g. Redis, RabbitMQ) and without changing a single line of the project's source code. The only changes allowed are adding or modifying config files.
- Ubuntu 16.04 or a derivative distro (I use Elementary Loki)
- Docker (I have only 1 PC)
- Docker Compose (same reason as above)
- The Elixir SDK
- A bare-bones Phoenix chatroom application, with no database, no Ecto and no assets
I'll use Distillery as the release tool. Just add the following line to `mix.exs` and run `mix deps.get` to fetch it.
defp deps do
  [
    {:distillery, "~> 1.5", runtime: false} # <--- This line
  ]
end
The `runtime: false` here means Distillery is not a runtime dependency of your web application; it's only used during the release process.
First, let Distillery generate some config files for us.
$ mix release.init
You'll find a new directory `rel` at the root of your application. Inside it there is a `config.exs` and an empty `plugins` directory. We'll ignore `rel/plugins` for the whole experiment. Let's have a look at the content of `rel/config.exs` now.
Path.join(["rel", "plugins", "*.exs"])
|> Path.wildcard()
|> Enum.map(&Code.eval_file(&1))

use Mix.Releases.Config,
    default_release: :default,
    default_environment: Mix.env()

environment :dev do
  set dev_mode: true
  set include_erts: false
  set cookie: :"<&9.`Eg/{6}.dwYyDOj>R6R]2IAK;5*~%JN(bKuIVEkr^0>jH;_iBy27k)4J1z=m"
end

environment :prod do
  set include_erts: true
  set include_src: false
  set cookie: :">S>1F/:xp$A~o[7UFp[@MgYVHJlShbJ.=~lI426<9VA,&RKs<RyUH8&kCn;F}zTQ"
end

release :gossipy do
  set version: current_version(:gossipy)
  set applications: [
    :runtime_tools
  ]
end
You can see two environments there, `:dev` and `:prod`. We'll focus on `:prod`, because releasing `:dev` doesn't make much sense to me. It's worth noting that the environments in `rel/config.exs` are NOT `MIX_ENV`! They are just names of build strategies. You can also see the cookie, which will be passed to the `-setcookie` option when starting the Erlang virtual machine. The option `:include_erts` tells Distillery whether to embed the whole ERTS (Erlang RunTime System) into the release package. If it's set to `true`, you don't have to install Erlang or Elixir on the target nodes in the cluster, but you do have to make sure the nodes run the same operating system as the host building the release.
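For reference, that cookie becomes the node's Erlang cookie at runtime, and all nodes in the cluster must share the same cookie or they will refuse to connect to each other. A quick, purely illustrative check from a console attached to a running release (Distillery provides `bin/gossipy remote_console` for that):

# Distribution connections are only established between nodes whose cookies match.
iex> Node.get_cookie()
:">S>1F/:xp$A~o[7UFp[@MgYVHJlShbJ.=~lI426<9VA,&RKs<RyUH8&kCn;F}zTQ"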
By default, this configuration builds releases that start the node with `-name gossipy@127.0.0.1`, which is not good for Docker deployment because it's difficult to get the IP address of a running Docker container. We need to use `-sname` instead. For this purpose, I created `rel/vm.args` with the content below:
## Name of the node
# -name gossipy@127.0.0.1
-sname gossipy
## Cookie for distributed erlang
-setcookie >S>1F/:xp$A~o[7UFp[@MgYVHJlShbJ.=~lI426<9VA,&RKs<RyUH8&kCn;F}zTQ
## Heartbeat management; auto-restarts VM if it dies or becomes unresponsive
## (Disabled by default..use with caution!)
##-heart
## Enable kernel poll and a few async threads
##+K true
##+A 5
## Increase number of concurrent ports/sockets
##-env ERL_MAX_PORTS 4096
## Tweak GC to run more often
##-env ERL_FULLSWEEP_AFTER 10
# Enable SMP automatically based on availability
-smp auto
This is an exact copy of the default `vm.args` file Distillery generates in the release, with the only modification being that I removed the `-name` option and added `-sname`.
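Just to illustrate what a short name looks like at runtime, here is a hypothetical session attached with `bin/gossipy remote_console` to a node whose hostname is `ws1` (the container hostname we'll define later):

# With -sname, the node name is <release name>@<hostname>, no IP address involved.
iex(gossipy@ws1)1> Node.self()
:gossipy@ws1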
The next step is telling Distillery to use my modified `vm.args`:
environment :prod do
  ...
  set vm_args: "rel/vm.args" # <---- add this line
end
According to Distillery's official documentation, I also need to add the following lines to `config/prod.exs`:
config :gossipy, GossipyWeb.Endpoint,
  ...
  check_origin: false,
  server: true,
  root: ".",
  version: Application.spec(:gossipy, :vsn)
- `check_origin: false` is ONLY for the convenience of this experiment. Never do this in production, as it weakens security!
- `server: true` makes the endpoint start its web server (Cowboy) when the release boots.
- `root: "."` sets the root path of the static assets, of which, in this experiment, there are none.
- `version` sets the version of the release; `Application.spec(:gossipy, :vsn)` fetches that value from `mix.exs`.
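As a quick illustration of that last call (the value comes from the `version` field in `mix.exs`; `0.0.1` matches the release version referenced later in the Dockerfile):

# :vsn is returned as an Erlang charlist, which is fine for the endpoint config.
iex> Application.spec(:gossipy, :vsn)
'0.0.1'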
Furthermore, we need to make the nodes aware of each other. We list all the nodes in `config/prod.exs`:
config :kernel,
  sync_nodes_optional: [:"gossipy@ws1", :"gossipy@ws2"],
  sync_nodes_timeout: 10_000 # milliseconds
`sync_nodes_optional` means that if a node cannot be reached within the time specified by `sync_nodes_timeout`, that node is simply ignored. There is another option, `sync_nodes_mandatory`, which crashes the current node if any of the listed nodes is unavailable. We obviously don't want that behavior.
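Once the cluster is up (we'll start it with Docker Compose below), a quick way to verify that the nodes actually found each other is to attach a remote console to one of them with `bin/gossipy remote_console` and list the connected nodes. An illustrative session:

# Run on gossipy@ws1; an empty list here would mean the nodes never connected.
iex(gossipy@ws1)1> Node.list()
[:gossipy@ws2]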
Now build the release:
$ MIX_ENV=prod mix release --env=prod
Then we enter the deployment phase.
Create a `Dockerfile` with the following content:
FROM ubuntu:xenial
EXPOSE 4000
ENV PORT=4000
RUN mkdir -p /www/gossipy && \
apt-get update && \
apt-get install -y libssl-dev
ADD ./_build/prod/rel/gossipy/releases/0.0.1/gossipy.tar.gz /www/gossipy
WORKDIR /www/gossipy
ENTRYPOINT ./bin/gossipy foreground
As mentioned before, we have to make sure the servers run the same OS as the build machine, so I have to choose Ubuntu 16.04. Honestly, I should use the Docker image `elixir:1.6-alpine` both to build and to run the application, but I'm lazy.
I need to install `libssl-dev` because, for some reason, Phoenix needs the file `crypto.so` to run; maybe it's because of the support for HTTPS and WSS.
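If you want to convince yourself that the `:crypto` application (the one backed by `crypto.so`) loads correctly inside the container, here is an illustrative check you could run from a remote console:

# This call goes through the crypto NIF and raises if crypto.so failed to load.
iex(gossipy@ws1)1> :crypto.hash(:sha256, "hello") |> Base.encode16()
"2CF24DBA5FB0A30E26E83B2AC5B9E29E1B161E5C1FA7425E73043362938B9824"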
I'm lazy, so I created a `docker-compose.yml` to save some keystrokes.
version: '3.2'
services:
  ws1:
    build: .
    hostname: ws1
    ports:
      - 4001:4000
  ws2:
    build: .
    hostname: ws2
    ports:
      - 4002:4000
There are two containers. Docker maps port 4001 of the host to port 4000 of the first container, and port 4002 to port 4000 of the second. We do this so that we know exactly which container each websocket connection goes to. We wouldn't do this in production, though.
When all the configuration work is done, it's time to start the cluster!
$ docker-compose up
Then you can try connecting to each container and see whether the connections communicate with each other. If you don't know how to connect, see APPENDIX 1.
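You can also push a message from the server side to watch the nodes cooperating. Here is an illustrative remote-console session on `ws1` (the endpoint module comes from the config above, and the topic and event names match the test client in APPENDIX 1). Phoenix's default PG2 PubSub adapter relays the broadcast over distributed Erlang, so clients connected to `ws2` receive it too, with no Redis involved:

# Broadcast on gossipy@ws1; subscribers of "room:1" on both nodes get the message.
iex(gossipy@ws1)1> GossipyWeb.Endpoint.broadcast("room:1", "shout", %{message: "hello from ws1"})
:ok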
Kill the `ws2` container.
$ docker-compose kill ws2
All the connections to `ws2` are lost, but the connections to `ws1` are still alive. The connection loss is not an issue, because in production we would use the same origin (scheme + domain name + port) for all the nodes. This means that as long as the clients implement a proper reconnection mechanism, they will soon reconnect to the nodes that are still alive.
We start `ws2` again.
$ docker-compose start ws2
Everything recovers.
The point is not merely adding a node, but adding a node without rebooting the nodes that are already running.
We first add another service to the `docker-compose.yml`:
  ws3:
    build: .
    hostname: ws3
    ports:
      - 4003:4000
and then modify `config/prod.exs` accordingly:
config :kernel,
  sync_nodes_optional: [:"gossipy@ws1", :"gossipy@ws2", :"gossipy@ws3"], # <---- Note the new ws3
  sync_nodes_timeout: 10_000
Rebuild the release, then start `ws3`:
$ MIX_ENV=prod mix release.clean
$ MIX_ENV=prod mix release --env=prod
$ docker-compose up ws3
The newly added node instantly connected to the old nodes, as expected.
Though tweaking the config files took some time, we successfully scaled out the websocket application without changing a single word of our source code! And that is the beauty of Location Transparency!
APPENDIX 1
<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Phoenix Channel Demo</title>
  </head>
  <body>
    <pre id="messages"></pre>
    <input id="shout-content">
    <script>
      window.onload = function () {
        var wsPort = window.location.search.match(/\bport=(\d+)\b/)[1];
        var messageBox = document.getElementById('messages');
        var ws = new WebSocket('ws://localhost:' + wsPort + '/socket/websocket');

        ws.onopen = function () {
          ws.send(JSON.stringify({
            topic: 'room:1',
            event: 'phx_join',
            payload: {},
            ref: 0
          }));
        };

        ws.onmessage = function (event) {
          var data = JSON.parse(event.data);
          if (data.event !== 'shout') return;
          messageBox.innerHTML += data.payload.message + "\n";
        };

        document.getElementById('shout-content').onkeyup = function (event) {
          if (event.which !== 13) return;
          if (!event.target.value) return;
          ws.send(JSON.stringify({
            topic: "room:1",
            event: "shout",
            payload: {message: event.target.value},
            ref: 0
          }));
          event.target.value = '';
        };
      };
    </script>
  </body>
</html>
You can host it any way you like; just make sure the browser is not accessing it via `file://`. I saved the content in a file named `ws.html` and served it with `$ python -m SimpleHTTPServer` (the port defaults to 8000), then accessed it at http://localhost:8000/ws.html?port=4001. You choose which node to connect to by setting the `port` parameter.