Riak with Docker: Brain Dump

Note

For the distilled summary, see https://gist.github.com/sa2ajj/5323326#file-summary-rst

This document _tries_ to outline the important items that need to be covered to get Riak running with Docker.

Please note that this is an outline of what I'm trying to do, not step-by-step instructions (though it might become that one day).

(This document may end up somewhere else, but for now it lives here on gist.github.com.)

This document is loosely based on the excellent documentation provided by Basho.

Overview

Pre-requisites
Initial Setup
Prepare the initial image
Perform common configuration
- /etc/riak/app.config
- /etc/riak/vm.args
Perform node specific configuration
- /etc/riak/app.config
- /etc/riak/vm.args
Setup the actual cluster
Normal Operation
Perform a node upgrade

Pre-requisites

Docker offers a different approach to using Linux containers.

The main difference is that a Docker container does not need a full guest OS installation: you can install as little as necessary.

Docker is being developed very actively, so grab the latest version from GitHub:

$ git clone git://github.com/dotcloud/docker.git

Initial Setup

Please bear in mind that this might be a suboptimal setup.

The goal is to create a Riak cluster with 5 nodes (the minimum number of nodes recommended for Riak).

Prepare the host directory structure:

$ mkdir ~/riak-cluster
$ cd ~/riak-cluster
$ for i in $(seq 1 5); do mkdir -p node$i/{etc,data,log}; done

Things to check:

- directory ownership
- host and container user "mapping"(?)
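A minimal sketch of the directory setup together with an ownership check, assuming the node<N>/{etc,data,log} layout used above and a POSIX shell. The function names make_node_dirs and show_ownership are hypothetical. Numeric uid/gid values are printed because file ownership is stored numerically: for bind-mounted directories, what has to match is the numeric id of the riak user inside the container, not the user name on the host.

```shell
# Hypothetical helper: create <root>/node<N>/{etc,data,log} for N = 1..count.
make_node_dirs() {
    root=$1    # e.g. ~/riak-cluster
    count=$2   # number of nodes, e.g. 5
    i=1
    while [ "$i" -le "$count" ]; do
        for sub in etc data log; do
            mkdir -p "$root/node$i/$sub"
        done
        i=$((i + 1))
    done
}

# Hypothetical helper: print "uid gid path" for every per-node directory.
# ls -n shows numeric ids, which is what matters across the
# host/container boundary (user names may differ between the two).
show_ownership() {
    root=$1
    for d in "$root"/node*/*; do
        ls -ldn "$d" | awk -v p="$d" '{ print $3, $4, p }'
    done
}
```

For example, run `make_node_dirs ~/riak-cluster 5` followed by `show_ownership ~/riak-cluster` and compare the printed uid with the uid of the riak user inside the image.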

Prepare the initial image

One way to start is to use the base image available from the Docker registry. (Please note that, at least for now, the registry server does not offer any fancy UI.)

Warning

The commands below will actually fail, as Riak packages are not available from the standard repositories.

The next version of this document will address this properly.

$ docker pull base
$ C_ID=`docker run -d base apt-get install -y riak`
$ docker wait $C_ID
$ docker commit -m 'added riak package' $C_ID my/riak-base

Perform common configuration

Riak has two configuration files: /etc/riak/app.config and /etc/riak/vm.args. Both files contain parameters you will probably want to share among all nodes, as well as node-specific ones. (Detailed information about the available configuration parameters can be found at http://docs.basho.com/riak/latest/references/Configuration-Files/.)

How you perform the actual configuration is not covered here (for now). Suppose, for example, that a script called riak-magic has magically appeared in your image and does all the configuration for you. After running it, create a new image:

$ C_ID1=`docker run -d my/riak-base /usr/sbin/riak-magic`
$ docker wait $C_ID1
$ docker commit -m 'common configuration is applied' $C_ID1 my/riak-configured

At this point, I'd like to extract the configuration files to the host (as I do not really know how to maintain them otherwise):

TODO

And place the files in the host hierarchy:

$ for i in $(seq 1 5); do cp /tmp/app.config /tmp/vm.args node$i/etc/; done

BIG QUESTION: does this have to be done inside the container?

`/etc/riak/app.config`

For this use case, the important parameters are the various directories where Riak stores its data.

The most notable are /var/lib/riak and /var/log/riak. These do not have to be changed (fewer changes, easier maintenance).

The other important parameter is the IP address. You can make Riak listen for connections coming from anywhere (which is not a problem if you run it on a dedicated network): use 0.0.0.0 as the IP address for the following services:

- HTTP(S) interface ({riak_core, http | https})
- Protocol Buffers API interface ({riak_api, pb_ip})
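One way to apply the 0.0.0.0 change to the extracted app.config is a small sed edit. This is a sketch that assumes the listeners appear as {pb_ip, "127.0.0.1" } and {http, [ {"127.0.0.1", 8098 } ]}, as in a stock Riak 1.x app.config; verify against your own file first. The helper name listen_on_all is hypothetical, and the edit goes through a temporary file because `sed -i` behaves differently in GNU and BSD sed.

```shell
# Hypothetical helper: make the HTTP(S) and Protocol Buffers listeners
# in one app.config bind to 0.0.0.0 instead of 127.0.0.1.
listen_on_all() {
    # $1: path to app.config
    # Only lines containing "{pb_ip," or "{http," / "{https," are touched;
    # \{0,1\} makes the trailing "s" of "https" optional (POSIX BRE).
    sed -e '/{pb_ip,/s/"127\.0\.0\.1"/"0.0.0.0"/' \
        -e '/{https\{0,1\},/s/"127\.0\.0\.1"/"0.0.0.0"/' \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

For example, `listen_on_all node1/etc/app.config`. Restricting the substitution to those lines leaves any other 127.0.0.1 occurrences in the file alone.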

`/etc/riak/vm.args`

The Erlang VM lets nodes communicate with each other provided they share a common cookie, so the -setcookie parameter is the most important common one.
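A minimal sketch for giving every node the same cookie, assuming vm.args contains a line of the form `-setcookie <value>` (Riak ships with `-setcookie riak`). The helpers new_cookie and set_cookie are hypothetical. The cookie is generated once and then written into each node's copy, and it is assumed to contain no '/', '&' or '\' characters, since those would break the sed replacement.

```shell
# Hypothetical helper: 16 random bytes rendered as 32 hex characters.
new_cookie() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Hypothetical helper: replace the -setcookie line of one vm.args file.
set_cookie() {
    # $1: path to vm.args, $2: cookie shared by every node in the cluster
    sed -e "s/^-setcookie .*/-setcookie $2/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

For example: `COOKIE=$(new_cookie); for i in $(seq 1 5); do set_cookie node$i/etc/vm.args "$COOKIE"; done`.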

Perform node specific configuration

`/etc/riak/app.config`

If you used 0.0.0.0 as the address to accept connections on, nothing needs to be done at this step.

`/etc/riak/vm.args`

The -name parameter specifies the node's name. It contains the node's IP address, so if each node has its own IP address, only that part needs to be modified. If some nodes share the same IP address, the name part (before the @) must be modified as well.

So modify the extracted vm.args for each node in node<I>/etc/vm.args.
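The per-node edit can be sketched as a loop over node<I>/etc/vm.args, assuming each file contains a line `-name <name>@<address>`. The addressing scheme used here (node N gets riak@10.0.0.(10+N)) is purely hypothetical, as are the helper names; substitute the addresses your containers actually get. If several nodes share one IP address, vary the part before the @ instead (e.g. riak1@, riak2@, ...).

```shell
# Hypothetical helper: rewrite the -name line of one vm.args file.
set_node_name() {
    # $1: path to vm.args, $2: full node name such as riak@10.0.0.11
    sed -e "s/^-name .*/-name $2/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Hypothetical helper: give node N the name riak@10.0.0.(10+N).
# The addressing scheme is an assumption made only for illustration.
name_all_nodes() {
    root=$1   # e.g. ~/riak-cluster
    count=$2  # number of nodes
    i=1
    while [ "$i" -le "$count" ]; do
        set_node_name "$root/node$i/etc/vm.args" "riak@10.0.0.$((10 + i))"
        i=$((i + 1))
    done
}
```

For example, `name_all_nodes ~/riak-cluster 5` leaves the -setcookie line and everything else in each file untouched.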

Setup the actual cluster

Start the first node:

$ NODE1=`docker run -v $HOME/riak-cluster/node1/data:/var/lib/riak:rw \
                    -v $HOME/riak-cluster/node1/log:/var/log/riak:rw \
                    -d my/riak-configured ...`

Good question: how do I get the container's IP address?

Another good question: do I really need that address? Maybe I could use a locally resolvable FQDN instead? (In that case, how would Docker handle it?)

For each other node:

Start the container:
```
$ NODEX=`docker run ...`
```

Add the node to the cluster:

$ riak-admin cluster join riak@first-node-ip-address

After all nodes are added, review and commit your changes to the cluster:

$ riak-admin cluster plan
$ riak-admin cluster commit
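The join sequence can be sketched as a dry run that only prints the commands, assuming node names such as the hypothetical riak@10.0.0.11 scheme. Each `riak-admin cluster join` is run on the joining node (for example, inside its container) and names an existing member. Plan and commit are run once, from any member, after all joins are staged. The helper cluster_commands is hypothetical and executes nothing.

```shell
# Hypothetical dry-run helper: print which riak-admin command to run where.
# $1: the first node's name; remaining arguments: the nodes that join it.
cluster_commands() {
    first=$1
    shift
    for node in "$@"; do
        echo "on $node: riak-admin cluster join $first"
    done
    echo "on $first: riak-admin cluster plan"
    echo "on $first: riak-admin cluster commit"
}
```

For example, `cluster_commands riak@10.0.0.11 riak@10.0.0.12 riak@10.0.0.13` prints two join lines followed by the plan and commit lines. Keeping it a dry run makes it easy to review the order before touching a live cluster.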

At this point, the cluster should be set up.

Normal Operation

Just run the thing:

$ docker run -d ...

The important point is that certain things must be retained between runs:

- the IP address
- the content of /var/lib/riak (or whatever other location was configured to store Riak's data)

Perform a node upgrade

Nothing special:

- Stop the node
- Upgrade my/riak-configured
- If necessary, update the common configuration
- Start the node

Riak on Docker: Use Case

Having dumped everything on my mind about this use case into the braindump.rst file, and having thought about what I wrote, here is (I hope) a summary of what matters for the use case.

The assumption is that there are two kinds of things:

- those that all Riak containers must share
- those that are unique and, once configured/established, must persist thereafter

Shared things:

- the Riak system itself
- Riak subsystem configuration (like backend configuration, ring creation size)
- any Riak-based application (e.g. map/reduce functions that are part of that application)
- the Erlang cookie (otherwise, Riak nodes won't be able to talk to each other)

Note

Unique things that live forever (i.e. across container runs, or across the 'run' steps of a run+commit sequence):

- the Riak node ID (see Node ID for more concerns)
- the content of the /var/lib/riak directory (ring state, storage content)

It seems that the unique things would require some sort of persistent storage (gh#111).

Life Cycle

A Riak cluster would have the following life cycle elements:

1. Prepare the initial image.
   This image would have:
   - Riak installed
   - app-related components installed (e.g. map/reduce functions)
   The common configuration would include (in order of importance):
   - ring creation size
   - Erlang cookie
   - app-related paths configured for Riak
   - storage backend configuration
   More information about Riak configuration is available at Configuration Files.
2. Actually create the cluster.
   For each node:
   - configure the node ID
   - if it's not the first node, run riak-admin cluster join <first-node-id>
   Finally:
   - review the cluster configuration (riak-admin cluster plan)
   - commit the changes (riak-admin cluster commit)
   Important: see the note above.
3. Normal run.
   - if a node dies, restart it (preferably automatically)
   - if necessary, stop the node and start it again
4. Riak upgrade/common Riak configuration changes.
   For each node:
   - stop the node
   - perform the upgrade
   - start the node
   In some cases, this should be enclosed in riak-admin cluster leave + riak-admin cluster commit and riak-admin cluster join + riak-admin cluster commit.
   Important: see the note above.
5. App-related components are updated.
   Pretty much the same as the previous element, except that leaving/joining is most likely not required.

Other notes

Network Configuration

Network Security and Firewall Configurations discusses standard configurations and port settings to use when thinking about how to secure a Riak Cluster.

Based on the IRC discussion:

- it would be a good idea to support a cross-host shared network (@unclejack)
- it might also be a good idea to be able to pick, at runtime, which bridge to put the container on (@unclejack)

Node ID

There are two ways to specify it: name@ip and name@f.q.d.n.

In the first case, the IP address must stay with that node throughout its life.

In the second case, there must be a way to always resolve that f.q.d.n to the node's current IP address.

Neither seems to be possible at the moment (an RFC is at moby/moby#353).
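To make the two forms concrete, here is a sketch of a hypothetical helper that classifies a node ID by its host part. It is deliberately loose: it does not validate IPv4 ranges, and any host part containing a letter is reported as "fqdn", so short host names are lumped in with fully qualified ones.

```shell
# Hypothetical helper: print "ip", "fqdn" or "invalid" for a node ID.
node_id_kind() {
    host=${1#*@}    # strip everything up to and including the first "@"
    case $host in
        ''|"$1") echo invalid ;;   # empty host part, or no "@" at all
        *[!0-9.]*) echo fqdn ;;    # contains something other than digits/dots
        *.*.*.*) echo ip ;;        # dotted digits, e.g. 10.0.0.11
        *) echo invalid ;;
    esac
}
```

For example, `node_id_kind riak@10.0.0.11` prints `ip`, while `node_id_kind riak@node1.example.com` prints `fqdn`. With the first form, the printed address is what must survive every container restart. With the second, the name must keep resolving to wherever the node currently lives.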
