Start with a running cluster (a 1-node cluster is enough) with lots of data, e.g.:
$ etcd --name 0 --listen-peer-urls=http://localhost:2380 --listen-client-urls=http://localhost:2379 --advertise-client-urls=http://localhost:2379 --initial-cluster-state=new --snapshot-count=100 --debug=true --log-package-levels "*=DEBUG"

Make sure it has created a snapshot:
$ ls -l 0.etcd/member/snap/

This directory should exist and contain snapshot files (with the command line above it will). If it does not, start your etcd with a low --snapshot-count, e.g. 10, and wait until a snapshot is created (this might take a few seconds). Then you can restart etcd with a normal --snapshot-count setting, e.g. the default of 10000.
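If you want to script this wait, a small polling loop works. This is a sketch: `wait_for_snapshot` is a hypothetical helper (not part of etcdctl), and the data directory layout `<data-dir>/member/snap` matches the `0.etcd` node above:

```shell
#!/bin/sh
# Poll a data directory until it contains at least one *.snap file,
# or give up after a timeout.
# Usage: wait_for_snapshot <data-dir> [timeout-seconds]
wait_for_snapshot() {
    snap_dir="$1/member/snap"
    timeout="${2:-60}"
    while [ "$timeout" -gt 0 ]; do
        if ls "$snap_dir"/*.snap >/dev/null 2>&1; then
            echo "snapshot found in $snap_dir"
            return 0
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    echo "no snapshot in $snap_dir after timeout" >&2
    return 1
}
```

Call it as `wait_for_snapshot 0.etcd` after starting etcd with a low --snapshot-count.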
Make a (live) backup of that node:
$ etcdctl backup --backup-dir backup.etcd --data-dir 0.etcd --keep-cluster-id

Generate some load, preferably enough to create a new snapshot on node 0, i.e. with --snapshot-count=100 as above at least 100 new keys (the default snapshot count is 10000):
$ X=0; while true; do X=$((X + 1)); etcdctl --no-sync set --ttl 10 /$(date +"%Y-%m-%d-%H-%M-%S")/$(date +"%Y-%m-%d-%H-%M-%S-%3N") $X >/dev/null; echo $X; done

Add the new node:
$ etcdctl member add 1 http://localhost:12380

Copy the node-id from the command's output and use it to restore the data from the backup snapshot:
$ etcdctl backup --backup-dir 1.etcd --data-dir backup.etcd --keep-cluster-id --node-id 368ff4e7245ee2e3

Start up the new node:
$ etcd --name 1 --initial-cluster=0=http://localhost:7001,0=http://localhost:2380,1=http://localhost:12380 --initial-advertise-peer-urls=http://localhost:12380 --listen-peer-urls=http://localhost:12380 --listen-client-urls=http://localhost:12379 --advertise-client-urls=http://localhost:12379 --initial-cluster-state=existing --debug=true --log-package-levels "*=DEBUG"

The node should join the cluster, and eventually the cluster should be healthy with two nodes:
$ etcdctl cluster-health

The new node should show the data that was written after the backup was made and before the new node joined:
$ etcdctl --peers http://localhost:12379 --no-sync ls

To see that no complete snapshot is sent, but only the keys missing since the backup, the patch at https://gist.github.com/sttts/cfdc7c89cb7bfbef6c8e7ec0c80274e7 is helpful. Here is a log produced with this patch: https://gist.github.com/sttts/dd81c59f013c499d76585274eb05c8e1.
Alternatively, run tcpdump -i lo port 2380 and check the traffic during the join. The sum of the packets' lengths should be clearly lower than the snapshot size (on the order of the diff between the backup and the join).
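To turn the tcpdump check into a number, you can sum the per-packet payload lengths. This is a sketch that assumes tcpdump's default one-line TCP output, where each line ends in "length <bytes>"; `sum_lengths` is a hypothetical helper:

```shell
#!/bin/sh
# Sum the payload lengths reported by tcpdump. Default TCP output lines
# end in "length <bytes>", so we add up the field following "length".
sum_lengths() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "length") sum += $(i + 1) }
         END { print sum + 0 }'
}

# Usage (as root, while the new member joins):
#   tcpdump -i lo -nn -l port 2380 | tee join.txt
#   sum_lengths < join.txt
```

Compare the printed total against the size of the snapshot files in member/snap; with incremental transfer it should be roughly the size of the diff, not the full snapshot.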
etcd has a timeout of 5 seconds to transfer a snapshot. The whole goal of etcd-io/etcd#5397 is to allow recovery of a node without a complete snapshot transfer. To simulate the failure case (for comparison with the procedure above), you can use the netem feature of the Linux kernel:
If you use localhost addresses, you can limit the bandwidth of the lo device in the following way:
$ tc qdisc add dev lo root netem rate 10000kbit

To disable the bandwidth limitation, do this:
$ tc qdisc delete dev lo root netem rate 10000kbit

The value 10000kbit means that transferring a 100MB snapshot would take roughly 1.5 minutes, far above the 5-second timeout of etcd.
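The arithmetic behind that estimate, using decimal megabytes (100 MB = 800,000 kbit):

```shell
#!/bin/sh
# Rough transfer time of a snapshot at a fixed netem rate.
SNAPSHOT_MB=100   # snapshot size in MB
RATE_KBIT=10000   # netem rate in kbit/s

# MB -> kbit: multiply by 8 bits/byte and 1000 kB/MB.
SECONDS_NEEDED=$(( SNAPSHOT_MB * 8 * 1000 / RATE_KBIT ))
echo "${SECONDS_NEEDED}s"
```

That is 80 seconds (about 84 seconds if you count binary megabytes), i.e. on the order of 1.5 minutes and in any case far above the 5-second snapshot timeout.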