This is not official documentation. Make backups before you start, and use at your own risk.
v2.1.8 only (for 2.2.x, see https://gist.github.com/superseb/f223b15949c031983da2cb850f56a897)
When the etcd database size exceeds its quota, etcd raises an alarm and returns the error "mvcc: database space exceeded".
You can get the current status of etcd by running:
# Copy needed etcd certificates
$ docker cp $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }'):/etc/kubernetes/ssl etcdssl
$ docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl endpoint status --write-out=table"
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | e92d66acd89ecf29 |  3.2.13 |  2.1 GB |      true |         3 |       5852 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
$ docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl alarm list"
memberID:16802198677343883049 alarm:NOSPACE
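Every command in this guide repeats the same long docker run invocation. As a convenience, it could be wrapped in a small shell function. This is a minimal sketch, not part of the original procedure: the function names rancher_container_id and rancher_etcdctl are hypothetical, and it assumes the certificates were already copied to ./etcdssl as shown above.

```shell
# Hypothetical helpers (names are not from the original guide).

# Print the ID of the running rancher/rancher container,
# using the same filter as the commands above.
rancher_container_id() {
  docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }'
}

# Run an etcdctl subcommand in an rke-tools container that shares the
# Rancher container's network. Assumes ./etcdssl holds the copied certificates.
rancher_etcdctl() {
  local cid
  cid=$(rancher_container_id)
  if [ -z "$cid" ]; then
    echo "no running rancher/rancher container found" >&2
    return 1
  fi
  docker run --rm --net=container:"$cid" \
    -v "$PWD/etcdssl:/etc/kubernetes/ssl" \
    -e ETCDCTL_API=3 \
    -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem \
    -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem \
    -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem \
    rancher/rke-tools:v0.1.27 bash -c "etcdctl $*"
}
```

With these defined, the two checks above would become `rancher_etcdctl endpoint status --write-out=table` and `rancher_etcdctl alarm list`.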
Compact and defrag:
$ rev=$(docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl endpoint status --write-out json | egrep -o '\"revision\":[0-9]*' | egrep -o '[0-9]*'")
$ echo $rev
5456
$ docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl compact $rev"
compacted revision 5456
$ docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl defrag"
Finished defragmenting etcd member[127.0.0.1:2379]
$ docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl alarm disarm"
memberID:16802198677343883049 alarm:NOSPACE
$ docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl alarm list"
<empty>
$ docker run --rm --net=container:$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') -v $PWD/etcdssl:/etc/kubernetes/ssl -e ETCDCTL_API=3 -e ETCDCTL_CACERT=/etc/kubernetes/ssl/kube-ca.pem -e ETCDCTL_CERT=/etc/kubernetes/ssl/kube-etcd-127-0-0-1.pem -e ETCDCTL_KEY=/etc/kubernetes/ssl/kube-etcd-127-0-0-1-key.pem rancher/rke-tools:v0.1.27 bash -c "etcdctl endpoint status --write-out=table"
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | e92d66acd89ecf29 |  3.2.13 |  7.4 MB |      true |         3 |       6114 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
At this point, the rancher/rancher container should stop logging "mvcc: database space exceeded".
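The compact step depends on the revision being parsed correctly out of the JSON status output. A small guard can make that step fail early instead of passing an empty or malformed value to etcdctl compact. The sketch below is an illustration only: extract_revision is a hypothetical helper name, and the JSON in the example call is an abbreviated sample, not captured output.

```shell
# Hypothetical helper: read `etcdctl endpoint status --write-out json`
# output on stdin and print the revision. Fails unless exactly one
# numeric revision is found.
extract_revision() {
  rev=$(grep -E -o '"revision":[0-9]+' | grep -E -o '[0-9]+')
  case "$rev" in
    ''|*[!0-9]*)
      echo "unexpected revision value: '$rev'" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$rev"
}

# Example with an abbreviated, illustrative JSON document:
printf '%s' '[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"revision":5456}}}]' \
  | extract_revision
```

The same grep pattern as in the compact step is used (with grep -E in place of egrep); the only addition is the check that the result is a single run of digits.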
If the rancher/rancher container won't keep running, etcd has to be maintained from outside it, because the rancher/rancher container can no longer be used to run the maintenance commands.
# Copy needed etcd certificates
$ docker cp $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }'):/etc/kubernetes/ssl etcdssl
# Stop Rancher container (and block restarting)
$ docker stop $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')
# Run etcd container with data dir from Rancher's embedded etcd
$ docker run -d -e ETCDCTL_API=3 --name etcd-maintenance --volumes-from=$(docker ps -a| grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') quay.io/coreos/etcd:v3.2.13 /usr/local/bin/etcd --data-dir=/var/lib/rancher/etcd
# Check etcd status
$ docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | e92d66acd89ecf29 |  3.2.13 |  2.1 GB |      true |         7 |       8773 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
$ docker exec etcd-maintenance etcdctl alarm list
memberID:16802198677343883049 alarm:NOSPACE
# Run compact/defrag
$ rev=$(docker exec etcd-maintenance etcdctl endpoint status --write-out json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*')
$ echo $rev
7921
$ docker exec etcd-maintenance etcdctl compact "$rev"
compacted revision 7921
$ docker exec etcd-maintenance etcdctl defrag
Finished defragmenting etcd member[127.0.0.1:2379]
$ docker exec etcd-maintenance etcdctl alarm disarm
memberID:16802198677343883049 alarm:NOSPACE
$ docker exec etcd-maintenance etcdctl alarm list
<empty>
$ docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | e92d66acd89ecf29 |  3.2.13 |  6.3 MB |      true |         7 |       8775 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
# Stop etcd-maintenance container
$ docker stop etcd-maintenance
# Start Rancher
$ docker start $(docker ps -a| grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')
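The compact, defrag and disarm commands for the etcd-maintenance container could also be chained so that a failure in one step stops the rest, and the alarm is only disarmed after space was actually reclaimed. This is a rough sketch under that assumption; maint_etcdctl and compact_defrag_disarm are hypothetical names, and the function expects the etcd-maintenance container from the steps above to be running.

```shell
# Hypothetical helper: run etcdctl inside the etcd-maintenance container.
maint_etcdctl() {
  docker exec etcd-maintenance etcdctl "$@"
}

# Look up the current revision, compact to it, defragment, then disarm.
# Each step runs only if the previous one succeeded.
compact_defrag_disarm() {
  # Same pattern as in the manual steps, with grep -E in place of egrep.
  rev=$(maint_etcdctl endpoint status --write-out json \
    | grep -E -o '"revision":[0-9]*' | grep -E -o '[0-9]*') || return 1
  maint_etcdctl compact "$rev" &&
    maint_etcdctl defrag &&
    maint_etcdctl alarm disarm
}
```

Calling compact_defrag_disarm would run between the "Check etcd status" and "Stop etcd-maintenance container" steps; checking alarm list and endpoint status afterwards, as shown above, is still worthwhile.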