This is not official documentation. Make sure you have backups, and use at your own risk.
Applies to Rancher v2.6.3 and up only.
When the etcd database size exceeds its quota, etcd raises an alarm and returns the error "mvcc: database space exceeded".
To manually trigger this situation:
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "while [ 1 ]; do dd if=/dev/urandom bs=1024 count=1024 | ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl put key || break; done"
You can get the current status of etcd by running:
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl endpoint status --write-out=table"
+----------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.4.13 | 1.1 GB | true | 2 | 3409 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl alarm list"
memberID:10276657743932975437 alarm:NOSPACE
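Every command in this section repeats the same docker exec prefix and TLS settings. The following is a minimal sketch of a wrapper that factors them out, assuming exactly one Rancher container is running. The function names rancher_container and rancher_etcdctl are hypothetical and not part of Rancher or etcd. The wrapper leaves out the -t flag, since a pseudo-TTY normally appends a carriage return to each output line, which gets in the way when the output is captured into a variable.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print the ID of the running rancher/rancher container (same filter as above).
rancher_container() {
  docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }'
}

# Run etcdctl inside the Rancher container with the TLS settings used above.
# All arguments are passed through to etcdctl unchanged.
rancher_etcdctl() {
  docker exec -i \
    -e ETCDCTL_API=3 \
    -e ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' \
    -e ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' \
    -e ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' \
    -e ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' \
    "$(rancher_container)" etcdctl "$@"
}
```

With these defined, the two checks above become rancher_etcdctl endpoint status --write-out=table and rancher_etcdctl alarm list.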
Compact and defrag:
rev=$(docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl endpoint status --write-out fields | grep Revision | cut -d: -f2")
echo $rev
4161
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl compact ${rev%?}"
compacted revision 4161
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl defrag"
Finished defragmenting etcd member[127.0.0.1:2379]
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl alarm disarm"
memberID:10276657743932975437 alarm:NOSPACE
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl alarm list"
<empty>
docker exec -ti $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') bash -c "ETCDCTL_API=3 ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/k3s/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/k3s/server/tls/etcd/server-client.key' etcdctl endpoint status --write-out=table"
+----------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.4.13 | 3.2 MB | true | 3 | 5014 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
At this point, the rancher/rancher container should stop logging "mvcc: database space exceeded".
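One way to confirm this is to count how often the error still shows up in recent container logs. The sketch below assumes the same container filter as the commands above. The function name recent_space_errors and the default window of five minutes are illustrative choices, not something Rancher provides.

```shell
# Hypothetical helper: count recent "database space exceeded" log lines
# from the running Rancher container.
# Optional argument: time window for `docker logs --since` (default 5m).
# Prints the count; the exit status is non-zero when the count is 0,
# which is grep's usual convention.
recent_space_errors() {
  since="${1:-5m}"
  container=$(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')
  docker logs --since "$since" "$container" 2>&1 |
    grep -c 'mvcc: database space exceeded'
}
```

Running recent_space_errors 10m a few minutes after disarming the alarm should print 0 if the maintenance worked.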
If the rancher/rancher container won't keep running, it cannot be used to perform the maintenance, so etcd has to be maintained from a separate container:
# Stop Rancher container (and block restarting)
docker stop $(docker ps | grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')
# Run etcd container with data dir from Rancher's embedded etcd
docker run -d -e ETCDCTL_API=3 --name etcd-maintenance --volumes-from=$(docker ps -a| grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }') quay.io/coreos/etcd:v3.4.13 /usr/local/bin/etcd --data-dir=/var/lib/rancher/k3s/server/db/etcd
# Check etcd status
docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.4.15 | 2.2 GB | true | false | 4 | 6180 | 6180 | memberID:10276657743932975437 |
| | | | | | | | | | alarm:NOSPACE |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
docker exec etcd-maintenance etcdctl alarm list
memberID:10276657743932975437 alarm:NOSPACE
# Run compact/defrag
rev=$(docker exec etcd-maintenance etcdctl endpoint status --write-out json | grep -Eo '"revision":[0-9]*' | grep -Eo '[0-9]*')
echo $rev
4161
docker exec etcd-maintenance etcdctl compact "$rev"
compacted revision 4161
docker exec etcd-maintenance etcdctl defrag
Finished defragmenting etcd member[127.0.0.1:2379]
docker exec etcd-maintenance etcdctl alarm disarm
memberID:10276657743932975437 alarm:NOSPACE
docker exec etcd-maintenance etcdctl alarm list
<empty>
docker exec etcd-maintenance etcdctl endpoint status --write-out=table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | 8e9e05c52164694d | 3.4.15 | 8.4 MB | true | false | 4 | 6185 | 6185 | |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# Stop etcd-maintenance container
docker stop etcd-maintenance
# Start Rancher
docker start $(docker ps -a| grep -E "rancher/rancher:|rancher/rancher " | awk '{ print $1 }')
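The compact, defrag and disarm steps can also be chained so that a failing step stops the sequence. The following is a minimal sketch, not a tested procedure. The function name compact_defrag_disarm is hypothetical, and it takes the command prefix that runs etcdctl as its arguments, so it is not tied to one container. The revision is extracted from the JSON status with the same pattern as in the steps above.

```shell
# Hypothetical helper. Arguments: the command prefix that runs etcdctl,
# for example: docker exec etcd-maintenance etcdctl
compact_defrag_disarm() {
  # Current revision, taken from the JSON endpoint status.
  rev=$("$@" endpoint status --write-out json |
    grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1)
  if [ -z "$rev" ]; then
    echo "could not determine current revision" >&2
    return 1
  fi
  # Each step runs only if the previous one succeeded.
  "$@" compact "$rev" &&
    "$@" defrag &&
    "$@" alarm disarm &&
    "$@" alarm list
}
```

For the maintenance container used above, this would be called as compact_defrag_disarm docker exec etcd-maintenance etcdctl, before stopping that container and starting Rancher again.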