Bolt operations are copy-on-write. When a page is updated, it is copied to a completely new page. The old page is added to a "freelist", which Bolt refers to when it needs a new page. This means that deleting large amounts of data will not actually free up space on disk, as the pages are instead kept on Bolt's freelist for future use. To reclaim this space on disk, you need to perform a defrag.
Defragmentation releases this storage space back to the file system. It is issued on a per-member basis so that cluster-wide latency spikes may be avoided.
- lock batchTx to ensure nobody is using the previous tx, then close the previous ongoing tx.
- lock the database after locking the tx to avoid deadlock.
- block concurrent read requests while resetting the tx
- create a `db.tmp.*` file for new db and open it
- start defrag
  - open a tx on tmpdb for writes
  - open a tx on old db for read
  - traverse the actual db from first to end using cursor
  - create a new bucket for each
  - traverse all keys
  - start commit, copy the bucket and put
  - rollback if any error
- close all databases
- rename tmp db to actual db
- observe metrics
- release all locks
- 100 MB usually takes 1 sec (`backend_defrag_duration_seconds`)
- metrics: `defrag_inflight` (whether or not defrag is active)
- `./etcdutl defrag --data-dir default.etcd`: defragment while etcd is not running
- ability to defrag on bootstrap (if `ExperimentalBootstrapDefragThresholdMegabytes` is set to a non-zero value)
- https://etcd.io/docs/v3.4/op-guide/maintenance/#defragmentation