A no-nonsense way to safely back up your /etc/pve files (pmxcfs) 1 is actually very simple:
sqlite3 /var/lib/pve-cluster/config.db .dump > ~/config.dump.$(date --utc +%Z%Y%m%d%H%M%S).sql
This is safe to execute on a running node, and it only needs to run on a single node of the cluster: at any specific point in time, the result will be exactly the same regardless of the node.
Obviously, it makes more sense to save this somewhere other than the home directory ~, especially if you have dependable shared storage off the cluster. Ideally, you want a systemd timer, cron job or a hook to your other favourite backup method launching this.
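A scheduled job could wrap the dump in something like the following. This is only a minimal sketch: the function name backup_config_db, the destination directory and the retention count are assumptions made for illustration, not part of any Proxmox tooling.

```shell
#!/bin/sh
# Sketch: dump the pmxcfs database into a destination directory and keep
# only the newest few dumps. Names and defaults here are hypothetical.

backup_config_db() {
    src=$1          # e.g. /var/lib/pve-cluster/config.db
    dest=$2         # e.g. a directory on storage off the cluster
    keep=${3:-14}   # how many dumps to retain (assumed default)

    # Refuse a missing source, so sqlite3 does not create an empty database.
    [ -r "$src" ] || return 1

    stamp=$(date --utc +%Z%Y%m%d%H%M%S)
    tmp="$dest/.config.dump.$stamp.sql.tmp"
    out="$dest/config.dump.$stamp.sql"

    # Write under a temporary name first so a failed dump never looks complete.
    if sqlite3 "$src" .dump > "$tmp"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi

    # The timestamped names sort chronologically, so list them newest first
    # and delete everything past the $keep newest.
    ls -1 "$dest"/config.dump.*.sql 2>/dev/null | sort -r |
        tail -n +"$((keep + 1))" |
        while IFS= read -r old; do rm -f "$old"; done

    printf '%s\n' "$out"
}
```

A cron entry or systemd timer would then only need to call a script that sources this function and runs it with the real database path and your chosen destination.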
You will ideally never need to recover from this backup. If a single node's config database is corrupt, you are best off copying over /var/lib/pve-cluster/config.db (while inactive) from a healthy node and letting the implantee catch up with the cluster.
However, failing everything else, you will want to stop the cluster service, put aside the (possibly) corrupt database and get the last good state back:
systemctl stop pve-cluster
killall pmxcfs
mv /var/lib/pve-cluster/config.db{,.corrupt}
sqlite3 /var/lib/pve-cluster/config.db < ~/config.dump.<timestamp>.sql
systemctl start pve-cluster
NOTE Any leftover WAL will be ignored.
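If you would rather not trust a dump blindly, the rebuild step can be done into a temporary file that is checked before it takes the place of config.db. The following is a sketch under that assumption; rebuild_db_from_dump is a hypothetical helper name, and it relies on the sqlite3 shell's -bail option and on PRAGMA integrity_check.

```shell
#!/bin/sh
# Sketch: rebuild a database from a dump next to its final location and only
# move it into place if SQLite reports it as sound. Helper name is hypothetical.

rebuild_db_from_dump() {
    dump=$1
    target=$2
    tmp="$target.rebuild.$$"

    rm -f "$tmp"

    # -bail stops at the first SQL error instead of carrying on.
    sqlite3 -bail "$tmp" < "$dump" || { rm -f "$tmp"; return 1; }

    # PRAGMA integrity_check prints the single word "ok" for a healthy database.
    if [ "$(sqlite3 "$tmp" 'PRAGMA integrity_check;')" = "ok" ]; then
        mv "$tmp" "$target"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would stand in for the plain sqlite3 ... < dump line, run while pve-cluster is stopped and after the corrupt file has been moved aside.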
The .dump command 2 reads the database as if with a SELECT statement within a single transaction. It will block concurrent writes while it runs, but once it finishes, you have a "snapshot". The result is a perfectly valid set of SQL commands to recreate your database.
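Because the dump is wrapped in a single transaction, a cheap sanity check on a stored dump is possible: the sqlite3 shell opens it with BEGIN TRANSACTION; and closes it with COMMIT; (or with a ROLLBACK line if it ran into errors). A minimal sketch, with dump_looks_complete being a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: detect empty, truncated or failed dumps by looking at how the
# file starts and ends. This does not validate the SQL in between.

dump_looks_complete() {
    dump=$1
    [ -s "$dump" ] || return 1                          # must not be empty
    grep -q '^BEGIN TRANSACTION;' "$dump" || return 1   # transaction opened
    [ "$(tail -n 1 "$dump")" = "COMMIT;" ]              # and committed at the end
}
```

For example, dump_looks_complete ~/config.dump.sql && echo "dump looks complete" could run right after the backup, before older dumps are pruned.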
There's an alternative .save command (equivalent to .backup), which would produce a valid copy of the actual .db file. While it is non-blocking, it copies the database page by page, and if pages get dirty in the meantime, the copy needs to start over. You could receive an Error: database is locked failure on the attempt. If you insist on this method, you may need to set .timeout <milliseconds> first to get more luck with it.
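For illustration, this is roughly how the timeout and the backup could be combined in one invocation. It is a sketch only: backup_db_copy and the 5000 ms default are assumptions, and the output path must not contain a single quote.

```shell
#!/bin/sh
# Sketch: file-level copy using the sqlite3 shell's .backup command, with a
# busy timeout set before the copy starts. Helper name is hypothetical.

backup_db_copy() {
    src=$1
    out=$2
    timeout_ms=${3:-5000}   # assumed default, tune to taste

    # -cmd runs the given dot-command before the command that follows the
    # database name, so the timeout is in effect when .backup begins.
    sqlite3 -cmd ".timeout $timeout_ms" "$src" ".backup '$out'"
}
```

Something like backup_db_copy /var/lib/pve-cluster/config.db ~/config.copy.db 10000 would then give the copy up to ten seconds to get past a lock.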
Yet another option would be to use the VACUUM command with an INTO clause 3, but it does not fsync the result on its own!
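If you go this way, the missing fsync can be made up for by syncing the output file yourself. A hedged sketch, assuming an SQLite version that supports VACUUM INTO and a coreutils sync that accepts file arguments; vacuum_into_synced is a hypothetical name:

```shell
#!/bin/sh
# Sketch: VACUUM INTO a new file, then flush that file to disk explicitly.
# The output path must not contain a single quote.

vacuum_into_synced() {
    src=$1
    out=$2

    # Do not touch an existing target; VACUUM INTO would fail on a
    # non-empty file anyway.
    [ ! -e "$out" ] || return 1

    sqlite3 "$src" "VACUUM INTO '$out';" && sync "$out"
}
```

Depending on how paranoid you are, syncing the containing directory as well would also make the new directory entry durable.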
If you already have a corrupt .db file at hand (and nothing better), you may try your luck with .recover. 4
There are cases when you make changes to your configuration, only to want to partially revert them.
Alternatively, you get hold of a stale (from a non-quorate node) or partially corrupt config.db and want to take out only some of the previous files, without making it your current node's cluster filesystem.
Less often, you might want to edit the contents of the database-backed filesystem without side effects to the node or cluster, e.g. in order to implant it into a separate/cloned/new cluster.
DISCLAIMER If you do not understand the summary above, do NOT proceed.
This is actually possible; however, since pmxcfs 1 relies on hardcoded locations for its backend database file as well as its mountpoint, you would need to use chroot 5.
mkdir -p ~/jail-pmxcfs/{dev,usr,bin,sbin,lib,lib64,etc,var/lib/pve-cluster,var/run}
for i in /dev /usr /bin /sbin /lib /lib64 /etc; do mount --bind -o ro "$i" ~/jail-pmxcfs"$i"; done
This will create an alternative root structure for your own instance of pmxcfs. The only thing left is to implant the database of interest, in this example from the existing one:
sqlite3 /var/lib/pve-cluster/config.db .dump > ~/config.dump.sql
sqlite3 ~/jail-pmxcfs/var/lib/pve-cluster/config.db < ~/config.dump.sql
Now launch your own pmxcfs instance in local mode (-l) in the chroot environment:
chroot ~/jail-pmxcfs/ pmxcfs -l
You can double check your instance runs using the database file that was just provided:
lsof ~/jail-pmxcfs/var/lib/pve-cluster/config.db
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
pmxcfs 1225 root 4u REG 252,1 77824 61 /root/jail-pmxcfs/var/lib/pve-cluster/config.db
In fact, if you have the regular pve-cluster service running, you will be able to see there are two instances running, each over its own database, the new one in local mode (-l):
ps -C pmxcfs -f
UID PID PPID C STIME TTY TIME CMD
root 656 1 0 10:34 ? 00:00:02 /usr/bin/pmxcfs
root 1225 1 0 10:37 ? 00:00:00 pmxcfs -l
Now you can copy out your files or perform changes in ~/jail-pmxcfs/etc/pve without affecting your regular operation.
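To decide what to copy out, it can help to first list which files differ between two trees, for instance the live /etc/pve and the jailed one. A small sketch; compare_pve_trees is a hypothetical wrapper around plain diff:

```shell
#!/bin/sh
# Sketch: list files that differ between two directory trees.
# diff exits with status 1 when differences are found, 0 when there are none.

compare_pve_trees() {
    current=$1   # e.g. /etc/pve
    other=$2     # e.g. ~/jail-pmxcfs/etc/pve
    diff -rq "$current" "$other"
}
```

Dropping -q (or using diff -ru) would show the actual content changes instead of just the file names.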
You can also make an SQL dump 2 of ~/jail-pmxcfs/var/lib/pve-cluster/config.db - your now modified database.
Once you are finished, you will want to get rid of the extra instance (based on the PID of the local (-l) instance obtained above):
kill $PID
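Rather than reading the PID off the ps listing by eye, it could be picked out by the -l argument. The filter below is a sketch with a hypothetical name; it expects lines of the form "PID COMMAND ARGS...", such as those printed by ps -C pmxcfs -o pid=,args=.

```shell
#!/bin/sh
# Sketch: print the PID of any listed process that was started with -l.
# Field 1 is the PID, field 2 the command, fields 3+ its arguments.

select_local_pmxcfs_pid() {
    awk '{ for (i = 3; i <= NF; i++) if ($i == "-l") { print $1; break } }'
}

# Possible use on the node itself (not executed here):
#   PID=$(ps -C pmxcfs -o pid=,args= | select_local_pmxcfs_pid)
#   kill "$PID"
```

The regular /usr/bin/pmxcfs instance carries no -l argument, so it is never selected.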
And destroy the temporary chroot structure:
umount ~/jail-pmxcfs/etc/pve ~/jail-pmxcfs/* &&
rm -rf ~/jail-pmxcfs/
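Before removing the directory tree, you may want to confirm that nothing is still mounted beneath it. The following sketch reads a mount table in /proc/self/mounts format; mounts_under is a hypothetical helper, and it assumes an absolute path without spaces or backslashes.

```shell
#!/bin/sh
# Sketch: print mount points at or below a directory. The second field of
# each line in /proc/self/mounts is the mount point.

mounts_under() {
    dir=$1
    table=${2:-/proc/self/mounts}
    awk -v d="$dir" '$2 == d || index($2, d "/") == 1 { print $2 }' "$table"
}

# Possible use (not executed here):
#   jail=/root/jail-pmxcfs
#   [ -z "$(mounts_under "$jail")" ] && rm -rf "$jail"
```

An empty result means every bind mount and the FUSE mount are gone, so the rm only ever sees plain directories.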