
@free-pmx
Created November 2, 2024 08:19

Proxmox VE - Backup Cluster config (pmxcfs) - /etc/pve

Backup

A no-nonsense way to safely back up your /etc/pve files (pmxcfs) 1 is actually very simple:

sqlite3 /var/lib/pve-cluster/config.db .dump > ~/config.dump.$(date --utc +%Z%Y%m%d%H%M%S).sql

This is safe to execute on a running node and only needs to be run on a single node of the cluster: the result (for a given point in time) will be exactly the same on every node.

Obviously, it makes more sense to save this somewhere other than the home directory ~, especially if you have dependable shared storage off the cluster. Ideally, you want a systemd timer, a cron job or a hook in your favourite backup method to launch this.
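As a minimal sketch of what such a timer or cron job might launch, the function below wraps the dump command from above. The function name backup_pmxcfs, the destination /mnt/backup/pmxcfs and the retention of 30 dumps are assumptions made for illustration; none of them come from Proxmox VE.

```shell
# Sketch: dump the pmxcfs backend database into a destination directory.
# backup_pmxcfs, its default destination and the retention count are hypothetical.
backup_pmxcfs() {
    db=${1:-/var/lib/pve-cluster/config.db}
    dest=${2:-/mnt/backup/pmxcfs}
    keep=${3:-30}

    mkdir -p "$dest" || return 1
    # -u is the short form of --utc used in the one-liner above.
    out="$dest/config.dump.$(date -u +%Z%Y%m%d%H%M%S).sql"

    # Write to a temporary name first so a failed dump never looks complete.
    if sqlite3 "$db" .dump > "$out.tmp"; then
        mv "$out.tmp" "$out"
    else
        rm -f "$out.tmp"
        return 1
    fi

    # Keep only the newest $keep dumps; the timestamped names sort chronologically.
    ls -1 "$dest"/config.dump.*.sql | sort -r | tail -n +"$((keep + 1))" |
        while IFS= read -r old; do rm -f -- "$old"; done

    # Report where the dump ended up.
    printf '%s\n' "$out"
}
```

Called as backup_pmxcfs /var/lib/pve-cluster/config.db /mnt/backup/pmxcfs 30, it prints the path of the dump it wrote and returns non-zero if the dump failed.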


Recovery

You will ideally never need to recover from this backup. If a single node's config database is corrupt, you are best off copying over /var/lib/pve-cluster/config.db (while inactive) from a healthy node and letting the implantee catch up with the cluster.

However, failing everything else, you will want to stop the cluster service, put aside the (possibly) corrupt database and get the last good state back:

systemctl stop pve-cluster
killall pmxcfs
mv /var/lib/pve-cluster/config.db{,.corrupt}
sqlite3 /var/lib/pve-cluster/config.db < ~/config.dump.<timestamp>.sql
systemctl start pve-cluster

NOTE Any leftover WAL will be ignored.
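Before swapping databases, it may be worth checking that the dump actually loads. The sketch below recreates the database in a scratch directory and runs SQLite's PRAGMA integrity_check on it. The helper name check_dump is hypothetical, and a passing check only says the SQLite file is structurally sound, not that the configuration inside is the one you want.

```shell
# Sketch: load a dump into a scratch database and check it before restoring.
# check_dump is a hypothetical helper name.
check_dump() {
    dump=$1
    scratch=$(mktemp -d) || return 1

    # Recreate the database from the SQL dump, then let SQLite verify it.
    if sqlite3 "$scratch/config.db" < "$dump"; then
        result=$(sqlite3 "$scratch/config.db" 'PRAGMA integrity_check;')
    else
        result="load failed"
    fi

    rm -rf "$scratch"
    # integrity_check prints the single word "ok" when nothing is wrong.
    [ "$result" = "ok" ]
}
```

For instance, check_dump ~/config.dump.<timestamp>.sql && echo "dump loads cleanly" could be run ahead of the restore steps above.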

Additional notes on SQLite CLI

The .dump command 2 reads the database as if with a SELECT statement within a single transaction. It will block concurrent writes, but once it finishes, you have a "snapshot". The result is a perfectly valid set of SQL commands to recreate your database.

There is an alternative .save command (equivalent to .backup) that produces a valid copy of the actual .db file. It is non-blocking and copies the database page by page, but if pages get dirty in the process, the copy needs to start over, and you may receive an Error: database is locked failure on the attempt. If you insist on this method, you may need to set .timeout <milliseconds> beforehand to get more luck with it.
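A minimal sketch of the .backup variant with a busy timeout follows. The helper name copy_db and the 5000 ms default are assumptions. The sqlite3 CLI option -cmd is used here so that .timeout takes effect before .backup starts.

```shell
# Sketch: binary copy of the database, waiting up to timeout_ms
# when the database is locked. copy_db is a hypothetical helper name.
copy_db() {
    db=$1
    target=$2
    timeout_ms=${3:-5000}

    # -cmd runs its dot-command before the command given as the last argument.
    sqlite3 -cmd ".timeout $timeout_ms" "$db" ".backup '$target'"
}
```

An invocation such as copy_db /var/lib/pve-cluster/config.db ~/config.copy.db 10000 would then wait up to ten seconds on a lock instead of failing immediately.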

Yet another option would be to use the VACUUM command with an INTO clause 3, but it does not fsync the result on its own!
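If you go the VACUUM INTO route, the missing flush can be added by hand. The sketch below assumes a coreutils sync that accepts a file argument and uses a hypothetical helper name, vacuum_copy. VACUUM INTO expects the target file not to exist yet (or to be empty).

```shell
# Sketch: compacted copy via VACUUM INTO, followed by an explicit flush.
# vacuum_copy is a hypothetical helper name.
vacuum_copy() {
    db=$1
    target=$2

    # Write a fresh, defragmented copy of the database into $target.
    sqlite3 "$db" "VACUUM INTO '$target';" || return 1

    # VACUUM INTO does not fsync, so push the new file to stable storage here.
    sync "$target"
}
```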

If you already have a corrupt .db file at hand (and nothing better), you may try your luck with .recover. 4
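A recovery attempt could be sketched as below. The helper name recover_db is hypothetical, and the recovered SQL is kept in a file so that it can be inspected before anything is rebuilt from it. How much .recover can salvage depends entirely on the damage.

```shell
# Sketch: salvage what SQLite can still read from a corrupt database.
# recover_db is a hypothetical helper name.
recover_db() {
    corrupt=$1
    recovered=$2

    # .recover emits SQL text; keep it in a file so it can be inspected first.
    sqlite3 "$corrupt" .recover > "$recovered.sql" || return 1

    # Rebuild a fresh database from the recovered SQL.
    sqlite3 "$recovered" < "$recovered.sql"
}
```

Run against a copy of the damaged file, e.g. recover_db ~/config.db.corrupt ~/config.recovered.db, so the original stays untouched.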


Extract configurations

There are cases when you make changes to your configuration, only to want to partially revert them.

Alternatively, you get hold of a stale (from a non-quorate node) or partially corrupt config.db and want to take out only some of the previous files, without making it your current node's cluster filesystem.

Less often, you might want to edit the contents of the database-backed filesystem without side effects to the node or cluster, e.g. in order to implant it into a separate/cloned/new cluster.


DISCLAIMER If you do not understand the summary above, do NOT proceed.


This is actually possible. However, since pmxcfs 1 relies on hardcoded locations for its backend database file as well as its mountpoint, you need to use chroot 5.

mkdir -p ~/jail-pmxcfs/{dev,usr,bin,sbin,lib,lib64,etc,var/lib/pve-cluster,var/run}
for i in /dev /usr /bin /sbin /lib /lib64 /etc; do mount --bind -o ro "$i" ~/jail-pmxcfs"$i"; done

This will create an alternative root structure for your own instance of pmxcfs. The only thing left is to implant the database of interest, in this example from the existing one:

sqlite3 /var/lib/pve-cluster/config.db .dump > ~/config.dump.sql
sqlite3 ~/jail-pmxcfs/var/lib/pve-cluster/config.db < ~/config.dump.sql

Now launch your own pmxcfs instance in local mode (-l) in the chroot environment:

chroot ~/jail-pmxcfs/ pmxcfs -l

You can double check your instance runs using the database file that was just provided:

lsof ~/jail-pmxcfs/var/lib/pve-cluster/config.db

COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
pmxcfs  1225 root    4u   REG  252,1    77824   61 /root/jail-pmxcfs/var/lib/pve-cluster/config.db

In fact, if you have the regular pve-cluster service running, you will be able to see there are two instances running, each over its own database, the new one in local mode (-l):

ps -C pmxcfs -f

UID          PID    PPID  C STIME TTY          TIME CMD
root         656       1  0 10:34 ?        00:00:02 /usr/bin/pmxcfs
root        1225       1  0 10:37 ?        00:00:00 pmxcfs -l

Now you can copy out your files or perform changes in ~/jail-pmxcfs/etc/pve without affecting your regular operation.
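Copying out a single file while keeping its relative location could be sketched as follows. The helper name extract_from_jail is hypothetical, as are the node name pve1 and the guest ID 100 in the comment.

```shell
# Sketch: copy one path out of the jailed pmxcfs instance, keeping its
# relative location. extract_from_jail is a hypothetical helper name.
extract_from_jail() {
    rel=$1                        # e.g. nodes/pve1/qemu-server/100.conf (hypothetical)
    dest=$2                       # directory to copy into
    jail=${3:-$HOME/jail-pmxcfs}  # chroot structure created earlier

    src="$jail/etc/pve/$rel"
    [ -e "$src" ] || { echo "not found: $src" >&2; return 1; }

    # Recreate the parent directories below the destination, then copy.
    mkdir -p "$dest/$(dirname "$rel")" || return 1
    cp -R "$src" "$dest/$rel"
}
```

The extracted copy can then be compared with the live one, for example with diff, before deciding what to revert.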

You can also make an SQL dump 2 of ~/jail-pmxcfs/var/lib/pve-cluster/config.db - your now modified database.

Once you are finished, you will want to get rid of the extra instance (based on the PID of the local (-l) instance obtained above):

kill $PID

And destroy the temporary chroot structure:

umount ~/jail-pmxcfs/etc/pve ~/jail-pmxcfs/* &&
rm -rf ~/jail-pmxcfs/
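Since rm -rf next to bind mounts of system directories deserves some care, a more defensive teardown could be sketched as below. It assumes the local pmxcfs instance has already been stopped. The helper name teardown_jail is hypothetical, and the use of mountpoint (util-linux), /proc/mounts and the GNU rm option --one-file-system is a choice made for this sketch.

```shell
# Sketch: unmount everything below the jail, verify, and only then remove it.
# teardown_jail is a hypothetical helper name.
teardown_jail() {
    jail=${1:-$HOME/jail-pmxcfs}

    # Unmount the FUSE mount first (if still present), then the bind mounts.
    for m in etc/pve dev usr bin sbin lib lib64 etc; do
        if mountpoint -q "$jail/$m"; then
            umount "$jail/$m" || return 1
        fi
    done

    # Refuse to delete anything while a mount is still active below the jail.
    if grep -F -q " $jail/" /proc/mounts 2>/dev/null; then
        echo "mounts remain under $jail, not removing" >&2
        return 1
    fi

    # --one-file-system is an extra guard against crossing into another filesystem.
    rm -rf --one-file-system "$jail"
}
```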

Footnotes

  1. https://pve.proxmox.com/wiki/Proxmox_Cluster_File_System_(pmxcfs)

  2. https://www.sqlite.org/cli.html#converting_an_entire_database_to_a_text_file

  3. https://www.sqlite.org/lang_vacuum.html

  4. https://www.sqlite.org/cli.html#recover_data_from_a_corrupted_database

  5. https://manpages.debian.org/bookworm/coreutils/chroot.8.en.html
