@free-pmx
Created November 14, 2024 18:16

Safely disable HA auto-reboots for maintenance

If you are going to perform any maintenance work that could disrupt quorum cluster-wide (e.g. work on network equipment, or almost any work on a small cluster), you may already have learnt that this risks seemingly random reboots of cluster nodes, and not only of those with active HA services. 1

To safely disable HA without additional waiting times, and while avoiding long-standing bugs (which Proxmox do not care for 2), you will want to perform the following:

Before the works

Once (on any node):

mv /etc/pve/ha/{resources.cfg,resources.cfg.bak}

Then on every node:

systemctl stop pve-ha-crm pve-ha-lrm
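# check all went well (both units should report "inactive")
systemctl is-active pve-ha-crm pve-ha-lrm
# confirm you are ok to proceed without risking a reboot
# (prints "ok" once the watchdog is no longer active, "nook" otherwise)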
test -d /run/watchdog-mux.active/ && echo nook || echo ok
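
If the check above is to be reused inside a maintenance script, it could be wrapped in a small function that also returns a non-zero status while the watchdog is still active. This is only a sketch: the function name `watchdog_state` and its optional directory argument are made up for illustration, and only the default path comes from the command above.

```shell
# Sketch: report whether the watchdog is still active on this node.
# watchdog_state is a hypothetical helper; the optional first argument
# overrides the flag directory (useful for trying it out safely).
watchdog_state() {
    flag_dir="${1:-/run/watchdog-mux.active}"
    if [ -d "$flag_dir" ]; then
        # directory present: the watchdog is still active
        echo "nook"
        return 1
    fi
    echo "ok"
}
```

A script could then start with `watchdog_state || exit 1` so that it stops before any disruptive step.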

After you are done

Reverse the above, so on every node:

systemctl start pve-ha-crm pve-ha-lrm
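
To tell from a script whether a node is ready, note that `systemctl is-active` given several units exits with 0 as soon as at least one of them is active, so each unit is better checked on its own. A minimal sketch follows; the function name `all_units_active` and the `IS_ACTIVE_CMD` override are illustrative assumptions, not part of any Proxmox tooling.

```shell
# Sketch: succeed only when every unit passed as an argument is active.
# IS_ACTIVE_CMD is a hypothetical override for the check command; it
# defaults to "systemctl is-active --quiet" and is deliberately left
# unquoted so that the default splits into separate words.
all_units_active() {
    for unit in "$@"; do
        ${IS_ACTIVE_CMD:-systemctl is-active --quiet} "$unit" || return 1
    done
    return 0
}
```

On a node this would be called as `all_units_active pve-ha-crm pve-ha-lrm && echo ready`.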

And then, once all nodes are ready, reactivate HA (once, on any node):

mv /etc/pve/ha/{resources.cfg.bak,resources.cfg}
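
A slightly more defensive variant of this last step might refuse to act when the backup is missing or when a `resources.cfg` already exists, for instance because someone configured a resource in the meantime. The helper name `restore_ha_resources` and its optional directory argument are hypothetical; only the file names and the default path are taken from the command above.

```shell
# Sketch: move resources.cfg.bak back into place without overwriting.
# restore_ha_resources is a hypothetical helper; the optional first
# argument overrides the directory (defaults to /etc/pve/ha).
restore_ha_resources() {
    ha_dir="${1:-/etc/pve/ha}"
    if [ ! -e "$ha_dir/resources.cfg.bak" ]; then
        echo "no resources.cfg.bak found in $ha_dir" >&2
        return 1
    fi
    if [ -e "$ha_dir/resources.cfg" ]; then
        echo "resources.cfg already exists in $ha_dir, not overwriting" >&2
        return 1
    fi
    mv "$ha_dir/resources.cfg.bak" "$ha_dir/resources.cfg"
}
```

Called without arguments it acts on `/etc/pve/ha`; passing a scratch directory allows a dry run away from the cluster filesystem.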

Footnotes

  1. https://pve.proxmox.com/wiki/Fencing

  2. https://bugzilla.proxmox.com/show_bug.cgi?id=5243
