Safely disable HA auto-reboots for maintenance

If you are going to perform any kind of maintenance works that could disrupt quorum cluster-wide (e.g. on network equipment, or on small clusters), you may have learnt that this risks seemingly random reboots of cluster nodes, and not only of those running active HA services. ¹

To safely disable HA without extra waiting time, while avoiding long-standing bugs (which Proxmox do not care about ²), you will want to do the following:

Before the works

Once (on any node):

# /etc/pve is shared across the cluster, so this takes effect on all nodes
mv /etc/pve/ha/{resources.cfg,resources.cfg.bak}
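If you script this step, a slightly more defensive variant may be useful. The sketch below assumes a POSIX shell; `ha_resources_disable` is a hypothetical helper (not a Proxmox tool) that takes the HA configuration directory as an optional argument, defaulting to `/etc/pve/ha`, and refuses to overwrite an existing `resources.cfg.bak`, e.g. one left behind by an earlier maintenance window.

```sh
# Hypothetical helper, not part of Proxmox VE.
# Moves resources.cfg aside as resources.cfg.bak, but refuses to clobber
# an existing backup. Optional $1 overrides the directory (for testing).
ha_resources_disable() {
    dir=${1:-/etc/pve/ha}
    if [ -e "$dir/resources.cfg.bak" ]; then
        echo "refusing: $dir/resources.cfg.bak already exists" >&2
        return 1
    fi
    if [ ! -e "$dir/resources.cfg" ]; then
        echo "refusing: $dir/resources.cfg not found" >&2
        return 1
    fi
    mv "$dir/resources.cfg" "$dir/resources.cfg.bak"
}
```

Run it once, on any node, as `ha_resources_disable`.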

Then on every node:

systemctl stop pve-ha-crm pve-ha-lrm
# check both services now report "inactive"
systemctl is-active pve-ha-crm pve-ha-lrm
# confirm it is safe to proceed without risking a reboot
test -d /run/watchdog-mux.active/ && echo "NOT ok" || echo ok
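The per-node checks can also be wrapped so that a script acts on exit codes rather than on printed output. This is a sketch under assumptions: both function names are hypothetical, it relies on `systemctl is-active --quiet` exiting 0 if at least one of the listed units is active, and the marker path `/run/watchdog-mux.active` is taken from the check above (overridable via an argument for testing).

```sh
# Hypothetical helper: succeeds only if neither HA service is running.
# Assumes a systemd host; "is-active" with several units exits 0
# as soon as ANY of them is active.
ha_services_stopped() {
    if systemctl is-active --quiet pve-ha-crm pve-ha-lrm; then
        echo "HA services still running" >&2
        return 1
    fi
    return 0
}

# Hypothetical helper: prints "ok" and succeeds if the watchdog-mux
# marker directory is absent; prints "not ok" and fails otherwise.
watchdog_state() {
    marker=${1:-/run/watchdog-mux.active}
    if [ -d "$marker" ]; then
        echo "not ok"
        return 1
    fi
    echo "ok"
}
```

For example, on each node: `ha_services_stopped && watchdog_state && echo "safe to proceed"`.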

After you are done

Reverse the above, so on every node:

systemctl start pve-ha-crm pve-ha-lrm
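Since HA should only be re-enabled once every node is ready, a script may prefer to wait for the services to come up rather than check just once. `wait_until` below is a hypothetical, generic polling helper (not part of Proxmox): it retries a command once per second and gives up after the given number of seconds.

```sh
# Hypothetical polling helper: runs "$@" once per second until it
# succeeds (returns 0) or TIMEOUT seconds have passed (returns 1).
# Usage: wait_until TIMEOUT COMMAND [ARGS...]
wait_until() {
    timeout=$1
    shift
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out after ${timeout}s: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

On each node, for instance: `wait_until 60 sh -c 'systemctl is-active --quiet pve-ha-crm && systemctl is-active --quiet pve-ha-lrm'`. The two units are checked separately because `systemctl is-active` given several units succeeds as soon as any one of them is active.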

Then, once all nodes are ready, re-enable HA (again only once, on any node):

mv /etc/pve/ha/{resources.cfg.bak,resources.cfg}
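A defensive variant of the restore step is sketched below, assuming a POSIX shell. `ha_resources_enable` is a hypothetical helper with an optional directory argument (default `/etc/pve/ha`); it refuses to run if there is no backup to restore, or if a new `resources.cfg` has appeared in the meantime (for instance because a resource was added to HA during maintenance), rather than moving the backup over it.

```sh
# Hypothetical helper, not part of Proxmox VE.
# Restores resources.cfg.bak to resources.cfg, refusing to overwrite
# a resources.cfg that exists already. Optional $1 overrides the directory.
ha_resources_enable() {
    dir=${1:-/etc/pve/ha}
    if [ ! -e "$dir/resources.cfg.bak" ]; then
        echo "refusing: $dir/resources.cfg.bak not found" >&2
        return 1
    fi
    if [ -e "$dir/resources.cfg" ]; then
        echo "refusing: $dir/resources.cfg already exists" >&2
        return 1
    fi
    mv "$dir/resources.cfg.bak" "$dir/resources.cfg"
}
```

If it refuses because of a new `resources.cfg`, compare the two files by hand before deciding which one to keep.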

Footnotes

¹ https://pve.proxmox.com/wiki/Fencing
² https://bugzilla.proxmox.com/show_bug.cgi?id=5243

free-pmx/pve-ha-disable-maintenance.md