If you encounter this error in your logs, your GUI will be inaccessible as well; you would have found it via console access or direct SSH:
journalctl -e
The output will contain copious amounts of:
pveproxy[]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 2025.
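To cut through the noise, you can restrict the journal to the current boot and to the most relevant units; a minimal sketch using only standard journalctl options:
# current boot only (-b), limited to the proxy and cluster filesystem services
journalctl -b -u pveproxy -u pve-cluster --no-pager | tail -n 100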
If your /etc/pve is entirely empty, you have hit a situation that can easily send you troubleshooting the wrong thing - it is common enough to be worth knowing about in general. This location belongs to the virtual filesystem pmxcfs [1], which has to be mounted, and if it is mounted, it can NEVER be empty.
You can confirm that it is NOT mounted:
mountpoint -d /etc/pve
For a mounted filesystem, this would return the MAJ:MIN device numbers; when unmounted, it returns simply:
/etc/pve is not a mountpoint
If you scrolled further up in the log, you would eventually find that most services could not even be started:
pmxcfs[]: [main] crit: Unable to resolve node name 'nodename' to a non-loopback IP address - missing entry in '/etc/hosts' or DNS?
systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
systemd[1]: Failed to start pve-firewall.service - Proxmox VE firewall.
systemd[1]: Failed to start pvestatd.service - PVE Status Daemon.
systemd[1]: Failed to start pve-ha-crm.service - PVE Cluster HA Resource Manager Daemon.
systemd[1]: Failed to start pve-ha-lrm.service - PVE Local HA Resource Manager Daemon.
systemd[1]: Failed to start pve-guests.service - PVE guests.
systemd[1]: Failed to start pvescheduler.service - Proxmox VE scheduler.
It is the missing entry in '/etc/hosts' or DNS that is causing all of this; the errors that followed were simply unhandled.
Compare your /etc/hostname and /etc/hosts, possibly also the IP entries in /etc/network/interfaces, and check them against the output of ip -c a.
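One way to put them all side by side, using nothing but standard tools:
# what the node calls itself
cat /etc/hostname
# how that name resolves locally
cat /etc/hosts
# statically configured interfaces and addresses
grep -E 'iface|address' /etc/network/interfaces
# addresses actually assigned right now
ip -c a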
As of today, PVE relies on the hostname being resolvable in order to self-identify within a cluster, by default via an entry in /etc/hosts. Counterintuitively, this is the case even for a single-node install. A mismatched or mangled entry in /etc/hosts, [2] a misconfigured /etc/nsswitch.conf [3] or /etc/gai.conf [4] can all cause this.
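For illustration, a consistent pair for a hypothetical node named pve with address 10.10.10.10 could look roughly like this (your names and addresses will differ):
# /etc/hostname
pve

# /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.10.10.10 pve.example.internal pve
The important part is that the node name resolves to an address actually configured on the machine, not to a loopback address.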
You can confirm having fixed the problem with:
hostname -i
Your non-loopback address (i.e. other than 127.*.*.* for IPv4) has to be among those listed.
NOTE If your pve-cluster version is prior to 8.0.2, you have to check with:
hostname -I
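Either way, a healthy output for the hypothetical node above would contain its real address:
10.10.10.10
If all you get is 127.0.1.1 (the placeholder Debian assigns to hosts without a permanent address) or ::1, the hostname still resolves to loopback and the entry needs further fixing.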
If all of the above looks in order, you need to check the logs more thoroughly and look for a different issue; the second most common would be:
pmxcfs[]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
This is out of scope for this post, but feel free to explore your recovery options in the Backup Cluster config post [5].
If you had already, by mistake, started recreating e.g. SSL keys in the unmounted /etc/pve, you have to wipe it before applying the advice above. This situation shows up in the log as:
pmxcfs[]: [main] crit: fuse_mount error: File exists
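A minimal sketch of that cleanup, assuming you have re-checked with mountpoint that /etc/pve is still NOT mounted - if it were mounted, the same command would delete your live configuration:
systemctl stop pve-cluster
# make absolutely sure the directory is NOT mounted before removing anything
mountpoint -d /etc/pve
# remove the stray files that were created in the unmounted directory
rm -rf /etc/pve/*
systemctl start pve-cluster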
Finally, you can prevent this from happening again by marking the unmounted directory as immutable [6]:
systemctl stop pve-cluster
chattr +i /etc/pve
systemctl start pve-cluster
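The flag sits on the underlying directory, so the pmxcfs mount on top of it should not be affected; you can inspect it with lsattr and lift it again with chattr -i if the need ever arises:
# show the attributes of the directory itself
lsattr -d /etc/pve
# undo, with pve-cluster stopped again
chattr -i /etc/pve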
NOTE All of the respective bugs mentioned above have been filed with Proxmox.
Footnotes
1. https://pve.proxmox.com/wiki/Proxmox_Cluster_File_System_(pmxcfs)
2. https://manpages.debian.org/bookworm/manpages/hosts.5.en.html
3. https://manpages.debian.org/bookworm/manpages/nsswitch.conf.5.en.html
4. https://manpages.debian.org/bookworm/manpages/gai.conf.5.en.html
5. https://gist.github.com/free-pmx/47ea73e1921440e29d8792cc0ea1e7b9
6. https://manpages.debian.org/bookworm/e2fsprogs/chattr.1.en.html