In the interest of complete transparency: if you follow this guide, there is a minuscule but non-zero chance that you may violate the Bekenstein bound, at which point the resulting black hole may swallow the Earth whole. You have been warned!
- This guide is for development, testing, and research purposes only. It comes with no guarantee or warranty that these steps will work in your environment. Should you attempt this in a production environment, any negative outcomes are not the fault of this guide or its author.
- This guide was tested on Proxmox 8 / Debian 12.
- This example uses "host1" and "host2" as example names for the hosts
- This example uses "example-test.ts.net" as a Tailscale MagicDNS domain
- The Tailscale IP for host1 is 100.64.1.1
- The Tailscale IP for host2 is 100.64.2.2
-
Set up two Proxmox hosts
-
Install Tailscale on the hosts:
curl -fsSL https://tailscale.com/install.sh | sh;
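If the hosts aren't already joined to your tailnet, bring them up after installing (a minimal sketch; tailscale up will prompt for browser authentication unless you pass an auth key):
tailscale up
tailscale ip -4   # confirm the 100.x address assigned to each host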
-
Update /etc/hosts on all hosts with the proper host entries:
100.64.1.1 host1.example-test.ts.net host1
100.64.2.2 host2.example-test.ts.net host2
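To sanity-check the entries, resolve the names locally on each host; getent should echo back the Tailscale address:
getent hosts host1
getent hosts host2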
-
Since DNS queries will be served via Tailscale, ensure that the global nameserver configured in Tailscale can resolve host1 to 100.64.1.1 and host2 to 100.64.2.2
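As a rough check from either host (assuming MagicDNS is enabled, which answers queries at 100.100.100.100, and that dig is installed), each query should return the corresponding 100.64.x.x address:
dig +short host1.example-test.ts.net @100.100.100.100
dig +short host2.example-test.ts.net @100.100.100.100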
-
If you need to allow for the traffic within your Tailscale ACL, allow TCP 22, TCP 8006, and UDP 5405-5412; example as follows:
{"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host1:22"]},   // SSH
{"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host2:22"]},   // SSH
{"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host1:8006"]}, // Proxmox web
{"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host2:8006"]}, // Proxmox web
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5405"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5406"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5407"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5408"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5409"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5410"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5411"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5412"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5405"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5406"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5407"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5408"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5409"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5410"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5411"]}, // corosync
{"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5412"]}, // corosync
-
Create the cluster using host1 (so that host2 has a cluster to join to)
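For example, run this on host1 (the cluster name "tailnet-cluster" is just a placeholder; pinning link0 to the Tailscale IP mirrors the join step below):
pvecm create tailnet-cluster --link0 100.64.1.1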
-
For the initial join to succeed, every cluster member must have only a link0 in corosync, and that link0 must be associated with Tailscale (if any other links exist in corosync, they must be temporarily removed for this initial member addition to succeed). To have host2 join host1's cluster, run the following from host2:
pvecm add host1 --link0 100.64.2.2
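After a successful join, the nodelist in /etc/pve/corosync.conf on both hosts should look roughly like the sketch below (nodeids and vote counts may differ in your setup):
nodelist {
  node {
    name: host1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 100.64.1.1
  }
  node {
    name: host2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 100.64.2.2
  }
}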
-
SSH from host1 to host2 and vice versa so each host's key is accepted; if this isn't done, tasks like migrations and replication may not work until it is:
ssh host1
ssh host2
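To be explicit about direction (run as root, since Proxmox clustering uses root over SSH; the hostname call is just a harmless test command):
ssh host2 hostname   # run on host1
ssh host1 hostname   # run on host2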
-
That should do it! Test, test, test!
To add a third member to the cluster (and so on), repeat the same steps.
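As a quick sanity check that the cluster is quorate, run the following on any member and confirm that all nodes are listed:
pvecm status
pvecm nodes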
Should clustering not be successful, you'll need to do two things:
- Remove the failed member from host1 by running:
pvecm delnode host2
- Reset clustering on host2 by running:
systemctl stop pve-cluster corosync
pmxcfs -l
rm -rf /etc/corosync/*
rm /etc/pve/corosync.conf
killall pmxcfs
systemctl start pve-cluster
pvecm updatecerts
Then try again.
-
I managed to achieve this without using a local address. I did the following steps before step 7 in the original post:
- Edit /etc/pve/corosync.conf and change the ring0_addr of host1 to its Tailscale IP.
- systemctl restart corosync. Although the pve-cluster service will try to perform a corosync-config -R, it would fail before restarting the corosync service.
- Run corosync-config -R and confirm that there is no error.
- pvecm add may require a fingerprint for hostname verification; it can be retrieved from the web GUI of host1 (Datacenter -> Cluster -> Join Information).
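A hedged sketch of what that join could look like with the fingerprint supplied (the fingerprint value is a placeholder copied from host1's Join Information screen):
pvecm add host1 --link0 100.64.2.2 --fingerprint <SHA256-fingerprint-from-host1>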