@willjasen
Last active February 24, 2025 02:13
Create a Proxmox cluster that communicates over Tailscale

‼️ DANGER ‼️

In the interest of complete transparency: if you follow this guide, there's a very minuscule but non-zero chance that you may violate the Bekenstein bound, at which point the resulting black hole may swallow the Earth whole. You have been warned!


⚠️ WARNING ⚠️

  • This guide is for development, testing, and research purposes only. It comes with no guarantee or warranty that these steps will work in your environment. Should you attempt this in a production environment, any negative outcomes are not the fault of this guide or its author.
  • This guide was tested on Proxmox 8 / Debian 12.

📝 Prologue 📝

  • This example uses "host1" and "host2" as example names for the hosts
  • This example uses "example-test.ts.net" as a Tailscale MagicDNS domain
  • The Tailscale IP for host1 is 100.64.1.1
  • The Tailscale IP for host2 is 100.64.2.2

📋 Steps 📋

  1. Set up two Proxmox hosts

  2. Install Tailscale on each host: curl -fsSL https://tailscale.com/install.sh | sh

  3. Update /etc/hosts on all hosts with the proper host entries:

    • 100.64.1.1 host1.example-test.ts.net host1
    • 100.64.2.2 host2.example-test.ts.net host2
  4. Since DNS queries will be served via Tailscale, ensure that the global DNS server configured in Tailscale can resolve host1 to 100.64.1.1 and host2 to 100.64.2.2
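The host entries from step 3 can be appended and sanity-checked with a sketch like the following (DEMO_HOSTS is a stand-in so the sketch is safe to run anywhere; on the real hosts, write to /etc/hosts instead):

```shell
# Append the two Tailscale host entries and confirm they are present.
# DEMO_HOSTS is a throwaway file; use /etc/hosts on the actual hosts.
DEMO_HOSTS="$(mktemp)"
cat >> "$DEMO_HOSTS" <<'EOF'
100.64.1.1 host1.example-test.ts.net host1
100.64.2.2 host2.example-test.ts.net host2
EOF
grep -c 'ts\.net' "$DEMO_HOSTS"   # prints 2 when both entries are present
```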

  5. If your Tailscale ACL restricts this traffic, allow TCP 22 (SSH), TCP 8006 (Proxmox web), and UDP 5405-5412 (corosync); for example:

    {"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host1:22"]},   // SSH
    {"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host2:22"]},   // SSH
    {"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host1:8006"]}, // Proxmox web
    {"action": "accept", "proto": "tcp", "src": ["host1", "host2"], "dst": ["host2:8006"]}, // Proxmox web
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5405"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5406"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5407"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5408"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5409"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5410"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5411"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host1:5412"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5405"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5406"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5407"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5408"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5409"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5410"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5411"]}, // corosync
    {"action": "accept", "proto": "udp", "src": ["host1", "host2"], "dst": ["host2:5412"]}, // corosync
    
  6. Create the cluster using host1 (so that host2 has a cluster to join)

  7. In order for the initial clustering to succeed, every cluster member must have only a link0 in corosync, and that link0 must be associated with Tailscale (if any other links exist within corosync, they must be temporarily removed for this initial member addition to succeed); to have host2 join host1's cluster, run from host2: pvecm add host1 --link0 100.64.2.2
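Concretely, steps 6 and 7 amount to the following (the cluster name ts-cluster is an illustrative assumption; run each command on the host named in its comment):

```shell
# On host1: create the cluster, binding corosync link0 to host1's Tailscale IP
pvecm create ts-cluster --link0 100.64.1.1

# On host2: join host1's cluster, binding link0 to host2's Tailscale IP
pvecm add host1 --link0 100.64.2.2
```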

  8. SSH from host1 to host2 and vice versa, so that each host has accepted the other's SSH host key; until this is done, tasks like migrations and replications may not work:

    • from host1: ssh host2
    • from host2: ssh host1
  9. That should do it! Test, test, test!

To add a third member to the cluster (and so on), repeat the same steps.


🔧 Troubleshooting 🔧

Adding to the Cluster

Should clustering not be successful, you'll need to do two things:

  1. Remove the failed member from host1 by running: pvecm delnode host2
  2. Reset clustering on host2 by running: systemctl stop pve-cluster corosync; pmxcfs -l; rm -rf /etc/corosync/*; rm /etc/pve/corosync.conf; killall pmxcfs; systemctl start pve-cluster; pvecm updatecerts;
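The reset one-liner above, unpacked with comments (a sketch; run it on the failed joiner, here host2, and note that it deletes that node's corosync configuration):

```shell
systemctl stop pve-cluster corosync   # stop the cluster filesystem and corosync
pmxcfs -l                             # remount pmxcfs in local mode so /etc/pve is writable
rm -rf /etc/corosync/*                # remove the node's corosync configuration
rm /etc/pve/corosync.conf             # remove the cluster-wide corosync config copy
killall pmxcfs                        # stop the local-mode pmxcfs instance
systemctl start pve-cluster           # start pve-cluster again, now standalone
pvecm updatecerts                     # regenerate/refresh the node certificates
```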

Then try again.

Maintaining Quorum

You may find in a large cluster (5 or more members) that features like the web interface don't work properly between cluster members. This is likely because corosync hasn't properly achieved quorum. The file /etc/pve/.members may show a node or nodes as "online": 1, indicating that the node is online and reachable in some form, yet its ip value never appears. When one member has an underperforming network connection relative to the other cluster members (particularly high latency, in the range of 200-300 ms), corosync should be stopped and disabled on that member temporarily. To do that, run systemctl stop corosync; systemctl disable corosync;. To enable and start it again, run systemctl enable corosync; systemctl start corosync;.
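A quick way to spot this state is to inspect /etc/pve/.members. The sketch below runs against a generated sample file in the same JSON shape (the sample values and the availability of jq are assumptions); on a real cluster member, set MEMBERS_FILE=/etc/pve/.members instead:

```shell
# List each node's online flag and ip from a .members-style file.
# MEMBERS_FILE defaults to a generated sample; point it at /etc/pve/.members
# on a real node. The sample data below is illustrative.
MEMBERS_FILE="${MEMBERS_FILE:-$(mktemp)}"
[ -s "$MEMBERS_FILE" ] || cat > "$MEMBERS_FILE" <<'EOF'
{"nodename": "host1", "version": 4,
 "cluster": {"name": "ts-cluster", "version": 2, "nodes": 2, "quorate": 1},
 "nodelist": {
  "host1": {"id": 1, "online": 1, "ip": "100.64.1.1"},
  "host2": {"id": 2, "online": 1}
 }}
EOF
# A node reported online=1 but with no ip has not fully joined corosync.
jq -r '.nodelist | to_entries[] | "\(.key) online=\(.value.online) ip=\(.value.ip // "missing")"' "$MEMBERS_FILE"
```

In this sample, host2 prints ip=missing, matching the symptom described above.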

💭 After Thoughts 💭

In order to use a Tailscale certificate with your host's web services, please see tailscale-cert-services/proxmox-cert.sh

@ky-bd

ky-bd commented Oct 13, 2024

Hi, when I followed your instructions I got stuck joining host2 into the cluster, because I had to create the cluster with a link0 containing a non-Tailscale IP. It seems that an additional Linux bridge is needed; or how did you create the cluster with a link0 containing an IP address from Tailscale?

I managed to achieve this without using a local address. I did the following steps before the step 7 in the original post:

  1. Edit /etc/pve/corosync.conf and change the ring0_addr of host1 to its Tailscale IP (remember to increment config_version when editing by hand)
  2. Restart the corosync service: systemctl restart corosync. The pve-cluster service will try to run corosync-cfgtool -R itself, but that fails before corosync has been restarted.
  3. Run corosync-cfgtool -R and confirm that there is no error.
  4. Continue with step 7. pvecm add may require a fingerprint for hostname verification; it can be retrieved from the web GUI of host1 (Datacenter -> Cluster -> Join Information)
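For reference, the edited sections of /etc/pve/corosync.conf might look like the sketch below (the cluster name, node ID, and version numbers are assumptions; config_version in the totem section must be incremented on every manual edit):

```
totem {
  cluster_name: ts-cluster
  config_version: 2           # incremented for this manual edit
  ...
}

nodelist {
  node {
    name: host1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 100.64.1.1    # changed to host1's Tailscale IP
  }
}
```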
