scyto/proxmox-cluster.md

Last active April 28, 2025 05:36

Star (1) You must be signed in to star a gist
Fork (4) You must be signed in to fork a gist

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/scyto/645193291b9a81eb3cb6ebefe68274ae.js"></script>
Save scyto/645193291b9a81eb3cb6ebefe68274ae to your computer and use it in GitHub Desktop.

Download ZIP

Proxmox cluster Setup

Raw

proxmox-cluster.md

Proxmox cluster Setup

this gist is part of this series

Network Design

Put simply I am not sure what the design should be. I have the thunderbolt mesh network and the 2.5gbe NIC on each node. The ideal design guidelies cause my brain to have a race conditions because:

ceph shold have a dedicated network
proxmox should not have migration traffic and cluster communications network
one wants cluster communicationsnetwork reddundant

I have 3 networks:

Onboard 2.5gb NIC connected to one switch for subnet IPv4 192.168.1.0/24 and IPv6 /64 address (my LAN)
Thunderbolt mesh connected in a ring for subnet fc00::80/124
- this has 3 single address subnets fc00::81/128, fc00::82/128 and fc00::83/128 these are used for FRR Openfabric routing between nodes
Addtional 2.5Gbe using (NUCIOALUWS) add-on afor subnet TBD
- cluster (aka corosync) network uses network 1 (2.5gbe)
- ceph migration traffic uses network 2 (thunderbolt)
- ceph public network uses network 2 (thunderbolt
- CT and VM migration traffic uses network 2 (thunderbolt network)

I have not yet decided what network 3 will be used for, options are:

cluster public network that other devices use to access the cluster or its resources
backup corosync (though i don't see a reason not to have corosync on all 3 networks)
ceph public network - but I assume this is what the VMs uses so it makes sense to i want that on the 26Gbps thunderbolt mesh too

Create Cluster

You should have 3 browser tabs open for this, one for each node's management IP.

setup on node 1

navigate to Datacenter > Cluster and click Create Cluster
name the cluster e.g. pve-cluster1
set link 0 to the IPv4 address (in my case 192.168.1.81 on interface vmbr0)
click create

Join node 2

on node 2 in Datacenter > Cluster click join information
the IP address should be node 1 IPv4 address
click copy information
open tab 2 in your browser to node 2 management page
navingate to Datacenter > Cluster and click join cluster
paste the information into the dialog box that you collected in step 3
Fill the root password in of node 1
Select Link 0 as 192.168.1.82
click button join 'pve-cluster1'

Join node 3

on node 1 in Datacenter > Cluster click join information
the IP address should be node 1 IPv4 address
click copy information
open tab 2 in your browser to node 3 management page
navingate to Datacenter > Cluster and click join cluster
paste the information into the dialog box that you collected in step 3
Fill the root password in of node 1
Select Link 0 as 192.168.1.83
click button join 'pve-cluster1'

at this point close your pv2 and pve 3 tabs - you can now manage all 3 cluster nodes from node 1 (or any node)

Define Migration Network

navigate in webui to Datacenter > Options
double click Migration Settings
select any networkand click ok - this is just to create an entry in the config file
edit with nano /etc/pve/datacenter.cfg and change: migration: network=10.0.0.81/32,type=secure to migration: network=fc00::80/124,type=insecure This is because a)this subnet contains fc00::80 thru fc00::8f; and b) because it is 100% isolated network it can be insecure give a small speed boost

Configuring for High Availability

navigate in webui to Datacenter > HA > Groups
click create
Name the cluster (ID) ClusterGroup1
add all 3 nodes and then click create

vdovhanych commented Apr 13, 2024 •

edited

Loading

Thnx for this.
I set it up using two links for the Corosync network. It will create a more redundant configuration if any of the link fail. Corosync has the option to use multiple links, and you can set the link priority to the default one you want to use, and it will automatically failover if that link is not available (switch dies, thunderbolt network fails)

Here is quite good tutorial how to achieve that (or better option you can set it up in the UI when creating the cluster and choosing multiple links)

Author

scyto commented Apr 14, 2024

Thnx for this.

NP - i would go one step further than that and say the second connection should be to a separate switch, this is what I do and why my NUC has two intel 2.5gbe connections

SchuFire commented Jun 25, 2024

Great write up!
Just one ? - in the Define Migration Network section, it would appear that step 4 has incorrect IP addressing.

DerpJim commented Sep 13, 2024

When following the updated guide to create a separate folder for the thunderbolt networking "/etc/network/interfaces.d/thunderbolt" the interfaces lo:0 and lo:6 don't show up in the GUI to allow for setting the migration network. I manually edited the /etc/pve/datacenter.cfg with "migration: network=fc00::81/128,type=insecure" and it shows in the GUI as changed but I haven't tested migration yet. Not sure if this is correct or if I messed something up which isn't letting me select the network via GUI.

DerpJim commented Sep 13, 2024

When following the updated guide to create a separate folder for the thunderbolt networking "/etc/network/interfaces.d/thunderbolt" the interfaces lo:0 and lo:6 don't show up in the GUI to allow for setting the migration network. I manually edited the /etc/pve/datacenter.cfg with "migration: network=fc00::81/128,type=insecure" and it shows in the GUI as changed but I haven't tested migration yet. Not sure if this is correct or if I messed something up which isn't letting me select the network via GUI.

Just tested a migration and it fails. "could not get migration ip: no IP address configured on local node for network 'fc00::81/128'"

NRGnet commented Sep 13, 2024 •

edited

Loading

When following the updated guide to create a separate folder for the thunderbolt networking "/etc/network/interfaces.d/thunderbolt" the interfaces lo:0 and lo:6 don't show up in the GUI to allow for setting the migration network. I manually edited the /etc/pve/datacenter.cfg with "migration: network=fc00::81/128,type=insecure" and it shows in the GUI as changed but I haven't tested migration yet. Not sure if this is correct or if I messed something up which isn't letting me select the network via GUI.

Just tested a migration and it fails. "could not get migration ip: no IP address configured on local node for network 'fc00::81/128'"

@DerpJim Since the loopback interface was moved to allow using GUI to edit network settings it wont show up and will need to manually be set. That being said it looks like you figured that out just did not change the subnet of /128 only has 1 IP, you need to change to a subnet that will have all IP's of the cluster in it. That is why for IPv4 he says to change /32 to /29. I am not the greatest when it comes to IPv6 but I think /125 should work.

DerpJim commented Sep 13, 2024

When following the updated guide to create a separate folder for the thunderbolt networking "/etc/network/interfaces.d/thunderbolt" the interfaces lo:0 and lo:6 don't show up in the GUI to allow for setting the migration network. I manually edited the /etc/pve/datacenter.cfg with "migration: network=fc00::81/128,type=insecure" and it shows in the GUI as changed but I haven't tested migration yet. Not sure if this is correct or if I messed something up which isn't letting me select the network via GUI.

Just tested a migration and it fails. "could not get migration ip: no IP address configured on local node for network 'fc00::81/128'"

@DerpJim Since the loopback interface was moved to allow using GUI to edit network settings it wont show up and will need to manually be set. That being said it looks like you figured that out just did not change the subnet of /128 only has 1 IP, you need to change to a subnet that will have all IP's of the cluster in it. That is why for IPv4 he says to change /32 to /29. I am not the greatest when it comes to IPv6 but I think /125 should work.

@NRGnet Thank you! I definitely missed that note. /125 worked!!! Thank you!

Author

scyto commented Sep 16, 2024

@NRGnet do you think we should move the loopbacks back to the main interfaces file?

RobinBeismann commented Nov 9, 2024

@NRGnet do you think we should move the loopbacks back to the main interfaces file?

Did you or @NRGnet test that out yet? I'm right now building my new homelab with 3x MS-01 and am wondering whether I should do or not.

au-squirrel commented Dec 5, 2024 •

edited

Loading

So the change to the subnet addresses, is that done in the /etc/network/interfaces.d/thunderbolt file as detailed in the Enable Dual Stack (IPv4 and IPv6) OpenFabric Routing, https://gist.github.com/scyto/4c664734535da122f4ab2951b22b2085 ?
I am wondering if 8.3 has changed a couple of things as I also don't see the lo:0 and lo:6 networks in the GUI.

---Update---
The only way I could get the lo:0 and lo:6 to appear was to include the following in the /etc/networking/interfaces

iface lo inet loopback

auto lo:0
iface lo:0 inet static
address 10.0.0.82/27

auto lo:6
iface lo:6 inet static
address fc00::82/64

auto en05
iface en05 inet manual
#do not edit it GUI

auto en06
iface en06 inet manual
#do not edit in GUI

iface enp87s0 inet manual
--- Snip ---
I have left the lo interfaces as given in the Enable Dual Stack in the /etc/network/interfaces.d/thunderbolt and updated the netmask. When modified in the thunderbolt file, the networks did not appear in the GUI and I was unable to change the cluster migration options to select the thunderbolt network.
Non /64 sub-net masks don't comply with the 64 bit interface address defined in RFC4291 and it makes my IPV6 OCD twitch :-)

pwiegel commented Feb 12, 2025

Note for people (like me) who are not very familiar with the networking aspect:
If you use IP addresses for your thunderbolt/mesh network that are different from the examples given in this documentation and you're getting "no IP address" errors when trying to migrate VMs, you may need to recalculate the CIDR subnet mask. This documentation uses 10.0.0.81-83, but I wanted to use .31-33 (to match the external IPv4 addresses of the hosts), but when I configured datacenter.cfg to use 10.0.0.30/29, I got the error "proxmox could not get migration ip: no IP address configured on local node for network" when trying to migrate VMs.

After a bunch of digging, I realized that *.30/29 gives a usable IP address range of .25-.30... excluding the addresses I was using. So I tweaked the values in an IP Subnet Calculator until I landed on *.30/26, which gives a usable range of 10.0.0.1-10.0.0.62, which includes all of the IP addresses I'm using. Migrations are now happening fast and flawlessly.

ronindesign commented Apr 24, 2025 •

edited

Loading

Create Cluster
You should have 3 browser tabs open for this, one for each node's management IP.

setup on node 1
navigate to Datacenter > pve1 > Cluster and click Create Cluster

It would appear the "Cluster" section has moved to Datacenter > Cluster? (PVE v8.4.1)

Another note: when joining a node to the cluster, it seems the self-signed SSL cert of joining nodes changes (maybe after cluster service restart?) The join process seems to error/hang, all that is required is to refresh the webpage, accept the new self-signed cert, and log in once again. Error encountered:

/etc/pve/nodes/pve2/pve-ssl.pem' does not exist! (500)

Author

scyto commented Apr 25, 2025 •

edited

Loading

@ronindesign thanks, bugger, i need to update pictutes too! :-p

great spot on that change, i haven't used the ceate UI in 2 years! would you still say the join needs to be done from each nodes UI?
(i didn't set up certs until way after I had the cluster running, good to know about the joing seems to change something).

ronindesign commented Apr 25, 2025

👍 haha, no worries!

Yeah, I'm using UI for documentation, we're pushing for IoC but not there yet. Yes, definitely need to join from each node, on their own individual WebUI. Until joined, they won't show up under the initial node's Datacenter, so no other way to access until their joined.

scyto/proxmox-cluster.md

Proxmox cluster Setup

Network Design