
@scyto
Last active November 9, 2024 22:06
Proxmox cluster Setup

This gist is part of this series.

Network Design

Put simply, I am not sure what the design should be. I have the thunderbolt mesh network and the 2.5GbE NIC on each node. The ideal design guidelines cause my brain to have a race condition because:

  1. ceph should have a dedicated network
  2. proxmox should not have migration traffic on the cluster communications network
  3. one wants the cluster communications network to be redundant

I have 3 networks:

  1. Onboard 2.5GbE NIC connected to one switch for subnet 192.168.1.0/24 (my LAN)

  2. Thunderbolt mesh connected in a ring for subnet 10.0.0.80/28

    • this has 3 host addresses (10.0.0.81/32, 10.0.0.82/32 and 10.0.0.83/32); these are used for OSPF routing between the nodes
  3. Additional 2.5GbE using the (NUCIOALUWS) add-on, for a subnet TBD

    • cluster (aka corosync) network uses network 1 (2.5GbE)
    • ceph migration traffic uses network 2 (thunderbolt)
    • ceph public network uses network 2 (thunderbolt)
    • CT and VM migration traffic uses network 2 (thunderbolt)

I have not yet decided what network 3 will be used for, options are:

  • cluster public network that other devices use to access the cluster or its resources
  • backup corosync (though I don't see a reason not to have corosync on all 3 networks)
  • ceph public network - but I assume this is what the VMs use, so I want that on the 26Gbps thunderbolt mesh too

Create Cluster

You should have 3 browser tabs open for this, one for each node's management IP.

Setup on node 1

  1. navigate to Datacenter > pve1 > Cluster and click Create Cluster
  2. name the cluster e.g. pve-cluster1
  3. set link 0 to the IPv4 address (in my case 192.168.1.81 on interface vmbr0)
  4. click create
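The same thing can be done from a shell on node 1 instead of the webui. A sketch only - pvecm has to be run on the Proxmox node itself, and the exact --link0 syntax can vary by PVE version, so check `man pvecm` first:

```sh
# on node 1: create the cluster and pin corosync link 0 to the LAN address
pvecm create pve-cluster1 --link0 192.168.1.81

# confirm the cluster exists and node 1 is quorate
pvecm status
```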

Join node 2

  1. on node 1 in Datacenter > pve1 > Cluster click join information
  2. the IP address should be node 1's IPv4 address
  3. click copy information
  4. open tab 2 in your browser to node 2 management page
  5. navigate to Datacenter > pve2 > Cluster and click join cluster
  6. paste the information into the dialog box that you collected in step 3
  7. Fill the root password in of node 1
  8. Select Link 0 as 192.168.1.82
  9. click the join 'pve-cluster1' button

Join node 3

  1. on node 1 in Datacenter > pve1 > Cluster click join information
  2. the IP address should be node 1's IPv4 address
  3. click copy information
  4. open tab 3 in your browser to node 3's management page
  5. navigate to Datacenter > pve3 > Cluster and click join cluster
  6. paste the information into the dialog box that you collected in step 3
  7. Fill the root password in of node 1
  8. Select Link 0 as 192.168.1.83
  9. click the join 'pve-cluster1' button

at this point close your pve2 and pve3 tabs - you can now manage all 3 cluster nodes from node 1 (or any node)
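For reference, the join steps above can also be done from each joining node's shell. A sketch assuming the same addresses as above (--link0 here is the joining node's own address):

```sh
# on node 2: join via node 1's IP, binding corosync link 0 to this node's LAN address
pvecm add 192.168.1.81 --link0 192.168.1.82

# on node 3 the same, with its own address:
#   pvecm add 192.168.1.81 --link0 192.168.1.83

# afterwards, from any node, all 3 members should be listed
pvecm nodes
```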

Define Migration Network

  1. navigate in webui to Datacenter > Options
  2. double click Migration Settings
  3. select network 10.0.0.81/32 and click ok
  4. edit /etc/pve/datacenter.cfg with nano and change migration: network=10.0.0.81/32,type=secure to migration: network=10.0.0.80/29,type=insecure. This is because a) the /29 subnet contains 10.0.0.81/32, 10.0.0.82/32 and 10.0.0.83/32, and b) the thunderbolt network is 100% isolated, so it can be insecure, which gives a small speed boost
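To see why the /32 has to be widened, here is a quick plain-bash check of which addresses fall inside 10.0.0.80/29 (10.0.0.90 is just an arbitrary out-of-subnet address for contrast; this runs anywhere, no Proxmox needed):

```shell
# a /32 contains exactly one address; migration needs a prefix covering all 3 nodes
ip_to_int() {
  local a b c d IFS=.
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

net=$(ip_to_int 10.0.0.80)
mask=$(( (0xFFFFFFFF << (32 - 29)) & 0xFFFFFFFF ))

for ip in 10.0.0.81 10.0.0.82 10.0.0.83 10.0.0.90; do
  if [ $(( $(ip_to_int "$ip") & mask )) -eq $(( net & mask )) ]; then
    echo "$ip is inside 10.0.0.80/29"
  else
    echo "$ip is outside 10.0.0.80/29"
  fi
done
```

All three node addresses land inside the /29, which is exactly what the migration setting needs.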

Configuring for High Availability

  1. navigate in webui to Datacenter > HA > Groups
  2. click create
  3. Name the group (ID) ClusterGroup1
  4. add all 3 nodes and then click create
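The same HA group can be created from any node's shell. A sketch (ha-manager ships with Proxmox VE; the node names pve1/pve2/pve3 are assumed from the setup above):

```sh
# create an HA group containing all 3 nodes
ha-manager groupadd ClusterGroup1 --nodes "pve1,pve2,pve3"

# verify the group configuration
ha-manager groupconfig
```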
@DerpJim

DerpJim commented Sep 13, 2024

When following the updated guide to create a separate folder for the thunderbolt networking "/etc/network/interfaces.d/thunderbolt" the interfaces lo:0 and lo:6 don't show up in the GUI to allow for setting the migration network. I manually edited the /etc/pve/datacenter.cfg with "migration: network=fc00::81/128,type=insecure" and it shows in the GUI as changed but I haven't tested migration yet. Not sure if this is correct or if I messed something up which isn't letting me select the network via GUI.

Just tested a migration and it fails. "could not get migration ip: no IP address configured on local node for network 'fc00::81/128'"

@NRGnet

NRGnet commented Sep 13, 2024

> When following the updated guide to create a separate folder for the thunderbolt networking "/etc/network/interfaces.d/thunderbolt" the interfaces lo:0 and lo:6 don't show up in the GUI to allow for setting the migration network. I manually edited the /etc/pve/datacenter.cfg with "migration: network=fc00::81/128,type=insecure" and it shows in the GUI as changed but I haven't tested migration yet. Not sure if this is correct or if I messed something up which isn't letting me select the network via GUI.
>
> Just tested a migration and it fails. "could not get migration ip: no IP address configured on local node for network 'fc00::81/128'"

@DerpJim Since the loopback interface was moved to allow using the GUI to edit network settings, it won't show up and will need to be set manually. That said, it looks like you figured that out but did not change the subnet: a /128 only has 1 IP, so you need a subnet that contains all of the cluster's IPs. That is why, for IPv4, he says to change /32 to /29. I am not the greatest when it comes to IPv6, but I think /125 should work.
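The prefix sizes being discussed here are easy to sanity-check from any shell with python3's ipaddress module (no Proxmox needed):

```shell
# a /128 holds exactly one address, so fc00::82 and fc00::83 fall outside it;
# a /125 holds 8 addresses (fc00::80 - fc00::87), covering all three nodes
python3 -c "
import ipaddress
net128 = ipaddress.IPv6Network('fc00::81/128')
net125 = ipaddress.IPv6Network('fc00::80/125')
for host in ('fc00::81', 'fc00::82', 'fc00::83'):
    a = ipaddress.IPv6Address(host)
    print(host, 'in /128:', a in net128, '| in /125:', a in net125)
"
```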

@DerpJim

DerpJim commented Sep 13, 2024

> When following the updated guide to create a separate folder for the thunderbolt networking "/etc/network/interfaces.d/thunderbolt" the interfaces lo:0 and lo:6 don't show up in the GUI to allow for setting the migration network. I manually edited the /etc/pve/datacenter.cfg with "migration: network=fc00::81/128,type=insecure" and it shows in the GUI as changed but I haven't tested migration yet. Not sure if this is correct or if I messed something up which isn't letting me select the network via GUI.
>
> Just tested a migration and it fails. "could not get migration ip: no IP address configured on local node for network 'fc00::81/128'"

> @DerpJim Since the loopback interface was moved to allow using the GUI to edit network settings, it won't show up and will need to be set manually. That said, it looks like you figured that out but did not change the subnet: a /128 only has 1 IP, so you need a subnet that contains all of the cluster's IPs. That is why, for IPv4, he says to change /32 to /29. I am not the greatest when it comes to IPv6, but I think /125 should work.

@NRGnet Thank you! I definitely missed that note. /125 worked!!! Thank you!

@scyto
Author

scyto commented Sep 16, 2024

@NRGnet do you think we should move the loopbacks back to the main interfaces file?

@RobinBeismann

> @NRGnet do you think we should move the loopbacks back to the main interfaces file?

Did you or @NRGnet test that out yet? I'm right now building my new homelab with 3x MS-01 and am wondering whether I should or not.
