Part of collection: Hyper-converged Homelab with Proxmox
This is part 1, focusing on the networking for building a highly available Proxmox cluster with Ceph.
Part 2 covers building the Proxmox cluster and setting up Ceph itself, and part 3 covers managing and troubleshooting Proxmox and Ceph.
For some time I had been looking at options to build a hyper-converged infrastructure (HCI) homelab, considering TrueNAS Scale and SUSE Harvester, among others.
But then I discovered that I could build a Hyper-converged Infrastructure with Proxmox, the virtualization software that I was already using.
This setup creates a highly available (resilient to physical failures, cable disconnects, etc.) and high-speed full-mesh, IPv6-only communications channel between the Proxmox/Ceph nodes. This network is separate from the management network, and only Ceph and internal cluster traffic flows over it.
Diagram credit: John W Kerns (Packet Pushers)
For more background on this setup, see the excellent guide Proxmox/Ceph – Full Mesh HCI Cluster w/ Dynamic Routing, on which I based my implementation.
The official documentation is limited, but that guide helped me get this up and running with confidence that I could replicate the setup in case of a rebuild.
This guide starts with the following assumptions:
- 3x servers for the HCI cluster
- Fresh installation of Proxmox 8.0 or later
- The “No-Subscription” or other update repository is set up on each server
- Servers have been updated with the latest patches using apt or apt-get
- Servers are connected to a management network, and you have access to the Proxmox GUI as well as root SSH access
- Cluster links between nodes are connected in a ring topology as shown in the diagram
- Proxmox cluster has not yet been configured
- Ceph has not been installed or configured
- Hosts have no VMs, or no VMs are currently running (so you can perform reboots, etc.)
NOTE: All commands are being run as root!
- Number of cluster links on each node: 2
- Cluster Name: Homelab
- Some of my interfaces have generic Ethernet device names such as eno1, but I also use 2.5 Gbit USB-C to Ethernet dongles for the second mesh link. That's why I labelled all the cables: to make sure they go back into the correct server in case I ever take them (all) out!
Node #1:
- Name: pve01
- FQDN: pve01.example.com
- Proxmox Management IP address: 192.168.1.11
- Cluster IPv6 address: fc00::1
Node #2:
- Name: pve02
- FQDN: pve02.example.com
- Proxmox Management IP address: 192.168.1.12
- Cluster IPv6 address: fc00::2
Node #3:
- Name: pve03
- FQDN: pve03.example.com
- Proxmox Management IP address: 192.168.1.13
- Cluster IPv6 address: fc00::3
Notes:
- The fc00:: IPv6 addresses are Unique Local Addresses and don’t necessarily have to be replaced with something else unless you are using, or plan to use, those addresses elsewhere in your environment.
- I actually have four nodes, but for simplicity I base this guide on 3 nodes.
- This is also possible over Thunderbolt 3/4 interfaces, for speeds of 10 Gbit or even higher. I should have known this earlier, but unfortunately I found the Gist from Scyto too late.
Let's get to work and get this set up!
The first step is to figure out which 2.5 Gbit devices will be used for the routing mesh and turn them on. For that, run lldpctl and look at the MAU oper type: of each adapter; in my situation it is 2p5GigT, but with 1 or 10 Gbit links it will show something like 1GigT or 10GigT (a one-liner to filter this output follows the example below).
- Logged into SSH as root, install the LLDP daemon with
apt install lldpd -y
- Once complete, run
lldpctl
to see your neighbour nodes over the cluster interfaces. This ensures your links are up and connected the way you want.
- You should see something like the lldpctl output below:
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface: eno1, via: LLDP, RID: 2, Time: 0 day, 00:00:05
Chassis:
ChassisID: mac 54:b2:03:fd:44:e7
SysName: pve04.soholocal.nl
SysDescr: Debian GNU/Linux 12 (bookworm) Linux 6.2.16-10-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-10 (2023-08-18T11:42Z) x86_64
MgmtIP: 192.168.1.14
MgmtIface: 5
MgmtIP: fe80::a2ce:c8ff:fe9b:c0b9
MgmtIface: 5
Capability: Bridge, on
Capability: Router, off
Capability: Wlan, off
Capability: Station, off
Port:
PortID: mac 00:e0:4c:68:00:59
PortDescr: enx00e04c680059
TTL: 120
PMD autoneg: supported: no, enabled: no
MAU oper type: 2p5GigT - 2.5GBASE-T Four-pair twisted-pair balanced copper cabling PHY
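To quickly list just the neighbour names and link speeds per interface, the output can be filtered with a simple grep (assuming lldpd from the step above):
lldpctl | grep -E 'Interface:|SysName:|MAU oper type:'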
Once you know which interfaces are part of the mesh:
- On each node, log into the Proxmox GUI and navigate to System > Network
- Edit the interfaces and make sure the “Autostart” checkbox is checked
- Hit Apply Configuration on the network page if you had to make any changes
My Network configuration looked like this:
pve01
Mesh Devices: eno1 and enx00e04c680048
pve02
Mesh Devices: eno1 and enx00e04c680001
pve03
Mesh Devices: eno1 and enx00e04c680101
If everything is set up correctly, each node should display its 2 neighbour nodes via lldpctl.
- On each node, using the SSH terminal, edit the interfaces file: nano /etc/network/interfaces
- Add the interface definition below
- The ::1 number should represent your node number
- Change to ::2 for node 2, ::3 for node 3, etc.
- This will be the unique IP address of this node for Ceph and Proxmox cluster services
Add the code below to each node, changing only the fc00::1/128 value per node.
auto lo:0
iface lo:0 inet static
address fc00::1/128
The section in /etc/network/interfaces should now look like below:
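A minimal sketch for pve01, using my mesh devices eno1 and enx00e04c680048 (your device names will differ, and the existing vmbr0/management section stays untouched):
auto lo
iface lo inet loopback

auto lo:0
iface lo:0 inet static
        address fc00::1/128

auto eno1
iface eno1 inet manual

auto enx00e04c680048
iface enx00e04c680048 inet manual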
- Save and close the file.
- Restart network services to apply the changes:
systemctl restart networking.service && systemctl status networking.service
When the cluster ring is broken, some nodes will need to communicate with each other by routing through a neighbour node. To allow this to happen, we need to enable IPv6 forwarding in the Linux kernel.
On each node, edit the sysctl file: nano /etc/sysctl.conf
Uncomment the line:
#net.ipv6.conf.all.forwarding=1
To make it look like
net.ipv6.conf.all.forwarding=1
- Save and close the file
- Set the live IPv6 forwarding state with:
sysctl net.ipv6.conf.all.forwarding=1
- Check that the Linux kernel is set to forward IPv6 packets:
sysctl net.ipv6.conf.all.forwarding
- Output should be:
net.ipv6.conf.all.forwarding = 1
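If you prefer not to edit /etc/sysctl.conf by hand, a drop-in file makes the setting persistent just as well (a sketch; the file name is my own choice):
echo 'net.ipv6.conf.all.forwarding=1' > /etc/sysctl.d/99-ipv6-forwarding.conf
sysctl --system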
- On each node, install FRR with:
apt install frr
- Edit the FRR config file:
nano /etc/frr/daemons
- Change
ospf6d=no
to
ospf6d=yes
and save the file
- Restart FRR:
systemctl restart frr.service
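If you script your node setup, the same daemons change and FRR restart can also be done non-interactively (a one-liner sketch, not from the original guide):
sed -i 's/^ospf6d=no/ospf6d=yes/' /etc/frr/daemons
systemctl restart frr.service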
- On each node: enter the FRR shell:
vtysh
- Check the current config:
show running-config
- Enter config mode:
configure
- Apply the configuration below
- The 0.0.0.1 number should represent your node number
- Change to 0.0.0.2 for node 2, 0.0.0.3 for node 3, etc.
- Replace the interface names ens3f0 and ens3f1 with your own mesh interfaces (for example, eno1 and enx00e04c680048 on my pve01)
router ospf6
ospf6 router-id 0.0.0.1
log-adjacency-changes
exit
!
interface lo
ipv6 ospf6 area 0
exit
!
interface ens3f0
ipv6 ospf6 area 0
ipv6 ospf6 network point-to-point
exit
!
interface ens3f1
ipv6 ospf6 area 0
ipv6 ospf6 network point-to-point
exit
!
- To exit config mode, hit enter on the line below the last exclamation mark and type
end
- Save the config:
write memory
- Type
exit
to leave vtysh.
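The same OSPF6 configuration can also be pushed non-interactively, which is handy for the remaining nodes; a sketch for pve02, assuming its mesh devices eno1 and enx00e04c680001 (adjust the router-id and interface names per node):
vtysh \
  -c 'configure terminal' \
  -c 'router ospf6' -c 'ospf6 router-id 0.0.0.2' -c 'log-adjacency-changes' -c 'exit' \
  -c 'interface lo' -c 'ipv6 ospf6 area 0' -c 'exit' \
  -c 'interface eno1' -c 'ipv6 ospf6 area 0' -c 'ipv6 ospf6 network point-to-point' -c 'exit' \
  -c 'interface enx00e04c680001' -c 'ipv6 ospf6 area 0' -c 'ipv6 ospf6 network point-to-point' -c 'exit' \
  -c 'end' -c 'write memory'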
Once you have done this on all nodes, check for the OSPF6 neighbours with
vtysh -c 'show ipv6 ospf6 neighbor'
from the Bash/Zsh terminal, or
show ipv6 ospf6 neighbor
while still in vtysh.
You should see two neighbours on each node:
pve01# show ipv6 ospf6 neighbor
Neighbor ID Pri DeadTime State/IfState Duration I/F[State]
0.0.0.4 1 00:00:33 Full/PointToPoint 1d15:14:19 eno1[PointToPoint]
0.0.0.2 1 00:00:32 Full/PointToPoint 1d15:14:39 enx00e04c680048[PointToPoint]
- Show the IPv6 routing table in FRR:
vtysh -c 'show ipv6 route'
- You should see the IPv6 addresses of your neighbours in the table as OSPF routes
- Show the IPv6 routing table in Linux: ip -6 route
Note: this example is from my node pve01, so the direct neighbours pve02 (fc00::2/128) and pve04 (fc00::4/128) are shown; because I have a 4-node setup, pve03 (fc00::3/128) also appears as a routed entry.
- Check that the Linux kernel is set to forward IPv6 packets:
sysctl net.ipv6.conf.all.forwarding
- Output should be:
net.ipv6.conf.all.forwarding = 1
We should now have full reachability between our loopback interfaces. Let’s test it.
- Ping every node from every node:
ping fc00::1
- Replace the IP to make sure you have reachability between all nodes
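To save some typing, the pings can be run in a quick loop from each node (adjust the address list to your nodes):
for ip in fc00::1 fc00::2 fc00::3; do
    ping -c 2 "$ip"
done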
- Check your neighbours are all up:
vtysh -c 'show ipv6 ospf6 neighbor'
- Pick one of your nodes and shut down one of your mesh links:
ip link set eno1 down
- Or you can pull out a cable if you prefer the real-world test
- DO NOT do this on all nodes, just on one
- Check that the link is down:
ip link
- Check your neighbours; this node should now only have one:
vtysh -c 'show ipv6 ospf6 neighbor'
- Ping every node from every node AGAIN:
ping fc00::1
- This should still work; traffic will route through a neighbour node to reach the detached one
- Check your routing table:
ip -6 route
- You will see the links used reflect the routing path
- Bring the downed link back up:
ip link set eno1 up
- Or plug the cable back in
- The routing table should change back after approx 15 seconds
- Ping every node from every node ONE LAST TIME:
ping fc00::1
- Make sure the system is working properly
- Edit the hosts file: nano /etc/hosts (or Use the Proxmox GUI)
- Add the below lines to the file:
fc00::1 pve01.soholocal.nl pve01
fc00::2 pve02.soholocal.nl pve02
fc00::3 pve03.soholocal.nl pve03
fc00::4 pve04.soholocal.nl pve04
- Ping each host by name and make sure the IPv6 address is used.
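A quick way to confirm the names resolve to the fc00:: addresses (hostnames as in the /etc/hosts entries above):
for h in pve01 pve02 pve03 pve04; do
    getent hosts "$h"   # should print the fc00:: address from /etc/hosts
    ping -c 1 "$h"
done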
- Reboot each server.
- Once back online, ping each host by name again and make sure the IPv6 address is used.
- Perform all the routing and redundancy tests above again to make sure a reboot does not break anything.
If everything checks out, you and your system are ready for part 2: Build a Hyper-converged Proxmox Cluster with Ceph.
Only on my Intel NUC, the enx00e04c680059 network interface is crashing once a day:
r8152 2-3:1.0 enx00e04c680059: Tx status -108
I'm not sure why yet; as a workaround I created a network-check.sh script that runs every 15 minutes and brings the interface up again if it's down.
- Create a Crontab entry: crontab -e and add the line below.
*/15 * * * * /root/scripts/network-check/network-check.sh > /dev/null 2>&1
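A minimal sketch of such a network-check.sh (adjust the interface name to your own; make it executable with chmod +x):
#!/bin/bash
# Workaround sketch: if the flaky USB NIC has dropped, bring it back up
# so the OSPF adjacency can re-establish.

IFACE="enx00e04c680059"   # the 2.5 Gbit USB adapter that crashes on my Intel NUC

# operstate reads "down" (or the file is missing) when the link has dropped
STATE=$(cat "/sys/class/net/${IFACE}/operstate" 2>/dev/null)

if [ "$STATE" != "up" ]; then
    logger -t network-check "${IFACE} is ${STATE:-missing}; bringing it back up"
    ip link set "$IFACE" down 2>/dev/null
    ip link set "$IFACE" up
fi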