Build a Hyper-converged Proxmox HA Cluster with Ceph

Part of collection: Hyper-converged Homelab with Proxmox

This is part 1, focusing on the networking part of building a highly available Proxmox cluster with Ceph.

Part 2 focuses on building the Proxmox cluster and setting up Ceph itself, and part 3 focuses on managing and troubleshooting Proxmox and Ceph.

Why

For some time, I was looking for ways to build a Hyper-converged Infrastructure (HCI) Homelab, considering options like TrueNAS Scale and SUSE Harvester, among others.

But then I discovered that I could build a Hyper-converged Infrastructure with Proxmox, the virtualization software that I was already using.


What

This setup creates a highly available (resilient to physical failures, cable disconnects, etc.) and high-speed full-mesh, IPv6-only communications channel between the Proxmox/Ceph nodes. This network is separated from the management network, and only Ceph and internal cluster traffic flows over it.

Diagram Environment

Topology diagram credit: John W Kerns (Packet Pushers)

For more background details on this setup see this excellent guide: Proxmox/Ceph – Full Mesh HCI Cluster w/ Dynamic Routing, on which I based my implementation.

Important Information

The official documentation is limited, but fortunately the previously mentioned guide (Proxmox/Ceph – Full Mesh HCI Cluster w/ Dynamic Routing) helped me to get this up and running with confidence that I could replicate the setup in case of a rebuild.

Assumptions

This guide starts with the following assumptions:

  • 3x servers for the HCI cluster
  • Fresh installation of Proxmox 8.0 or later
  • The “No-Subscription” or other update repository is set up on each server
  • Servers have been updated with the latest patches using apt or apt-get
  • Servers are connected to a management network, and you have access to the Proxmox GUI as well as root SSH access
  • Cluster links between nodes are connected in a ring topology as shown in the diagram
  • Proxmox cluster has not yet been configured
  • Ceph has not been installed or configured
  • Hosts have no VMs, or no VMs are currently running (so you are able to perform reboots, etc.)

NOTE: All commands are being run as root!

Generic Values

  • Number of cluster links on each node: 2
  • Cluster Name: Homelab
  • Some of my interfaces have generic Ethernet device names like eno1, but I also use 2.5 Gbit USB-C to Ethernet dongles for the second mesh Ethernet link. That's why I labelled all the cables: to make sure they get plugged back into the correct server in case I ever take them (all) out!

Node #1:

  • Name: pve01
  • FQDN: pve01.example.com
  • Proxmox Management IP address: 192.168.1.11
  • Cluster IPv6 address: fc00::1

Node #2:

  • Name: pve02
  • FQDN: pve02.example.com
  • Proxmox Management IP address: 192.168.1.12
  • Cluster IPv6 address: fc00::2

Node #3:

  • Name: pve03
  • FQDN: pve03.example.com
  • Proxmox Management IP address: 192.168.1.13
  • Cluster IPv6 address: fc00::3

Notes:

  • The fc00:: IPv6 addresses are Unique Local Addresses and don’t necessarily have to be replaced with something else unless you are using, or plan to use, those addresses elsewhere in your environment.
  • I actually have four nodes, but for simplicity I base this guide on 3 nodes.
  • This is also possible over Thunderbolt 3 / 4 interfaces for speeds of 10 Gbit or even higher. I should have known this earlier, but unfortunately I found this Gist from Scyto too late.

Setup

Let's get to work and set this up!

Turn on Links

The first step is to figure out which 2.5 Gbit devices are going to be used for the routing mesh and turn them on. For that, run lldpctl and look at the adapters' MAU oper type: in my situation it is 2p5GigT, but when using 1 or 10 Gbit links it might display something like 1GigT or 10GigT.

Install LLDP

  • Logged into SSH as root, install the LLDP daemon with apt install lldpd -y
  • Once complete, run lldpctl to see your neighbour nodes over the cluster interfaces
    • This ensures your links are up and connected in the way you want
    • You should see something like the below

The lldpctl output looks like:

-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface:    eno1, via: LLDP, RID: 2, Time: 0 day, 00:00:05
  Chassis:
    ChassisID:    mac 54:b2:03:fd:44:e7
    SysName:      pve04.soholocal.nl
    SysDescr:     Debian GNU/Linux 12 (bookworm) Linux 6.2.16-10-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-10 (2023-08-18T11:42Z) x86_64
    MgmtIP:       192.168.1.14
    MgmtIface:    5
    MgmtIP:       fe80::a2ce:c8ff:fe9b:c0b9
    MgmtIface:    5
    Capability:   Bridge, on
    Capability:   Router, off
    Capability:   Wlan, off
    Capability:   Station, off
  Port:
    PortID:       mac 00:e0:4c:68:00:59
    PortDescr:    enx00e04c680059
    TTL:          120
    PMD autoneg:  supported: no, enabled: no
      MAU oper type: 2p5GigT - 2.5GBASE-T Four-pair twisted-pair balanced copper cabling PHY
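
To narrow the lldpctl output down to just the fields that matter here, you can filter it with grep. This is a minimal sketch; the field names match the output above but may differ slightly with other lldpd versions:

lldpctl | grep -E 'Interface:|SysName:|PortDescr:|MAU oper type:'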

Once you know which devices are part of the mesh:

  • On each node, log into the Proxmox GUI and navigate to System > Network
  • Edit the interfaces and make sure the “Autostart” checkbox is checked
  • Hit Apply Configuration on the network page if you had to make any changes
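
If you prefer the shell over the GUI, the Autostart checkbox corresponds to an auto <interface> line in /etc/network/interfaces, so a quick check looks like this (a sketch, using my pve01 mesh device names; substitute your own):

grep -E '^auto (eno1|enx00e04c680048)' /etc/network/interfaces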

My Network configuration looked like this:

pve01

Mesh Devices: eno1 and enx00e04c680048

pve02

Mesh Devices: eno1 and enx00e04c680001

pve03

Mesh Devices: eno1 and enx00e04c680101

If everything is set up correctly, each node should display its 2 neighbour nodes via lldpctl.

Create Loopbacks

  • On each node, using the SSH terminal, edit the interfaces file: nano /etc/network/interfaces
  • Add the below interface definition
    • The ::1 number should represent your node number
    • Change to ::2 for node 2, ::3 for node 3, etc.
    • This will be the unique IP address of this node for Ceph and Proxmox cluster services

Add the code below to each node, changing only the fc00::1/128 address per node:

auto lo:0
iface lo:0 inet static
        address fc00::1/128

The loopback section in /etc/network/interfaces should now look like the block above.

  • Save and close the file.
  • Restart network services to apply the changes: systemctl restart networking.service && systemctl status networking.service.
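
To confirm the loopback address is actually active after the restart, a quick check on each node (a sketch; expect the fc00:: address you configured for that node):

ip -6 addr show dev lo | grep fc00
# Expected on pve01: inet6 fc00::1/128 scope global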

Enable IPv6 Forwarding

When the cluster ring is broken, some nodes will need to communicate with each other by routing through a neighbour node. To allow this to happen, we need to enable IPv6 forwarding in the Linux kernel.

On each node, edit the sysctl file: nano /etc/sysctl.conf

Uncomment the line:

#net.ipv6.conf.all.forwarding=1

To make it look like

net.ipv6.conf.all.forwarding=1

  • Save and close the file
  • Set the live IPv6 forwarding state with: sysctl net.ipv6.conf.all.forwarding=1
  • Check that the Linux kernel is set to forward IPv6 packets: sysctl net.ipv6.conf.all.forwarding
  • Output should be: net.ipv6.conf.all.forwarding = 1
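
If you want to verify all three nodes in one go from a workstation, a small loop over the management addresses can do it. This is a sketch and assumes root SSH key access to the management IPs listed above:

for ip in 192.168.1.11 192.168.1.12 192.168.1.13; do
  echo -n "$ip: "
  ssh root@"$ip" sysctl -n net.ipv6.conf.all.forwarding
done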

Set Up Free Range Routing (FRR) OSPF

  • On each node, install FRR with: apt install frr
  • Edit the FRR config file: nano /etc/frr/daemons
  • Adjust ospf6d=no to ospf6d=yes and save the file
  • Restart FRR: systemctl restart frr.service
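
If you prefer to script the daemons change instead of editing the file by hand, a one-liner like this should work (a sketch; it simply flips ospf6d=no to ospf6d=yes in /etc/frr/daemons before the restart):

sed -i 's/^ospf6d=no/ospf6d=yes/' /etc/frr/daemons
systemctl restart frr.service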

Configuration

  • On each node: enter the FRR shell: vtysh
  • Check the current config: show running-config
  • Enter config mode: configure
  • Apply the below configuration
    • The 0.0.0.1 number should represent your node number
    • Change to 0.0.0.2 for node 2, 0.0.0.3 for node 3, etc.
    • Replace the interface names ens3f0 and ens3f1 with your own two mesh interface names (on my pve01 these are eno1 and enx00e04c680048)
router ospf6
 ospf6 router-id 0.0.0.1
 log-adjacency-changes
 exit
!
interface lo
 ipv6 ospf6 area 0
 exit
!
interface ens3f0
 ipv6 ospf6 area 0
 ipv6 ospf6 network point-to-point
 exit
!
interface ens3f1
 ipv6 ospf6 area 0
 ipv6 ospf6 network point-to-point
 exit
!
  • Exit the config mode: end
    • Hit enter and then type end one line below the exclamation mark!
  • Save the config: write memory
  • Type exit to leave vtysh.
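
The same configuration can also be pushed non-interactively with vtysh -c, which is handy if you script the node setup. A sketch for node 1, assuming its mesh interfaces are eno1 and enx00e04c680048 (adjust the router-id and interface names per node):

vtysh \
  -c 'configure' \
  -c 'router ospf6' -c 'ospf6 router-id 0.0.0.1' -c 'log-adjacency-changes' -c 'exit' \
  -c 'interface lo' -c 'ipv6 ospf6 area 0' -c 'exit' \
  -c 'interface eno1' -c 'ipv6 ospf6 area 0' -c 'ipv6 ospf6 network point-to-point' -c 'exit' \
  -c 'interface enx00e04c680048' -c 'ipv6 ospf6 area 0' -c 'ipv6 ospf6 network point-to-point' -c 'exit' \
  -c 'end' -c 'write memory'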

Verification

Once you have done this on all nodes, check for the OSPF6 neighbours:

  • Run vtysh -c 'show ipv6 ospf6 neighbor' from the Bash / ZSH terminal, or show ipv6 ospf6 neighbor while still in vtysh.

You should see two neighbours on each node:

pve01# show ipv6 ospf6 neighbor
Neighbor ID     Pri    DeadTime    State/IfState         Duration I/F[State]
0.0.0.4           1    00:00:33     Full/PointToPoint  1d15:14:19 eno1[PointToPoint]
0.0.0.2           1    00:00:32     Full/PointToPoint  1d15:14:39 enx00e04c680048[PointToPoint]
  • Show the IPv6 routing table in FRR: vtysh -c 'show ipv6 route'
    • You should see the IPv6 addresses of your neighbours in the table as OSPF routes


  • Show the IPv6 routing table in Linux: ip -6 route


Note: This example is from my node pve01; the direct neighbours pve02 (fc00::2/128) and pve04 (fc00::4/128) are shown, but because I have a 4-node setup, the route to pve03 (fc00::3/128) is also visible.

  • Check that the Linux kernel is set to forward IPv6 packets: sysctl net.ipv6.conf.all.forwarding
    • Output should be: net.ipv6.conf.all.forwarding = 1
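
A small loop can confirm that the neighbour loopbacks really made it into the kernel routing table (a sketch for pve01; adjust the node numbers to your cluster size):

for n in 2 3; do
  ip -6 route | grep -q "fc00::$n" && echo "route to fc00::$n present" || echo "route to fc00::$n MISSING"
done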

Test OSPF Routing & Redundancy

We should now have full reachability between our loopback interfaces. Let’s test it.
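
Before walking through the manual steps below, a quick loop from any node can confirm basic loopback reachability (a sketch, assuming three nodes numbered as above):

for n in 1 2 3; do
  ping -6 -c 1 -W 2 "fc00::$n" > /dev/null && echo "fc00::$n reachable" || echo "fc00::$n UNREACHABLE"
done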

  • Ping every node from every node: ping fc00::1
    • Replace the IP to make sure you have reachability between all nodes
  • Check your neighbours are all up: vtysh -c 'show ipv6 ospf6 neighbor'
  • Pick one of your nodes and shut down one of your mesh links: ip link set eno1 down
    • Or you can pull out a cable if you prefer the real-world test
    • DO NOT do this on all nodes, just on one
  • Check that the link is down: ip link
  • Check your neighbours, should only have one on this node: vtysh -c 'show ipv6 ospf6 neighbor'
  • Ping every node from every node AGAIN: ping fc00::1
    • This should still work, you will route through one of your nodes to reach the detached one
  • Check your routing table: ip -6 route
    • You will see the links used reflect the routing path
  • Bring the downed link back up: ip link set eno1 up
    • Or plug the cable back in
    • The routing table should change back after approx 15 seconds
  • Ping every node from every node ONE LAST TIME: ping fc00::1
    • Make sure the system is working properly

Update the Hosts File

  • Edit the hosts file: nano /etc/hosts (or use the Proxmox GUI)
  • Add the below lines to the file:
fc00::1 pve01.soholocal.nl pve01
fc00::2 pve02.soholocal.nl pve02
fc00::3 pve03.soholocal.nl pve03
fc00::4 pve04.soholocal.nl pve04
  • Ping each host by name and make sure the IPv6 address is used (see the getent check after this list).
  • Reboot each server.
  • Once back online, ping each host by name again and make sure the IPv6 address is used.
  • Perform all the routing and redundancy tests above again to make sure a reboot does not break anything.
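
A quick way to see which address a name resolves to is getent hosts (a sketch; getent hosts prefers IPv6 entries when both exist):

getent hosts pve01 pve02 pve03
# Expected: each line shows the fc00:: address in front of the hostname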

If everything checks out, you and your system are ready for part 2: Build a Hyper-converged Proxmox Cluster with Ceph.

Issue

Network interface enx00e04c680059 Down

Only on my Intel NUC, the enx00e04c680059 network interface crashes about once a day:

r8152 2-3:1.0 enx00e04c680059: Tx status -108
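
To check whether the interface has hit this error again since the last boot, the kernel log can be searched (a sketch; the interface name is the one from my NUC, adjust for yours):

journalctl -k --since today | grep -i 'enx00e04c680059'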

I'm not sure why yet; as a workaround, I created a network-check.sh script (see below) that runs every 15 minutes and brings the interface up again if it's down.

  • Create a Crontab entry: crontab -e and add the line below.

*/15 * * * * /root/scripts/network-check/network-check.sh > /dev/null 2>&1


#!/bin/bash

# Define the network interface name
interface="enx00e04c680059"

# Log directory
log_dir="/root/scripts/network-check/network_logs"

# Ensure the log directory exists
mkdir -p "$log_dir"

# Log file with today's date
LOG_FILE="$log_dir/network_check_$(date +"%Y-%m-%d").log"

# Function to perform log rotation (remove logs older than 7 days)
rotate_logs() {
    find "$log_dir" -name "network_check_*.log" -mtime +7 -exec rm {} \;
}

# Check if the network interface is up
if ip link show dev "$interface" | grep -q "UP,LOWER_UP"; then
    echo "Network interface $interface is already up."
    #echo "$(date) - Network interface $interface is already up." >> "$LOG_FILE"
else
    # Bring the network interface up
    ip link set dev "$interface" up
    if [ $? -eq 0 ]; then
        echo "Network interface $interface has been brought up successfully."
        echo "$(date) - Network interface $interface has been brought up successfully." >> "$LOG_FILE"
        sleep 15
        # Run vtysh command and log the output
        vtysh_output=$(vtysh -c 'show ipv6 ospf6 neighbor')
        echo "vtysh output:" >> "$LOG_FILE"
        echo "$vtysh_output" >> "$LOG_FILE"
    else
        echo "Failed to bring up network interface $interface."
        echo "$(date) - Failed to bring up network interface $interface." >> "$LOG_FILE"
    fi
fi

# Perform log rotation
rotate_logs
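
Don't forget that cron can only run the script if it is executable at the path referenced in the crontab entry above:

chmod +x /root/scripts/network-check/network-check.sh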