aka what i did to get from nothing to done.
note: these are designed to be primarily a re-install guide for myself (writing things down helps me memorize the knowledge), as such don't take any of this on blind faith - some areas are well tested and the docs are very robust, some items less so. YMMV
Purpose of Proxmox cluster project
Required Outcomes of cluster project
Updates as of 2025.04.20: Been running great, but still had issues with the IPv4 dual fabric. I have refactored that with some great suggestions from commenters. Now I need to see if long-haul tests show whether these have helped.
Also made ceph take a hard dependency on the frr service being started - this may help some scenarios, but not if the thunderbolt interfaces are down. Still not sure how to help folks there (this applies mostly to MS-101 users, see the comments sections of the individual gists, esp. the old deprecated openfabric mesh gist)
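One way to express that kind of hard dependency is a systemd drop-in that adds `Requires=` and `After=` on `frr.service`. The sketch below only illustrates the idea and is not necessarily what the linked gists do: the choice of `ceph.target` as the unit to override, the drop-in file name `10-wait-for-frr.conf`, and the helper names are all assumptions made for this example.

```python
from pathlib import Path

# Assumption: standard location for administrator-provided systemd drop-ins.
SYSTEMD_ROOT = Path("/etc/systemd/system")


def render_dropin(dependency: str = "frr.service") -> str:
    """Return drop-in text that makes a unit require, and start after, `dependency`."""
    return (
        "[Unit]\n"
        f"Requires={dependency}\n"
        f"After={dependency}\n"
    )


def write_dropin(unit: str, dependency: str = "frr.service",
                 root: Path = SYSTEMD_ROOT) -> Path:
    """Write <root>/<unit>.d/10-wait-for-frr.conf and return its path.

    The file name is hypothetical; any *.conf name inside the .d directory works.
    """
    path = root / f"{unit}.d" / "10-wait-for-frr.conf"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_dropin(dependency))
    return path


if __name__ == "__main__":
    # Print the drop-in instead of writing it, so this is safe to run anywhere.
    print(render_dropin())
```

To apply it for real you would call `write_dropin("ceph.target")` as root on each node (the unit name is an assumption, check which ceph units your install actually runs) and then run `systemctl daemon-reload`. As noted above, this only orders ceph after frr; it does nothing if the thunderbolt interfaces themselves never come up.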
-
v2 - Enable Dual Stack (IPv4 / IPv6) Openfabric Routing Mesh
Enable Dual Stack (IPv4 and IPv6) Openfabric Routing on Mesh Network (deprecated - old gist here)
-
Migrate my debian VM based docker swarm from Hyper-V to proxmox
-
Extra Credit (optional):
- add TLS to the mail relay? with LE certs? maybe?
- maybe send syslog to my syslog server (securely)
- figure out ceph public/cluster running on different networks - unclear it's needed for this size of install
- get all nodes listening to my network UPS and shut down before power runs out
- using one of these three ceph volume plugins: Brindster/docker-plugin-cephfs, flaviostutz/cepher, n0r1sk/docker-volume-cephfs. Each has different strengths and weaknesses (I will likely choose either the n0r1sk or the Brindster one). Until I figure out ceph networking more this is dead in the water, as ceph isn't reachable from the LAN or the docker swarm VMs, so I am using virtiofs (linked in the main items above).
I have been using Hyper-V for my docker swarm cluster VM hosts (see other gists). The original intention was to try and get Thunderbolt networking going for a Hyper-V cluster, with clustered storage for the VMs. This turns out to be super hard when using NUCs as cluster nodes due to too few disks. I looked at solar winds as an alternative but this was both complex and not pervasive.
I had been watching proxmox for years and thought now was a good time to jump in and see what it is all about. (i had never booted or looked at proxmox UI before doing this - so this documentation is soup to nuts and intended for me to repro if needed)
- VMs running on clustered storage {completed}
- Use of ThunderBolt for ~26Gbe Cluster VM operations (replication, failover etc)
- Thunderbolt meshes with OSPF routing {completed}
- Ceph over thunderbolt mesh {completed}
- VM running with live migration {completed}
- VM running with HA failover on node failure {completed}
- Separate VM/CT Migration network over thunderbolt mesh {not started}
- Use low powered off the shelf Intel NUCs {completed}
- Migrate VMs from Hyper-V:
- Windows Server Domain Controller / DNS / DHCP / CA / AAD SYNC VMs {not started}
- Debian Docker Host (for my 3 running 3 node swarm) VMs {not started}
- HomeAssistant VM {not started}
- Sized to last me 5+ years (lol, yeah, right)
- 3x 13th Gen Intel NUCs (NUC13ANHi7):
- Core i7-1360P Processor (12 Cores, 5.0 GHz, 16 Threads)
- Intel Iris Xe Graphics
- 64 GB DDR4 3200 CL22 RAM
- Samsung 870 EVO SSD 1TB Boot Drive
- Samsung 980 Pro NVME 2 TB Data Drive
- 1x Onboard 2.5Gbe LAN Port
- 2x Onboard Thunderbolt4 Ports
- 1 x 2.5Gbe using Intel NUCIOALUWS nvme expansion port
- 3 x OWC TB4 Cables
- Proxmox v8.x
- Ceph (included with Proxmox)
- LLDP (included with Proxmox)
- Free Range Routing - FRR OSPF - (included with Proxmox)
- nano ;-)
Proxmox/Ceph Guide from packet pushers
Proxmox Forum - several community members were invaluable in providing me a breadcrumb trail.
The Eastside is the best side! I'm happy to buy you beer in Bellevue (Tap House), Redmond (Matt's Rotisserie), Kirkland (Cactus).. maybe I'll PM on Pmox forums (where I've been spying on your posts from the past - as an evil lurker) - since there doesn't seem to be a PM feature here?
Back to my issue(s)..
I noticed that I have to pull the USB thumb drive out and re-insert it while installing Proxmox, right after POST. If I don't, it (pmox) starts a countdown timer for reboot because it can't find the install drive/folders. Like the USB ports are insta-sleeping?
I wonder if that is happening with my USB4 ports too? Maybe pmox's kernel isn't seeing my ports - so it's not trying to forward packets through them?
udevadm monitor shows nothing pulling a cable out or re-inserting..
FYI I'm using your NUC13 guide on my Minisforum NPB5 ( https://www.amazon.com/MINISFORUM-NPB5-i5-13500H-Computer-PCIe4-0/dp/B0C9M5NTPX ) so everything might be buggered? I know not what is "Port 1" or "Port 2" on my silly PC. I have right and left - depending on the side you're looking at it from! I wouldn't think that makes too much of an issue however - since the plugs and cables are routed in a ring topology either way - no?
Thanks for reading - if you did!
Cheers!