Note: this should only be done once you are sure you have a reliable TB mesh network.
This is because the Proxmox UI seems fragile with respect to changing the underlying network after Ceph has been configured.
All installation is done via the command line because the GUI does not understand the mesh network.
This setup doesn't attempt to separate the Ceph public network and Ceph cluster network (not the same as the Proxmox cluster network). The goal is to get an easy working setup.
**2025.04.24 NOTE: some folks had to switch to IPv6 for Ceph due to IPv4 reliability issues. We think that as of pve 8.4.1, with all the input the community has given to update this set of gists, IPv4 is now reliable even on the MS-01. As such I advise everyone to use IPv4 for Ceph, because if you use IPv6 you will have issues with SDN at this time (if you don't use SDN this is not an issue).**
This gist is part of this series.
- On all nodes execute the command `pveceph install --repository no-subscription` and accept all the packages to install.
- On node 1 execute the command `pveceph init --network 10.0.0.81/24` (a quick sanity check of the generated config follows this list).
- On node 1 execute the command `pveceph mon create --mon-address 10.0.0.81`
- On node 2 execute the command `pveceph mon create --mon-address 10.0.0.82`
- On node 3 execute the command `pveceph mon create --mon-address 10.0.0.83`
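As a sanity check of what `pveceph init` wrote, the public network should now be present in the shared Ceph config. A minimal check, assuming the example 10.0.0.81/24 network used above:

```
# the config generated by pveceph init lives on the shared pmxcfs filesystem
cat /etc/pve/ceph.conf
# expect a [global] section containing a line similar to:
#   public_network = 10.0.0.81/24
```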
Now if you access the GUI at Datacenter > pve1 > Ceph > Monitor you should have 3 running monitors (ignore any errors on the root Ceph UI leaf for now).
If so, you can proceed to the next step. If not, you probably have something wrong in your network; check all settings.
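If you prefer to verify from the shell rather than the GUI, the standard Ceph status commands will show whether all three monitors have formed a quorum (run on any node):

```
# overall cluster health; should report 3 monitors in quorum
ceph -s

# just the monitor map / quorum membership
ceph mon stat
```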
- On any node go to Datacenter > nodename > Ceph > Monitor and click Create in the Manager section.
- Select a node that doesn't have a manager from the drop-down and click Create.
- Repeat the previous step as needed.

If this fails it probably means your networking is not working. A CLI alternative is sketched below.
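If you'd rather do this step from the shell, managers can also be created with pveceph. A minimal sketch, run on each node that should get a manager:

```
# create a Ceph manager daemon on the node you run this on
pveceph mgr create

# confirm which managers exist and which one is active
ceph mgr stat
```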
- On any node go to Datacenter > nodename > Ceph > OSD
- Click Create: OSD and select all the defaults (again, this is for a simple setup).
- Repeat until all 3 nodes have an OSD (note it can take 30 seconds for a new OSD to go green).

If you find there are no available disks when you try to add one, it probably means your dedicated NVMe/SSD has some other filesystem or an old OSD on it. To wipe the disk use the wipe disk function in the node's Disks UI; the CLI equivalent is shown below. Be careful not to wipe your OS disk.
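The wipe and OSD creation can also be done from the shell. A sketch, assuming the spare Ceph disk is /dev/nvme1n1 (a hypothetical device name; check yours with lsblk before wiping anything):

```
# list disks and make absolutely sure which one is the spare Ceph disk
lsblk

# wipe any old filesystem/OSD signatures from the spare disk (DESTRUCTIVE)
ceph-volume lvm zap /dev/nvme1n1 --destroy

# create the OSD on it
pveceph osd create /dev/nvme1n1
```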
- On any node go to Datacenter > nodename > Ceph > Pools and click Create.
- Name the pool, e.g. vm-disks, leave the defaults as they are, and click Create.
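For reference, the equivalent CLI step would look roughly like this (using the example vm-disks name; the --add_storages option also registers the pool as a Proxmox storage, which is what the GUI does by default):

```
# create the RBD pool and register it as a Proxmox storage in one step
pveceph pool create vm-disks --add_storages
```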
- On any node go to Datacenter > Options
- Set Cluster Resource Scheduling to ha-rebalance-on-start=1 (this will rebalance nodes as needed).
- Set HA Settings to shutdown_policy=migrate (this will migrate VMs and CTs if you gracefully shut down a node).
- Leave Migration Settings at the default (a separate gist will talk about separating the migration network later).
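These GUI options end up in /etc/pve/datacenter.cfg. For reference, the relevant entries should look roughly like this (a sketch of the expected result, not something you need to edit by hand):

```
# /etc/pve/datacenter.cfg (excerpt)
crs: ha-rebalance-on-start=1
ha: shutdown_policy=migrate
```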
This is my blind attempt at ensuring Ceph doesn't try to start until the frr service is up. I don't have any tests in the startup to make sure the interfaces are up, so it may not make much difference to MS-01 users, but anyhoo, here it is...

Edit /usr/lib/systemd/system/ceph.target to look like this:

[Unit]
Description=ceph target allowing to start/stop all ceph*@.service instances at once
After=frr.service
Requires=frr.service

[Install]
WantedBy=multi-user.target

(Note: I need to revise this, as this file could be overwritten on upgrade; a drop-in alternative is sketched below.)
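To avoid the packaged unit being overwritten on upgrade, the same dependency can be expressed as a systemd drop-in instead of editing the file directly. A sketch of that approach (the override file name is arbitrary):

```
# create a drop-in directory and override for ceph.target
mkdir -p /etc/systemd/system/ceph.target.d
cat > /etc/systemd/system/ceph.target.d/override.conf <<'EOF'
[Unit]
After=frr.service
Requires=frr.service
EOF

# reload systemd so the drop-in takes effect
systemctl daemon-reload
```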
- Make a directory: `mkdir /etc/systemd/system/pvestatd.service.d`
- Create a file: `nano /etc/systemd/system/pvestatd.service.d/dependencies.conf`
- Add the following to the file:

[Unit]
After=pve-storage.target

- Save the file.

(Note: I am unclear if this currently works, despite it being the recommended answer.)
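After creating the drop-in, systemd needs to re-read its unit files before the new ordering takes effect. A quick way to apply and verify it (the grep is just a sanity check):

```
# pick up the new drop-in
systemctl daemon-reload

# confirm pvestatd now lists pve-storage.target as an ordering dependency
systemctl show pvestatd.service -p After | grep pve-storage
```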
Thank you for the gist! It's worked fantastically for me so far.
Pardon me if this is thick, but I have two questions before I proceed with this step (setting up Ceph and HA).
As some others have reported, I cannot ping the mesh network's IPv4 addresses unless I run `systemctl restart frr` on each node after startup. That said, am I correctly interpreting your earlier guidance that I should just replace the IPv4 addresses with their IPv6 counterparts in the instructions in this part of the gist?
Additionally, I would very much like to add a fourth node to the cluster to serve as a dedicated router/reverse proxy/networking tools stack.
Is this possible, and does it pose any issues if so? I have never worked with HA and am a little stuck on what to make of the settings we put in for the migration network, and what bearing they would have on a potential fourth node.
Currently I am using `migration: insecure,network=fc00::81/125` in my datacenter.cfg and everything is working as expected. I did see one previous poster mention a fourth node but could not gather whether any special configuration changes are required to add one to this setup.
Thank you!