scyto/proxmox-ceph.md

Last active February 26, 2025 12:57

setting up the ceph cluster

CEPH HA Setup

Note: only do this once you are sure you have a reliable Thunderbolt (TB) mesh network, because the Proxmox UI seems fragile with respect to changing the underlying network after Ceph has been configured.

All installation is done via the command line because the GUI does not understand the mesh network.

This setup doesn't attempt to separate the Ceph public network and the Ceph cluster network (not the same as the Proxmox cluster network). The goal is to get an easy working setup.

This gist is part of this series.

Ceph Initial Install & monitor creation

1. On all nodes, run `pveceph install --repository no-subscription`, accept all the packages, and install.
2. On node 1, run `pveceph init --network 10.0.0.81/24`
3. On node 1, run `pveceph mon create --mon-address 10.0.0.81`
4. On node 2, run `pveceph mon create --mon-address 10.0.0.82`
5. On node 3, run `pveceph mon create --mon-address 10.0.0.83`
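
The monitor addresses in these steps all need to sit inside the network passed to `pveceph init`. As a minimal sketch (the node names `pve1` to `pve3` and the helper `plan_commands` are assumptions for illustration, not part of Proxmox), the following prints the per-node commands from the steps above and refuses to do so if an address falls outside the Ceph network:

```python
import ipaddress

# Hypothetical node-to-address map; adjust to your own mesh addresses.
NODES = {"pve1": "10.0.0.81", "pve2": "10.0.0.82", "pve3": "10.0.0.83"}


def plan_commands(network: str, nodes: dict[str, str]) -> list[tuple[str, str]]:
    """Return (node, command) pairs mirroring the manual steps above."""
    # strict=False accepts a host address such as 10.0.0.81/24 and
    # reduces it to the enclosing subnet (10.0.0.0/24).
    subnet = ipaddress.ip_network(network, strict=False)
    for name, addr in nodes.items():
        if ipaddress.ip_address(addr) not in subnet:
            raise ValueError(f"{name}: {addr} is not inside {subnet}")

    plan = [(name, "pveceph install --repository no-subscription") for name in nodes]
    first = next(iter(nodes))  # the init step runs on the first node only
    plan.append((first, f"pveceph init --network {network}"))
    plan += [(name, f"pveceph mon create --mon-address {addr}") for name, addr in nodes.items()]
    return plan


for node, command in plan_commands("10.0.0.81/24", NODES):
    print(f"[{node}] {command}")
```

The script only prints the plan; the commands themselves still have to be run on each node as root.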

Now, if you open the GUI at Datacenter > pve1 > Ceph > Monitor, you should see 3 running monitors (ignore any errors on the root Ceph UI leaf for now).

If so, you can proceed to the next step. If not, something is probably wrong in your network; check all settings.

Add Additional managers

1. On any node, go to Datacenter > nodename > Ceph > Monitor and click Create Manager in the Manager section.
2. Select a node that doesn't have a manager from the drop-down and click Create.
3. Repeat step 2 as needed.

If this fails, it probably means your networking is not working.

Add OSDs

1. On any node, go to Datacenter > nodename > Ceph > OSD.
2. Click Create OSD and accept all the defaults (again, this is for a simple setup).
3. Repeat until you have 3 nodes like this (note that it can take 30 seconds for a new OSD to go green).

If no disks are available when you try to add an OSD, it probably means your dedicated NVMe/SSD has some other filesystem or an old OSD on it. To wipe the disk, use the following UI. Be careful not to wipe your OS disk.

Create Pool

1. On any node, go to Datacenter > nodename > Ceph > Pools and click Create.
2. Name the pool, e.g. vm-disks, leave the defaults as they are, and click Create.

Configure HA

1. On any node, go to Datacenter > Options.
2. Set Cluster Resource Scheduling to ha-rebalance-on-start=1 (this will rebalance nodes as needed).
3. Set HA Settings to shutdown_policy=migrate (this will migrate VMs and CTs if you gracefully shut down a node).
4. Leave Migration Settings at the default (a separate gist will cover separating the migration network later).

jacoburgin commented Jan 30, 2024 • edited

I think I'm up to at least 10 wipes 😭😂 keep breaking it on my own 😂

jacoburgin commented Jan 31, 2024

> Let me know how it goes! If it works for you maybe I'll consider wiping a 3rd time...

Well I have learnt a lot more about removing cephfs...

But nothing has fixed the random node freezing and subsequently disconnecting.

I fresh installed with the 8.1-1 iso. Ran apt update only to get the package list or lldp won't install (maybe that was a mistake)?

I have tried with no cephfs for ISOs-Templates and used a NFS share instead.

This worked the longest but shortly after nodes froze...

I'm off to bed, but tomorrow I'll try the 8.0-2 ISO, then maybe a kernel update on top if it is stable.

But something has completely broken it for us NUC12 users...

To me, though, it has to be some sort of driver issue, maybe for the CPU: at least in my case, when the node "disconnects" in the web UI, the machine has actually locked up/frozen (I can see this through my KVM) and has to be hard reset.

jacoburgin commented Feb 7, 2024 • edited

> Let me know how it goes! If it works for you maybe I'll consider wiping a 3rd time...

Some success: updating the microcode has made migrating a Windows VM possible. No crashes there. But uploading an ISO to CephFS still locked up two nodes, which had to be hard reset.

Others are experiencing similar issues after an update. Just not sure what broke it all.

https://www.reddit.com/r/Proxmox/s/pDMvr9WKA8

jacoburgin commented Feb 9, 2024

So I have reinstalled the 3 NUC12s to 7.5, with zero issues as expected with Scyto's gist. Upgraded to PVE 8 and kernel 6.5 and everything is broken.

Downgraded the kernel to 6.2.16.20 (which includes Scyto's TB fix) and have had zero issues so far! I can live migrate again and upload ISOs. No other "fixes" applied, just a change in kernel.

@lettucebuns

zombiehoffa commented Feb 9, 2024

The later kernels reverted the fix???

jacoburgin commented Feb 9, 2024

> The later kernels reverted the fix???

No, Scyto's thunderbolt fix is applied from 6.2.16-14 onwards.

DarkPhyber-hg commented Feb 23, 2024

I just got my MS-01s. I followed the guide and I've reinstalled 3 or 4 times now. When using 10GbE for my Ceph network, everything works fine. When using Thunderbolt, I keep getting random lockups on any node whenever the Ceph storage pool is under load. I am on kernel 6.5.13-1-pve and PVE 8.1.4.

I wonder if something broke in the later kernel?

DarkPhyber-hg commented Feb 23, 2024 • edited

Following up on what I've done so far: I've reinstalled Proxmox quite a few times. I couldn't go 3 minutes into restoring a VM from PBS without at least 1 node locking up hard.

In an attempt to isolate the issue, I used only 2 nodes and was still having the exact same issue. I'm using 2/2 replication, and in corosync.conf I gave one node 2 votes.
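
For readers wondering why one node was given 2 votes: corosync needs a strict majority of the total votes for quorum. A small sketch of that arithmetic follows; the node names are hypothetical, and it assumes plain majority voting with no `two_node` option or QDevice.

```python
def quorum_threshold(votes: dict[str, int]) -> int:
    """Smallest number of votes that forms a strict majority."""
    return sum(votes.values()) // 2 + 1


def is_quorate(votes: dict[str, int], online: set[str]) -> bool:
    """True if the nodes that are still online hold a majority of votes."""
    present = sum(v for name, v in votes.items() if name in online)
    return present >= quorum_threshold(votes)


# Two nodes with one vote each: losing either node loses quorum.
equal = {"node-a": 1, "node-b": 1}
print(is_quorate(equal, {"node-a"}))     # False

# Give node-a two votes: it stays quorate alone, node-b does not.
weighted = {"node-a": 2, "node-b": 1}
print(is_quorate(weighted, {"node-a"}))  # True
print(is_quorate(weighted, {"node-b"}))  # False
```

The trade-off is that such a cluster only survives the loss of the 1-vote node.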

I decided to eliminate OpenFabric, so I am just using standard IP addressing assigned to en05 with 2 hosts. I also used Reef instead of Quincy, so I changed 2 variables. It's been working perfectly for about 8 hours, so I think this is a success.

My next test, which I'm going to start working on now, will be to add OpenFabric to the working configuration. If this doesn't work, then there's some kind of issue with the combination of TB, OpenFabric, Ceph, PVE 8.1.4, and kernel 6.5; if it does work, then the issue is likely with Quincy and the combination of variables on kernel 6.5.

DarkPhyber-hg commented Feb 23, 2024

ok, going to reef did the trick, no more lockups even with openfabric

jacoburgin commented Feb 23, 2024

I had to lock the kernel to 6.2 to get stability on my nuc 12's

DarkPhyber-hg commented Feb 25, 2024

I forgot that I had commented out the MTU of 65520, so it was defaulting to 1500. When I put it back to 65520 I got an instant lockup! I'm playing around with various MTU sizes right now. What's strange is that with an extended iperf3 test I got no lockups with the higher MTU value.

DarkPhyber-hg commented Feb 25, 2024 • edited

OK, I've been playing around with various MTU sizes. There's no perceivable difference in iperf3 speeds on my hardware for an MTU between 1500 and 34,000; I always wind up with an iperf3 result of around 22-23 Gbps. Going to 35,000, I get lockups with Ceph.

Using the Ceph benchmark tool rados for a write test is a good way to stress test and see if I will get a lockup without having to use real-world load. Additionally, I consistently get the best write throughput and IOPS with an MTU of 1500 on my current hardware. I am using consumer WD SN850X M.2 drives until I get some enterprise ones, so this could have an impact as well.
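
A sketch of how such a stress run could be scripted. It only builds, and optionally runs, the standard `rados bench` write test; the pool name `vm-disks` comes from the gist, while the 60-second duration and the helper names are assumptions.

```python
import subprocess


def rados_bench_cmd(pool: str, seconds: int = 60) -> list[str]:
    """Build the argv for a `rados bench` write test against one pool."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return ["rados", "bench", "-p", pool, str(seconds), "write"]


def run_write_test(pool: str, seconds: int = 60) -> int:
    """Run the benchmark on a Ceph node and return its exit code.

    A node that locks up mid-run never returns, which is the failure
    this test is meant to provoke.
    """
    return subprocess.run(rados_bench_cmd(pool, seconds), check=False).returncode


# Print the command rather than running it, so this is safe anywhere.
print(" ".join(rados_bench_cmd("vm-disks")))
```

Calling `run_write_test("vm-disks")` once per MTU setting under test would reproduce the comparison described above.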

I have some Samsung PM9A3 u.2 drives on the way, along with some PM983 m.2 drives. Once i get those i'll do another round of testing and hopefully put this stack into production to replace my r730.

djs42012 commented Nov 20, 2024

Thank you for the gist! It's worked fantastically for me so far.

Pardon me if this is thick, but I have two questions before I proceed with this step (setting up ceph and HA).

As some others have reported, I cannot ping the mesh network's IPv4 addresses unless I run systemctl restart frr on each node after startup. That said, am I correctly interpreting your earlier guidance

> i strongly recommend you consider only using IPv6... either use IPv4 or IPv6 addresses for all the monitors

if I just replace the IPv4 addresses with their IPv6 counterparts in the instructions in this part of the gist?

Additionally, I would very much like to add a fourth node to the cluster to serve as a dedicated router/reverse proxy/networking tools stack.

Is this possible, and if so, does it pose any issues? I have never worked with HA and am a little stuck on what to make of the settings we put in for the migration network, and what bearing they would have on a potential fourth node.

Currently I am using migration: insecure,network=fc00::81/125 in my datacenter.cfg and everything is working as expected.

I did see one previous poster mention a fourth node but could not gather whether any special configuration changes are required to add one to this setup.

Thank you!

Author

scyto commented Nov 20, 2024 • edited

@djs42012 there are definitely weird issues on many machines with timing that stop IPv4 fully coming up in some scenarios and sometimes stop Thunderbolt. Folks have found a variety of workarounds (documented in the comment history). I only put fixes in my main gist that a) I implemented myself and b) I think can work for all scenarios. As I don't have any of those issues, I can't do a repro and figure out the root cause to get bugs filed with the Proxmox team. It may be as simple as changing some service order and startups, or it may be as complex as needing a fix in the upstream kernel; we just don't know.

There really is no need to run IPv4 on the mesh network; it can all be configured for IPv6 and seems to work more reliably (I think the IPv4 issue is related to the kernel routing module and timing at startup). As such I have contemplated removing all the IPv4 stuff from the gist; I only did both for my own playing. All my Ceph is configured with IPv6 only, like this.

Ceph Config

[global]
	auth_client_required = cephx
	auth_cluster_required = cephx
	auth_service_required = cephx
	cluster_network = fc00::/64
	fsid = 5e55fd50-d135-413d-bffe-9d0fae0ef5fa
	mon_allow_pool_delete = true
	mon_host = fc00::83 fc00::82 fc00::81
	ms_bind_ipv4 = false
	ms_bind_ipv6 = true
	osd_pool_default_min_size = 2
	osd_pool_default_size = 3
	public_network = fc00::/64

[client]
	keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
	keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
	keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve1]
	host = pve1
	mds_standby_for_name = pve

[mds.pve1-1]
	host = pve1
	mds_standby_for_name = pve

[mds.pve2]
	host = pve2
	mds_standby_for_name = pve

[mds.pve2-1]
	host = pve2
	mds_standby_for_name = pve

[mds.pve3]
	host = pve3
	mds_standby_for_name = pve

[mds.pve3-1]
	host = pve3
	mds_standby_for_name = pve

[mon.pve1-IPv6]
	public_addr = fc00::81

[mon.pve2-IPv6]
	public_addr = fc00::82

[mon.pve3-IPv6]
	public_addr = fc00::83

and PVE cluster config

root@pve1:/etc/pve# cat datacenter.cfg 

crs: ha-rebalance-on-start=1
email_from: [email protected]
keyboard: en-us
# migration: insecure,network=10.0.0.80/29
migration: insecure,network=fc00::81/64
notify: target-fencing=send-alerts-to-alex,target-package-updates=send-alerts-to-alex,target-replication=send-alerts-to-alex

hope that helps you make your decisions on what approach you want

djs42012 commented Nov 20, 2024

Thank you @scyto , that does help. As for adding a fourth node to the cluster, do you foresee any issues there?

Author

scyto commented Nov 20, 2024

@djs42012 not at all, just remember that cross-node traffic might now be a 2-hop process, so traffic may have to pass through one node to get to another. I don't know what that means for performance. But the routing will work; on a three-node setup it's no different from pulling one of the cables: in that scenario the two nodes at the ends of the chain have to pass traffic through the one in the middle.

djs42012 commented Nov 20, 2024

@scyto Great, thank you! I wasn't sure if the migration: insecure,network=fc00::81/64 entry would somehow lock out the fourth node since, to my understanding, that references the thunderbolt network to which it has no access.

Author

scyto commented Nov 21, 2024

@djs42012 it uses that to identify the subnet of the migration network, not any specific node. It ignores the host portion of the address, but that's the way you have to specify it. I have no idea why; it confused me too, a lot.
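
This matches ordinary CIDR arithmetic: the prefix length selects a subnet and the host bits are masked off. The sketch below uses Python's `ipaddress` module to show the masking for the values mentioned in this thread. It illustrates the arithmetic only and is not a claim about how Proxmox itself parses `datacenter.cfg`; the helper name `migration_subnet` is made up for the example.

```python
import ipaddress


def migration_subnet(value: str):
    """Reduce a host-style CIDR (address/prefix) to the subnet it names."""
    # strict=False masks off the host bits instead of raising ValueError.
    return ipaddress.ip_network(value, strict=False)


for cidr in ("fc00::81/64", "fc00::81/125", "10.0.0.80/29"):
    net = migration_subnet(cidr)
    print(f"{cidr} -> {net} ({net.num_addresses} addresses)")

# The three mesh addresses used in this gist all fall inside the /64.
mesh = [ipaddress.ip_address(a) for a in ("fc00::81", "fc00::82", "fc00::83")]
print(all(a in migration_subnet("fc00::81/64") for a in mesh))
```

A node without any address in that subnet simply has no interface on the migration network, which is the situation a fourth node outside the Thunderbolt mesh would be in unless traffic is routed to it.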

djs42012 commented Nov 21, 2024

Interesting. Well I will report back once I have a working setup (hopefully). Thanks again!

mrkhachaturov commented Dec 27, 2024 • edited

I am looking for guidance on configuring Thunderbolt networking to access a Ceph cluster from a virtual machine. My goal is to utilize the Ceph Container Storage Interface (CSI) in a Kubernetes environment running on my Proxmox cluster.

In the Thunderbolt networking configuration, we are defining IP addresses on the loopback interfaces for en05 and en06. Should I create a bridge on these interfaces and attach it to the virtual machine?

Any recommendations would be greatly appreciated.

yet-an-other commented Dec 27, 2024

> There really is no need to run IPv4 on the mesh network, it can be all configured for IPv6 and seems to work more reliably (i think the IPv4 issue is related to the kernel routing module and timing at startup).

Unfortunately, VXLAN only works with IPv4 (ifupdown2 does not support IPv6, see https://forum.proxmox.com/threads/sdn-vxlan-over-ipv6.114803/, a two-year-old issue), and EVPN conflicts with OpenFabric. Therefore, if you want to have a local virtual network for VMs across the cluster that works on top of the TB network between nodes, IPv4 is the only option.

aixsyd commented Dec 31, 2024

> All my ceph is configured with IPv6 only like this. [...]

I'm not sure how you got this config to work - every time I try to use an IPv6 network, when adding an OSD from another node, I get the following:

No address from ceph cluster network (fc00:0000:0000:0000:0000:0000:0000:0081/128) found on node 'ms-02'. Check your network config. (500)
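
One possible reading of that error (an assumption, not a confirmed diagnosis): the network in the message is `fc00::81/128`, a single-host prefix, so only the node that owns `fc00::81` can match it. The sketch below checks which nodes have an address inside a given cluster network. The addresses and the names `ms-01` and `ms-03` are hypothetical; `ms-02` is taken from the error message.

```python
import ipaddress


def nodes_missing_network(cluster_network: str, node_addrs: dict[str, list[str]]) -> list[str]:
    """Return the nodes that have no address inside cluster_network."""
    net = ipaddress.ip_network(cluster_network, strict=False)
    return [
        node
        for node, addrs in node_addrs.items()
        if not any(ipaddress.ip_address(a) in net for a in addrs)
    ]


# Hypothetical addressing for a three-node mesh.
ADDRS = {"ms-01": ["fc00::81"], "ms-02": ["fc00::82"], "ms-03": ["fc00::83"]}

print(nodes_missing_network("fc00::81/128", ADDRS))  # ['ms-02', 'ms-03']
print(nodes_missing_network("fc00::/64", ADDRS))     # []
```

If this reading is right, the config shown earlier in the thread avoids the problem because its `cluster_network` and `public_network` are the whole `fc00::/64` subnet rather than one host address.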

flx-666 commented Jan 2, 2025

> I am looking for guidance on configuring Thunderbolt networking to access a Ceph cluster from a virtual machine. My goal is to utilize the Ceph Container Storage Interface (CSI) in a Kubernetes environment running on my Proxmox cluster.
>
> In the Thunderbolt networking configuration, we are defining IP addresses on the loopback interfaces for en05 and en06. Should I create a bridge on these interfaces and attach it to the virtual machine?
>
> Any recommendations would be greatly appreciated.

I am in the same situation: I'd like to allow some of my VMs to access CephFS for persistent storage in a K8s cluster.
I found some threads about doing this, but they all seem to use https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#RSTP_Loop_Setup
I'd prefer not to migrate from FRR to RSTP if possible, so does anyone have a way of doing that with FRR?
Or a way to migrate from FRR to RSTP without breaking the cluster, Ceph, ...?
Any hint would be appreciated :)

IndianaJoe1216 commented Jan 2, 2025

> I am in the same situation, I'd like to allow some of my VMs to access the cephfs for persistent storage in a K8s cluster. [...]

I am in the exact same boat and would like some assistance here as well!

yet-an-other commented Jan 2, 2025

Yeah, same here.
I have the same problem. I tried multiple approaches, and one actually works, but I'm not sure it's the best solution. But it works. :)

Disclaimer: Network engineering is not my area of expertise, so everything I'm writing here might be completely off-base.
Note: My setup is a bit different - I have two nodes and one QDevice. This shouldn't affect the networking, but it's worth mentioning, just in case.

At first, my uneducated guess was that if we're using FRR to set up a mesh between nodes, then we need to use FRR on the VM to include it in the mesh. However, as mentioned, I'm not a network engineer and have zero knowledge of how FRR actually works, so I gave up as soon as my first straightforward and naive attempt failed.

The second approach was to use VXLAN. VXLAN allows you to create a virtual network within a cluster, and it uses a mesh network to establish communication between VMs on different nodes. The nodes themselves can also be part of the network. It's quite easy to install, and from the VM or node perspective, it looks like just another bridge or NIC. The network itself works perfectly (though the throughput dropped to 10Gb on average from 26Gb, and lots of retries appeared, but I didn't dig deeper). Unfortunately, I was unable to use this network as a public network for Ceph and enforce monitors to listen on IPs from it. As soon as I changed the config, the cluster went down.

The final attempt, which actually works but looks ugly, uses Samba or NFS. The trick is simple: on every node, you create a virtual bridge that isn't attached to any interface, with the same static IP. No gateway. Afterward, add this bridge to every VM that needs access to shared disk, and give it a static IP from the same network. This creates a virtual network within one node. Then just create an SMB/NFS server on the node and mount the share on VMs. Since the bridge configuration is similar on every node, nothing will change for the VM during migration, and it should continue to work.
Performance is fine for my scenarios - I have stable 1GB write and 2-2.5GB read speeds.
There are a few issues with this approach though. There's a slight delay with disk access during migration as the connection to the share is interrupted, which may cause issues if the VM migrates during active I/O. Also, some applications that are sensitive to disk type might not work with shared disk, such as PostgreSQL.

Bottom line - it works for me for now, but I'd be happy if someone could help me fix this in a more proper way.

IndianaJoe1216 commented Jan 3, 2025

> The final attempt, which actually works but looks ugly, uses Samba or NFS. [...]

Thanks for following up! Unfortunately I don't think this will work for me. How are you mounting the Ceph pool as an NFS share? Are you able to do that if Samba/NFS server is configured on the PVE Nodes?

mrkhachaturov commented Jan 3, 2025 • edited

Ceph allows the addition of multiple public networks.

In my Proxmox cluster, I have six Minisforum MS-01 machines. Each MS-01 is equipped with two SFP+ NICs and two 2.5 Gb NICs.

The SFP+ adapters are configured in an 802.3ad bond with VLAN support.

I have set up VLAN 10 for virtual machines, using the subnet 10.10.0.0/24.

Additionally, I have included this as an extra public network for Ceph.

root@pve01:~# ceph mon stat
e8: 6 mons at {pve01=[v2:10.0.0.81:3300/0,v1:10.0.0.81:6789/0,v2:10.10.0.146:3300/0,v1:10.10.0.146:6789/0],pve02=[v2:10.0.0.82:3300/0,v1:10.0.0.82:6789/0,v2:10.10.0.147:3300/0,v1:10.10.0.147:6789/0],pve03=[v2:10.0.0.83:3300/0,v1:10.0.0.83:6789/0,v2:10.10.0.150:3300/0,v1:10.10.0.150:6789/0],pve04=[v2:10.0.0.84:3300/0,v1:10.0.0.84:6789/0,v2:10.10.0.153:3300/0,v1:10.10.0.153:6789/0],pve05=[v2:10.0.0.85:3300/0,v1:10.0.0.85:6789/0,v2:10.10.0.154:3300/0,v1:10.10.0.154:6789/0],pve06=[v2:10.0.0.86:3300/0,v1:10.0.0.86:6789/0,v2:10.10.0.155:3300/0,v1:10.10.0.155:6789/0]} removed_ranks: {} disallowed_leaders: {}, election epoch 156752, leader 0 pve02, quorum 0,1,2,3,4,5 pve02,pve03,pve04,pve01,pve05,pve06

I plan to test this configuration with Ceph CSI. If everything works as expected, I will share a detailed guide on how to configure it.

IndianaJoe1216 commented Jan 3, 2025 • edited

> Ceph allows the addition of multiple public networks. [...] I plan to test this configuration with Ceph CSI. If everything works as expected, I will share a detailed guide on how to configure it.

Please do! I only have 3 MS-01's but this should work perfectly for me.

yet-an-other commented Jan 3, 2025 • edited

> Thanks for following up! Unfortunately I don't think this will work for me. How are you mounting the Ceph pool as an NFS share? Are you able to do that if Samba/NFS server is configured on the PVE Nodes?

Correct, but you have to mount CephFS, not a Ceph pool. Create a CephFS volume (node > Ceph > CephFS), and then mount it from /mnt/pve/cephfs.

mrkhachaturov commented Jan 5, 2025

> Please do! I only have 3 MS-01's but this should work perfectly for me.

I have set up three public networks for Ceph: Thunderbolt Mesh, VLAN 60, and VLAN 80. I recently installed the Ceph Dashboard to verify the connectivity of the monitors across these networks, and I'm pleased to report that everything is functioning smoothly.

root@pve01:/etc/ceph# ceph mon stat
e18: 6 mons at {pve01=[v2:10.0.0.81:3300/0,v1:10.0.0.81:6789/0,v2:10.1.60.1:3300/0,v1:10.1.60.1:6789/0,v2:10.1.80.1:3300/0,v1:10.1.80.1:6789/0],pve02=[v2:10.0.0.82:3300/0,v1:10.0.0.82:6789/0,v2:10.1.60.2:3300/0,v1:10.1.60.2:6789/0,v2:10.1.80.2:3300/0,v1:10.1.80.2:6789/0],pve03=[v2:10.0.0.83:3300/0,v1:10.0.0.83:6789/0,v2:10.1.60.3:3300/0,v1:10.1.60.3:6789/0,v2:10.1.80.3:3300/0,v1:10.1.80.3:6789/0],pve04=[v2:10.0.0.84:3300/0,v1:10.0.0.84:6789/0,v2:10.1.60.4:3300/0,v1:10.1.60.4:6789/0,v2:10.1.80.4:3300/0,v1:10.1.80.4:6789/0],pve05=[v2:10.0.0.85:3300/0,v1:10.0.0.85:6789/0,v2:10.1.60.5:3300/0,v1:10.1.60.5:6789/0,v2:10.1.80.5:3300/0,v1:10.1.80.5:6789/0],pve06=[v2:10.0.0.86:3300/0,v1:10.0.0.86:6789/0,v2:10.1.60.6:3300/0,v1:10.1.60.6:6789/0,v2:10.1.80.6:3300/0,v1:10.1.80.6:6789/0]} removed_ranks: {} disallowed_leaders: {}, election epoch 126, leader 0 pve04, quorum 0,1,2,3,4,5 pve04,pve01,pve02,pve03,pve05,pve06
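
To check that every monitor really is listening on each public network, the `ceph mon stat` line can be split into per-monitor address lists. This is a rough sketch that assumes IPv4 output shaped like the line above (IPv6 addresses are printed in brackets and would need different handling); `SAMPLE` is a shortened, hand-trimmed version of that output and `parse_mon_stat` is a made-up helper name.

```python
import re


def parse_mon_stat(line: str) -> dict[str, list[str]]:
    """Map each monitor name to the addresses listed in `ceph mon stat`."""
    mons: dict[str, list[str]] = {}
    # Each monitor appears as name=[v2:ip:port/nonce,v1:ip:port/nonce,...]
    for name, addrs in re.findall(r"([\w.-]+)=\[([^\]]*)\]", line):
        ips: list[str] = []
        for entry in addrs.split(","):
            # Strip the v1:/v2: prefix and the trailing :port/nonce.
            ip = entry.split(":", 1)[1].rsplit(":", 1)[0]
            if ip not in ips:
                ips.append(ip)
        mons[name] = ips
    return mons


SAMPLE = (
    "e18: 2 mons at {pve01=[v2:10.0.0.81:3300/0,v1:10.0.0.81:6789/0,"
    "v2:10.1.60.1:3300/0,v1:10.1.60.1:6789/0],"
    "pve02=[v2:10.0.0.82:3300/0,v1:10.0.0.82:6789/0,"
    "v2:10.1.60.2:3300/0,v1:10.1.60.2:6789/0]}"
)

for mon, ips in parse_mon_stat(SAMPLE).items():
    print(mon, ips)
```

On a live cluster the input line could come from running `ceph mon stat` and passing its output to the function.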
