OSD Journal configuration, upstream:
- http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/
- http://docs.ceph.com/docs/master/rados/configuration/journal-ref/
OSD Journal configuration, RHCS 2 (downstream):
Red Hat Customer Portal:
Blog posts:
- http://www.sebastien-han.fr/blog/2014/02/17/ceph-io-patterns-the-bad/
- https://www.hastexo.com/resources/hints-and-kinks/solid-state-drives-and-ceph-osd-journals
Based on resources linked above.
The Ceph OSD journal provides (why the OSD journal exists and what it does; a minimal configuration sketch follows the list):
- speed (small random I/O can be quickly written sequentially into the journal)
- consistency (a full description of the operation - both data and metadata - is written into the journal first)
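The journal location and size are plain ceph.conf options (documented in the osd-config-ref/journal-ref links above). A minimal FileStore-era sketch; the values below are illustrative defaults, not a recommendation:
[osd]
# path to the journal; by default a file/symlink inside the OSD data directory,
# which ceph-disk points at a raw journal partition
osd journal = /var/lib/ceph/osd/$cluster-$id/journal
# journal size in MB; ceph-disk also uses this when it creates journal partitions
osd journal size = 5120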
Facts:
- Do not host multiple journals on a single HDD (the whole point of the OSD journal is to give the OSD a place where it can write incoming random I/O requests sequentially, without seeks/random access - with 2 or more journals on a single HDD, this property is lost).
- Multiple journals should only be hosted on an SSD/NVMe. The number of OSD journals a single SSD/NVMe can host depends on its sequential write limits (see the sizing sketch after this list).
- When the storage machine doesn't have enough SSDs for journals (or none at all), the most common setup is to host both the journal and the OSD data on a single HDD (a so-called collocated journal).
- The optimal strategy is to have HDDs for OSD data and an SSD holding the journals. One needs to pay attention to the SSD's limits, though.
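A rough sizing sketch. The upstream journal settings guideline is osd journal size = 2 * (expected throughput * filestore max sync interval); the numbers below (~110 MB/s sustained sequential writes per HDD-backed OSD, ~450 MB/s for the journal SSD, the default 5 s filestore max sync interval) are assumptions - measure your own hardware:
$ echo "2 * 110 * 5" | bc    # minimum journal size in MB for one such OSD
1100
$ echo "450 / 110" | bc      # rough upper bound on HDD-backed OSD journals per SSD
4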
Simple trick to analyse the write patterns applied to your Ceph journal.
Assuming your journal device is /dev/sdb1, checking for 10 seconds:
$ iostat -dmx /dev/sdb1 10 | awk '/[0-9]/ {print $8}'
16.25
Now converting sectors to KiB (the avgrq-sz column is reported in 512-byte sectors):
16.25 * 512 / 1024 = 8.125, i.e. ~8 KiB
And yes, I was sending 8K requests :)
see: https://ceph.com/geen-categorie/ceph-analyse-journal-write-pattern/
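The conversion can also be folded into the pipeline. A sketch, assuming an older sysstat where avgrq-sz is still the 8th extended-stats column (newer sysstat releases drop avgrq-sz in favour of rareq-sz/wareq-sz, which are already reported in kilobytes):
$ iostat -dmx /dev/sdb1 10 | awk '/sdb1/ {printf "%.2f KiB\n", $8 * 512 / 1024}'
8.13 KiB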
# ceph-disk list
WARNING:ceph-disk:Old blkid does not support ID_PART_ENTRY_* fields, trying sgdisk; may not correctly identify ceph volumes with dmcrypt
/dev/sda :
/dev/sda1 other, xfs, mounted on /boot
/dev/sda2 other, LVM2_member
/dev/sdb :
/dev/sdb1 ceph data, active, unknown cluster 6f7cebf2-ceef-49b1-8928-2d36e6044db4, osd.19, journal /dev/sde1
/dev/sdc :
/dev/sdc1 ceph data, active, unknown cluster 6f7cebf2-ceef-49b1-8928-2d36e6044db4, osd.20, journal /dev/sde2
/dev/sdd :
/dev/sdd1 ceph data, active, unknown cluster 6f7cebf2-ceef-49b1-8928-2d36e6044db4, osd.21, journal /dev/sde3
/dev/sde :
/dev/sde1 ceph journal, for /dev/sdb1
/dev/sde2 ceph journal, for /dev/sdc1
/dev/sde3 ceph journal, for /dev/sdd1
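Each ceph-disk-prepared OSD also keeps a 'journal' symlink in its data directory, normally pointing at the journal partition via /dev/disk/by-partuuid/. A quick cross-check of the mapping above, using osd.19 and the default data directory path as assumptions:
# readlink /var/lib/ceph/osd/ceph-19/journal       # usually /dev/disk/by-partuuid/<journal uuid>
# readlink -f /var/lib/ceph/osd/ceph-19/journal    # resolves to the raw partition, /dev/sde1 in the listing above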
Another somewhat common use case is to host the journals and OSDs on the same flash/NVMe device, so that not only writes but also reads are fast (see the ceph-disk sketch below).
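Both layouts can be created with ceph-disk (the provisioning tool of the RHCS 2 / pre-ceph-volume era); the device names below are examples only.
Journal on a dedicated SSD partition (data device first, journal device second):
# ceph-disk prepare /dev/sdb /dev/sde
Collocated journal - with no journal device given, ceph-disk creates both a data and a journal partition on the same (flash/NVMe) device:
# ceph-disk prepare /dev/nvme0n1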