This only exists to help me learn about the inner workings of Rook/Ceph and to test design ideas for Rook and ceph-docker as I spin the wheels.
You should use rook-operator to try Rook on Kubernetes. Not this.
Ideas I've implemented or want to implement:
- Monitors bootstrap via Kubernetes-managed DNS SRV records
  - Caveat: Rook doesn't support this yet, so the examples don't show it. This was tested with straight ceph-docker though.
- rook-tools Ceph clients use DNS SRV records
- StatefulSet and Pod subdomain implemented for [extra DNS options][ss-subdomains] (just because)
- Smart readiness/health probes on monitor containers for accurate/quick DNS/SRV record updates (see the lookup sketch after this list)
- Add comments in manifests to explain why I do things
- Open issue with Kubernetes to add `{namespace}.svc.{cluster-domain}` to resolv.conf so we don't have to script it
- Open issue with Ceph to add a config option to specify the (sub)domain for SRV records instead of relying on search domains
- Centralized configs for rook and ceph daemons (ConfigMaps, Secrets) possibly using konfd
- Automated Monitor and OSD recreation or replacement when pods/nodes delete/fail.
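If you want to poke at the SRV side of this, the records Kubernetes publishes for the headless service can be inspected from inside the cluster. A minimal sketch, assuming the Service is named `rook` in the `default` namespace with a port named `mon`, and that the tools image ships an `nslookup` that accepts `-type`:

```
# Hypothetical lookup: Kubernetes publishes one SRV record per ready
# monitor pod for the named port "mon" on the headless "rook" Service.
kubectl exec rook-tools -- nslookup -type=SRV _mon._tcp.rook.default.svc.k8s.zbrbdl
```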
These resources probably won't work on your Kubernetes cluster as-is, mainly because they assume a cluster domain of `k8s.zbrbdl`. Replace all instances of that string with `cluster.local` or whatever your cluster domain is.

Put all the YAMLs in a new directory like the `rook-k8s` used below.
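For example, a one-liner along these lines could do the substitution in place (a sketch assuming GNU sed and that the manifests live in `rook-k8s/`):

```
# Swap the example cluster domain for your own in every manifest.
grep -rl 'k8s.zbrbdl' rook-k8s/ | xargs sed -i 's/k8s\.zbrbdl/cluster.local/g'
```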
Run:

```
kubectl create -f rook-k8s
```

Output:

```
configmap "rook-ceph-config" created
service "rook" created
statefulset "mon" created
pod "rook-tools" created
secret "rook-client-keys" created
```
This is what you can expect to see after creating the resources.
Run:

```
kubectl get all,secrets,configmap --selector app=rook
```
Output:
NAME READY STATUS RESTARTS AGE IP NODE LABELS
po/mon-0 1/1 Running 0 34m 10.2.6.5 node3.zbrbdl app=rook,is_endpoint=true,mon_cluster=rookcluster,role=mon
po/mon-1 1/1 Running 0 34m 10.2.136.189 node2.zbrbdl app=rook,is_endpoint=true,mon_cluster=rookcluster,role=mon
po/mon-2 1/1 Running 0 34m 10.2.247.29 node1.zbrbdl app=rook,is_endpoint=true,mon_cluster=rookcluster,role=mon
po/rook-tools 1/1 Running 0 34m 10.2.136.188 node2.zbrbdl app=rook
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR LABELS
svc/rook None <none> 6790/TCP 34m app=rook,is_endpoint=true app=rook
NAME DESIRED CURRENT AGE LABELS
statefulsets/mon 3 3 34m mon quay.io/rook/rookd:dev-2017-02-27-k8s app=rook,is_endpoint=true,mon_cluster=rookcluster,role=mon app=rook,role=mon,rook_cluster=rookcluster
NAME TYPE DATA AGE LABELS
secrets/rook-client-keys Opaque 2 51m app=rook
NAME DATA AGE LABELS
cm/rook-ceph-config 1 34m app=rook
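The listing above also hints at the per-pod DNS names: because the monitors run as a StatefulSet tied to the headless `rook` Service, each pod should be resolvable by a stable name like `mon-0.rook.default.svc.k8s.zbrbdl`. A quick check, assuming the `default` namespace and that the tools image has `getent`:

```
# Per-pod DNS names come from the StatefulSet's serviceName / Pod subdomain.
kubectl exec rook-tools -- getent hosts mon-0.rook.default.svc.k8s.zbrbdl
```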
Run:

```
kubectl exec rook-tools -- ceph -s
```

Output:

```
    cluster d0b91bda-032e-451e-863d-812cb3fefee9
     health HEALTH_ERR
            2048 pgs are stuck inactive for more than 300 seconds
            2048 pgs stuck inactive
            2048 pgs stuck unclean
            no osds
     monmap e2: 1 mons at {mon-2=10.2.247.29:6790/0}
            election epoch 4, quorum 0 mon-2
        mgr no daemons active
     osdmap e1: 0 osds: 0 up, 0 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v2: 2048 pgs, 1 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 2048 creating
```
You may notice it only sees 1/1 monitors in quorum when we expect to see 3/3. The `ceph -s` above still works because the client round-robins its query across the three monitors; however, each monitor thinks it's in a cluster of one by itself. rookd will need to be patched to allow running daemons without providing initial monitor IP(s), so that it falls back on DNS SRV discovery.
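To see the split directly, you can point the client at each monitor in turn with `-m`. A sketch using the pod IPs from the listing above (yours will differ):

```
# Query each monitor directly; each one should report a quorum of one.
for ip in 10.2.6.5 10.2.136.189 10.2.247.29; do
  kubectl exec rook-tools -- ceph -s -m "$ip:6790"
done
```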
Run:

```
kubectl exec rook-tools -- cat /etc/rook/ceph.conf
```

Output:

```
[global]
#mon initial members = mon-0, mon-1, mon-2
osd crush chooseleaf type = 1
osd pool default size = 2
```
Run:

```
kubectl exec mon-1 -- cat /var/lib/rook/mon-1/rookcluster.config
```

Output:

```
[global]
enable experimental unrecoverable data corrupting features =
fsid = d0b91bda-032e-451e-863d-812cb3fefee9
run dir = /var/lib/rook/mon-1
mon initial members = mon-1
log file = /dev/stdout
mon cluster log file = /dev/stdout
mon keyvaluedb = rocksdb
debug default = 0
debug rados = 0
debug mon = 0
debug osd = 0
debug bluestore = 0
debug filestore = 0
debug journal = 0
debug leveldb = 0
filestore_omap_backend = rocksdb
osd pg bits = 11
osd pgp bits = 11
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 100
osd pool default pgp num = 100
osd objectstore = filestore
rbd_default_features = 3
crushtool =
fatal signal handlers = false
osd crush chooseleaf type = 1

[client.admin]
keyring = /var/lib/rook/mon-1/keyring

[mon.mon-1]
name = mon-1
mon addr = 10.2.136.189:6790
```
The `[mon.mon-1]` section is added by rookd. It could be omitted in this DNS/SRV-based configuration.
To remove the resources from Kubernetes, run:

```
kubectl delete -f rook-k8s
```

Output:

```
configmap "rook-ceph-config" deleted
secret "rook-client-keys" deleted
service "rook" deleted
statefulset "mon" deleted
pod "rook-tools" deleted
```