Rook/Ceph On Kubernetes (Experiment)

Rook On Kubernetes (The Hard Way)

Why?

This only exists to help me learn about the inner workings of Rook/Ceph and to test design ideas for Rook and ceph-docker as I spin the wheels.

You should use rook-operator to try Rook on Kubernetes. Not this.

Status

Ideas I've implemented or want to implement.

  • Monitors bootstrap via Kubernetes-managed DNS SRV records (see the SRV lookup example after this list)
    • Caveat: Rook doesn't support this yet, so the examples don't show it. This was tested with plain ceph-docker, though.
  • rook-tools Ceph clients use DNS SRV records
  • StatefulSet and Pod subdomain implemented for [extra DNS options][ss-subdomains] (just because)
  • Smart readiness/health probes on monitor containers for accurate/quick DNS/SRV record updates (a minimal probe sketch follows the mon StatefulSet manifest below)
  • Add comments in manifests to explain why I do things
  • Open issue with Kubernetes to add {namespace}.svc.{cluster-domain} to resolv.conf so we don't have to script it
  • Open issue with Ceph to add a config option to specify the (sub)domain for SRV records instead of relying on search domains
  • Centralized configs for rook and ceph daemons (ConfigMaps, Secrets) possibly using konfd
  • Automated Monitor and OSD recreation or replacement when pods/nodes are deleted or fail.
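
For reference, the headless rook Service below publishes one SRV record per ready monitor pod, named _ceph-mon._tcp.rook.<namespace>.svc.<cluster-domain> per the Kubernetes DNS spec. You can eyeball the records with a manual lookup from the toolbox pod. This is only a sketch: it assumes the resources live in the default namespace and that the toolbox image ships nslookup.

kubectl exec rook-tools -- nslookup -type=SRV _ceph-mon._tcp.rook.default.svc.k8s.zbrbdl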

Install

NOTE: Cluster-specific settings

These resources probably won't work on your Kubernetes cluster as-is, mainly because they assume a cluster domain of k8s.zbrbdl. You should replace all instances of that string with cluster.local or whatever your cluster domain is.
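
For example, with the manifests saved in a rook-k8s directory (as in the next step) and GNU sed available, a one-liner like this rewrites the domain in place:

grep -rl 'k8s\.zbrbdl' rook-k8s/ | xargs sed -i 's/k8s\.zbrbdl/cluster.local/g'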

Create the Kubernetes resources

Put all the YAML manifests below in a new directory, such as the rook-k8s directory used here, then run:

kubectl create -f rook-k8s

Output:

configmap "rook-ceph-config" created
service "rook" created
statefulset "mon" created
pod "rook-tools" created
secret "rook-client-keys" created

Demo

This is what you can expect to see after creating the resources.

Resource overview

Run:

kubectl get all,secrets,configmap --selector app=rook

Output:

NAME            READY     STATUS    RESTARTS   AGE       IP             NODE           LABELS
po/mon-0        1/1       Running   0          34m       10.2.6.5       node3.zbrbdl   app=rook,is_endpoint=true,mon_cluster=rookcluster,role=mon
po/mon-1        1/1       Running   0          34m       10.2.136.189   node2.zbrbdl   app=rook,is_endpoint=true,mon_cluster=rookcluster,role=mon
po/mon-2        1/1       Running   0          34m       10.2.247.29    node1.zbrbdl   app=rook,is_endpoint=true,mon_cluster=rookcluster,role=mon
po/rook-tools   1/1       Running   0          34m       10.2.136.188   node2.zbrbdl   app=rook

NAME       CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE       SELECTOR                    LABELS
svc/rook   None         <none>        6790/TCP   34m       app=rook,is_endpoint=true   app=rook

NAME               DESIRED   CURRENT   AGE       CONTAINER(S)   IMAGE(S)                                SELECTOR                                                      LABELS
statefulsets/mon   3         3         34m       mon            quay.io/rook/rookd:dev-2017-02-27-k8s   app=rook,is_endpoint=true,mon_cluster=rookcluster,role=mon   app=rook,role=mon,rook_cluster=rookcluster

NAME                       TYPE      DATA      AGE       LABELS
secrets/rook-client-keys   Opaque    2         51m       app=rook

NAME                  DATA      AGE       LABELS
cm/rook-ceph-config   1         34m       app=rook

Verify Monitors bootstrapped successfully

Run:

kubectl exec rook-tools -- ceph -s

Output:

    cluster d0b91bda-032e-451e-863d-812cb3fefee9
     health HEALTH_ERR
            2048 pgs are stuck inactive for more than 300 seconds
            2048 pgs stuck inactive
            2048 pgs stuck unclean
            no osds
     monmap e2: 1 mons at {mon-2=10.2.247.29:6790/0}
            election epoch 4, quorum 0 mon-2
        mgr no daemons active
     osdmap e1: 0 osds: 0 up, 0 in
            flags sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v2: 2048 pgs, 1 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                2048 creating

NOTE

You may notice it only sees 1/1 monitors in the quorum when we expect to see 3/3. The ceph -s above still works because it round-robins its query across the three monitors, but each monitor thinks it's in a cluster of one by itself. rookd will need to be patched to allow running daemons without providing initial monitor IPs, so that it falls back on DNS SRV discovery.
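
You can confirm this from the toolbox by pointing the client at one monitor at a time with ceph's -m flag; each should report a monmap containing only itself. This is a sketch: it assumes the mon-N.rook pod hostnames resolve from the toolbox (they should, via the default <namespace>.svc search domain).

kubectl exec rook-tools -- ceph -m mon-0.rook:6790 -s
kubectl exec rook-tools -- ceph -m mon-1.rook:6790 -s
kubectl exec rook-tools -- ceph -m mon-2.rook:6790 -s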

Verify config files for client and monitors

Run:

kubectl exec rook-tools -- cat /etc/rook/ceph.conf

Output:

[global]
#mon initial members         = mon-0, mon-1, mon-2
osd crush chooseleaf type   = 1
osd pool default size       = 2

Run:

kubectl exec mon-1 -- cat /var/lib/rook/mon-1/rookcluster.config

Output:

[global]
enable experimental unrecoverable data corrupting features =
fsid                                                       = d0b91bda-032e-451e-863d-812cb3fefee9
run dir                                                    = /var/lib/rook/mon-1
mon initial members                                        = mon-1
log file                                                   = /dev/stdout
mon cluster log file                                       = /dev/stdout
mon keyvaluedb                                             = rocksdb
debug default                                              = 0
debug rados                                                = 0
debug mon                                                  = 0
debug osd                                                  = 0
debug bluestore                                            = 0
debug filestore                                            = 0
debug journal                                              = 0
debug leveldb                                              = 0
filestore_omap_backend                                     = rocksdb
osd pg bits                                                = 11
osd pgp bits                                               = 11
osd pool default size                                      = 2
osd pool default min size                                  = 1
osd pool default pg num                                    = 100
osd pool default pgp num                                   = 100
osd objectstore                                            = filestore
rbd_default_features                                       = 3
crushtool                                                  =
fatal signal handlers                                      = false
osd crush chooseleaf type                                  = 1

[client.admin]
keyring = /var/lib/rook/mon-1/keyring

[mon.mon-1]
name     = mon-1
mon addr = 10.2.136.189:6790

NOTE

The [mon.mon-1] section is added by rookd. It could be omitted in this DNS/SRV-based configuration.

Cleanup

To remove the resources from Kubernetes run:

kubectl delete -f rook-k8s

Output:

configmap "rook-ceph-config" deleted
secret "rook-client-keys" deleted
service "rook" deleted
statefulset "mon" deleted
pod "rook-tools" deleted
Manifests

---
kind: ConfigMap
apiVersion: v1
data:
  ceph.conf: |
    [global]
    osd crush chooseleaf type = 1
    osd pool default size = 2
    #mon initial members = mon-0, mon-1, mon-2
metadata:
  labels:
    app: rook
  name: rook-ceph-config
---
apiVersion: v1
data:
  admin: QVFEMmxiTlkzV29mSWhBQVVkV3pZZGNraHErYUVMUzJNQ0NRRUE9PQo=
  mon: QVFEMmxiTllUc01aSWhBQTBsaTJOTTJMc1QwMG9TV2d1a3plRVE9PQo=
  ceph.client.admin.keyring: W2NsaWVudC5hZG1pbl0Ka2V5ID0gQVFEMmxiTlkzV29mSWhBQVVkV3pZZGNraHErYUVMUzJNQ0NRRUE9PQo=
kind: Secret
metadata:
  name: rook-client-keys
  labels:
    app: rook
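
The values above are base64-encoded, as with any Secret data. To inspect the admin key without attaching to a pod, something like this works (a sketch using kubectl's jsonpath output; use base64 --decode if your base64 lacks -d):

kubectl get secret rook-client-keys -o jsonpath='{.data.admin}' | base64 -d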
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: rook
  name: rook
spec:
  ports:
  # Ceph looks for "ceph-mon" DNS SRV records by default
  - name: ceph-mon
    port: 6790
  selector:
    app: rook
    is_endpoint: "true"
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  labels:
    app: rook
    role: mon
    rook_cluster: rookcluster
  name: mon
spec:
  serviceName: rook
  replicas: 3
  template:
    metadata:
      labels:
        app: rook
        role: mon
        mon_cluster: rookcluster
        is_endpoint: "true"
    spec:
      subdomain: rook
      containers:
      - command:
        - /bin/sh
        - -c
        - sleep 5;
          cp /etc/resolv.conf /tmp/resolv.conf.bak;
          sed
          '/^search / s/$/ rook.$(POD_NAMESPACE).svc.k8s.zbrbdl/'
          /etc/resolv.conf
          > /tmp/resolv.conf.new;
          cat /tmp/resolv.conf.new > /etc/resolv.conf;
          /usr/bin/rookd mon
          --name=`hostname -s`
          --port=6790
          --data-dir=/var/lib/rook
          --fsid=d0b91bda-032e-451e-863d-812cb3fefee9
          --cluster-name=rookcluster
          --ceph-config-override=$(CEPH_CONF)
        ports:
        - containerPort: 6790
          name: ceph-mon
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: CEPH_CONF
          value: /etc/rook/ceph.conf
        - name: ROOKD_PRIVATE_IPV4
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: ROOKD_MON_SECRET
          valueFrom:
            secretKeyRef:
              name: rook-client-keys
              key: mon
        - name: ROOKD_ADMIN_SECRET
          valueFrom:
            secretKeyRef:
              name: rook-client-keys
              key: admin
        image: quay.io/rook/rookd:dev-2017-02-27-k8s
        imagePullPolicy: IfNotPresent
        name: mon
        volumeMounts:
        - mountPath: /etc/rook
          name: ceph-config
          readOnly: true
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      volumes:
      - name: ceph-config
        configMap:
          defaultMode: 420
          name: rook-ceph-config
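
The Status list above mentions smart readiness/health probes on the monitor containers so that the SRV records published by the headless Service track quorum membership quickly (only ready pods are listed in a Service's DNS). Nothing like that is in the manifest yet. The crudest placeholder would be a TCP check on the ceph-mon port, which only proves the daemon is listening, not that it has joined quorum; a sketch of what could be added under the mon container spec:

        readinessProbe:
          tcpSocket:
            port: 6790
          initialDelaySeconds: 10
          periodSeconds: 5
        livenessProbe:
          tcpSocket:
            port: 6790
          initialDelaySeconds: 30
          periodSeconds: 10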
---
apiVersion: v1
kind: Pod
metadata:
  name: rook-tools
  labels:
    app: rook
spec:
  containers:
  - name: rook-tools
    command:
    - /bin/sh
    - -c
    # Hacky script that adds a search domain required for Ceph to find
    # DNS SRV records created by Kubernetes.
    - cp /etc/resolv.conf /tmp/resolv.conf.bak;
      sed -e
      '/^search / s/$/ rook.$(POD_NAMESPACE).svc.k8s.zbrbdl/'
      /etc/resolv.conf
      > /tmp/resolv.conf.new;
      cat /tmp/resolv.conf.new > /etc/resolv.conf;
      sleep 3600d # Wait for admins to attach
    image: quay.io/rook/toolbox:latest
    imagePullPolicy: IfNotPresent
    args: ["sleep", "36500d"]
    env:
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: CEPH_CONF
      value: /etc/rook/ceph.conf
    volumeMounts:
    - mountPath: /etc/rook
      name: ceph-config
      readOnly: false
    - mountPath: /etc/ceph
      name: client-keys
      readOnly: false
  volumes:
  - name: ceph-config
    configMap:
      defaultMode: 420
      name: rook-ceph-config
  - name: client-keys
    secret:
      secretName: rook-client-keys
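
Once the rook-tools pod is running you can attach to it interactively and use the mounted client keyring directly, assuming the toolbox image includes bash:

kubectl exec -it rook-tools -- /bin/bash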