- You had a running rook/ceph cluster and your Kubernetes environment suddenly exploded; you have to start a new Kubernetes environment and put your existing rook/ceph cluster back.
- You are migrating your existing rook/ceph cluster to a new Kubernetes environment, and downtime can be tolerated.

In the author's situation, the etcd data of the running Kubernetes cluster was nuked with no backup, and all OSDs were using the bluestore backend.
- A working Kubernetes cluster without rook
- The previous rook/ceph cluster is intact, which means the data of at least one ceph mon and all of your OSD data are intact.
- Start a new and clean rook cluster, with old `CephCluster` & friends.
- Shut it down when it seems to be working (as a brand-new cluster).
- Replace ceph-mon data with the old one. Fix `fsid` in rook. Fix monmap. Disable auth.
- Fire it up, watch it resurrect. Fix admin auth key.
- Shut it down again. Enable auth. Fire it up.
HOORAY!
- Assuming your old Kubernetes cluster is completely torn down, and your new Kubernetes cluster is up and running, without rook.
- Back up `/var/lib/rook` in all your rook nodes. Backups will be used later.
- Pick a `/var/lib/rook/rook-ceph/rook-ceph.config` from any node and get your old cluster `fsid` from its content.
- Remove `/var/lib/rook` in all your rook nodes.
- Install rook in your new Kubernetes cluster.
- Prepare identical `CephCluster` descriptors, especially identical `spec.storage.config` and `spec.storage.nodes`, except `mon.count`, which should be set to `1`. Post them to your new Kubernetes cluster.
- Prepare identical `CephFilesystem` & ... etc descriptors (if any). Post them to your new Kubernetes cluster too.
- Run `kubectl logs -f rook-ceph-operator-xxxxxxxxxx` and wait till all the things are settled.
- Run `kubectl get cm/rook-crush-config -o yaml`, ensure `initialCrushMapCreated` is set to `1`. If not, go to step 7, manually set it, or stop here for further help.
- STATE: Now you will have `rook-ceph-mon-a`, `rook-ceph-mgr-a`, and all the auxiliary pods up and running, and zero (hopefully) `rook-ceph-osd-X` running. Rook should not start any OSD daemon since all devices belong to your old cluster (which has a different `fsid`).
- Run `kubectl exec -it rook-ceph-mon-a-XXXXXX bash` to enter your `ceph-mon` pod,
    mon-a# cat /etc/ceph/keyring-store/keyring # save this keyring content, for later use
    mon-a# exit
- Run `kubectl edit deploy/rook-ceph-operator` and set `replicas` to 0.
- Run `kubectl delete deploy/X` where X is every deployment in namespace `rook-ceph`, except `rook-ceph-operator` and `rook-ceph-tools`.
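
Reading the old `fsid` out of the backed-up `rook-ceph.config` can be scripted instead of done by eye. The following is a minimal sketch, assuming the file contains an INI-style line of the form `fsid = <uuid>`; the function name `extract_fsid` and the backup path in the example are hypothetical and not part of rook.

```shell
# Hypothetical helper: print the fsid recorded in a backed-up rook-ceph.config.
# Assumption: the file has a line of the form "fsid = <uuid>".
extract_fsid() {
  awk -F '=' '
    $1 ~ /^[ \t]*fsid[ \t]*$/ {
      gsub(/[ \t]/, "", $2)   # strip whitespace around the value
      print $2
      exit                    # the first match is enough
    }
  ' "$1"
}

# Example call (the backup location is an assumption; use your own):
# extract_fsid /root/rook-backup/rook-ceph/rook-ceph.config
```

Keep the printed value at hand; it is the `fsid` that the new cluster's configuration has to be changed to later on.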
SSH to the host where rook-ceph-mon-a in your new Kubernetes cluster resides.
- Pick the latest `ceph-mon` directory (`/var/lib/rook/mon-?`) in your previous backup, replace `/var/lib/rook/mon-a` with it.
- Replace `/var/lib/rook/mon-a/keyring` with the saved keyring, preserving only the `[mon.]` section and removing the `[client.admin]` section.
- Get your `rook-ceph-mon-a` address by `kubectl get cm/rook-ceph-mon-endpoints -o yaml` in your new Kubernetes cluster.
- Run `docker run -it --rm -v /var/lib/rook:/var/lib/rook ceph/ceph:v14.2.1-20190430 bash` (note the docker image version, which should match your deployment):
    container# cd /var/lib/rook
    container# ceph-mon --extract-monmap m --mon-data ./mon-a/data
    container# monmaptool --print m
    container# monmaptool --rm a m # repeat this until all the old ceph-mons are removed
    container# monmaptool --add a 10.77.2.216:6789 m # Replace with your own rook-ceph-mon address!
    container# ceph-mon --inject-monmap m --mon-data ./mon-a/data
    container# rm m
    container# exit

Now back to your local machine.
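
For the keyring step above (done on the mon host), the saved keyring can be split by section with a small filter rather than edited by hand. This is a sketch only: `keyring_section` is a hypothetical helper, and it assumes each section header such as `[mon.]` sits alone on its own line with no trailing whitespace.

```shell
# Hypothetical helper: print one section of a Ceph keyring file.
#   $1 = section name without brackets (e.g. "mon." or "client.admin")
#   $2 = path to the saved keyring
keyring_section() {
  awk -v want="[$1]" '
    /^\[/ { keep = ($0 == want) }   # every header line switches the current section
    keep                             # print lines while inside the wanted section
  ' "$2"
}

# Example (file names are assumptions):
# keyring_section mon. saved-keyring > /var/lib/rook/mon-a/keyring
```

The same filter, called with `client.admin`, would yield the section that is imported with `ceph auth import` once the cluster is running again.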
- Run `kubectl edit secret/rook-ceph-mon` and modify `fsid` to your original `fsid`.
- Run `kubectl edit cm/rook-config-override` and add the content below:
    data:
      config: |
        [global]
        auth cluster required = none
        auth service required = none
        auth client required = none
        auth supported = none
- Run `kubectl edit deploy/rook-ceph-operator` and set `replicas` to 1.
- Run `kubectl logs -f rook-ceph-operator-xxxxxxxxxx` and wait till all the things are settled.
- STATE: Now your rook/ceph cluster should be up and running, with authentication disabled.
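
One detail that can trip up the `fsid` edit above: values under `data` in a Kubernetes Secret are stored base64-encoded, so `kubectl edit secret/rook-ceph-mon` shows an encoded string, and the original `fsid` needs to be encoded the same way before it is pasted in. A minimal sketch follows; the helper name `encode_secret_value` is hypothetical.

```shell
# Hypothetical helper: base64-encode a value for a Secret's `data` field.
encode_secret_value() {
  # printf avoids the trailing newline that echo would add to the encoded value;
  # tr removes any line wrapping that base64 may apply to long input.
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Example (replace the placeholder with the fsid from your old rook-ceph.config):
# encode_secret_value "<your-original-fsid>"
```

A stray newline inside the encoded value would end up as part of the `fsid`, which is why the sketch uses `printf '%s'` rather than `echo`.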
- Run `kubectl exec -it rook-ceph-tools-XXXXXXX bash` to enter tools pod:
    tools# vi key
    [paste keyring content saved before, preserving only `[client.admin]` section]
    tools# ceph auth import -i key
    tools# rm key
- Run `kubectl edit cm/rook-config-override` and remove previously added configurations.
- Run `kubectl edit deploy/rook-ceph-operator` and set `replicas` to 0.
- Run `kubectl delete deploy/X` where X is every deployment in namespace `rook-ceph`, except `rook-ceph-operator` and `rook-ceph-tools`, again. This time OSD daemons are present and included.
- Run `kubectl edit deploy/rook-ceph-operator` and set `replicas` to 1.
- Run `kubectl logs -f rook-ceph-operator-xxxxxxxxxx` and wait till all the things are settled.
HOORAY!