- You had a running rook/ceph cluster when your Kubernetes environment suddenly blew up, and you now have to start a new Kubernetes environment and bring your existing rook/ceph cluster back.
- You are migrating your existing rook/ceph cluster to a new Kubernetes environment, and downtime can be tolerated.
In the author's situation, the etcd data of the running Kubernetes cluster was nuked with no backup, and all OSDs used the BlueStore backend.
- A working Kubernetes cluster without rook
- The previous rook/ceph cluster is intact: the data of at least one ceph-mon survives, and all your OSD data is intact.
- Start a new, clean rook cluster with the old `CephCluster` & friends.
- Shut it down once it seems to work (as a brand-new cluster).
- Replace the ceph-mon data with the old one. Fix the `fsid` in rook. Fix the monmap. Disable auth.
- Fire it up and watch it resurrect. Fix the admin auth key.
- Shut it down again. Enable auth. Fire it up.
HOORAY!
Assuming your old Kubernetes cluster is completely torn down and your new Kubernetes cluster is up and running without rook, proceed as follows:
- Back up `/var/lib/rook` on all your rook nodes. The backups will be used later.
- Pick a `/var/lib/rook/rook-ceph/rook-ceph.config` from any node and get your old cluster `fsid` from its content.
- Remove `/var/lib/rook` on all your rook nodes.
- Install rook in your new Kubernetes cluster.
- Prepare `CephCluster` descriptors identical to the old ones, especially `spec.storage.config` and `spec.storage.nodes`, except `mon.count`, which is set to `1`. Post them to your new Kubernetes cluster.
- Prepare identical `CephFilesystem` (and similar) descriptors, if any. Post them to your new Kubernetes cluster too.
- Run `kubectl logs -f rook-ceph-operator-xxxxxxxxxx` and wait until everything has settled.
- Run `kubectl get cm/rook-crush-config -o yaml` and ensure `initialCrushMapCreated` is set to `1`. If not, go back to the previous step and keep waiting, set it manually, or stop here and ask for further help.
- STATE: Now you will have `rook-ceph-mon-a`, `rook-ceph-mgr-a`, and all the auxiliary pods up and running, and (hopefully) zero `rook-ceph-osd-X` running. Rook should not start any OSD daemon, since all devices belong to your old cluster (they have a different `fsid`).
- Run `kubectl exec -it rook-ceph-mon-a-XXXXXX -- bash` to enter your `ceph-mon` pod:
```
mon-a# cat /etc/ceph/keyring-store/keyring # save this keyring content for later use
mon-a# exit
```
- Run `kubectl edit deploy/rook-ceph-operator` and set `replicas` to 0.
- Run `kubectl delete deploy/X`, where X is every deployment in namespace `rook-ceph` except `rook-ceph-operator` and `rook-ceph-tools`.
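Reading the old `fsid` out of a backed-up `rook-ceph.config` can be scripted instead of eyeballed. The sketch below assumes the file holds an INI-style `fsid = <uuid>` line (as the config under `[global]` normally does); `old_fsid` and `fsid_b64` are hypothetical helper names, not part of rook. The base64 form is there because Kubernetes stores Secret values under `data` base64-encoded, which matters when the `fsid` is edited into `secret/rook-ceph-mon` later.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for recovering the old cluster fsid.
# Assumption: the backed-up rook-ceph.config contains a line "fsid = <uuid>".
set -euo pipefail

# Print the fsid found in the given rook-ceph.config (first match only).
old_fsid() {
  # $1: path to a backed-up rook-ceph.config
  awk -F'=' '
    /^[[:space:]]*fsid[[:space:]]*=/ {
      val = $2
      gsub(/[[:space:]]/, "", val)   # drop the whitespace around the value
      print val
      exit
    }' "$1"
}

# Base64-encode a value the way Kubernetes expects it under Secret .data.
fsid_b64() {
  printf '%s' "$1" | base64
}
```

For example, `old_fsid <your-backup>/rook-ceph/rook-ceph.config` prints the uuid, and `fsid_b64 "$(old_fsid <your-backup>/rook-ceph/rook-ceph.config)"` prints the encoded value; `<your-backup>` stands for wherever you copied `/var/lib/rook`.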
SSH to the host where `rook-ceph-mon-a` of your new Kubernetes cluster resides.
- Pick the latest `ceph-mon` directory (`/var/lib/rook/mon-?`) from your previous backup and replace `/var/lib/rook/mon-a` with it.
- Replace `/var/lib/rook/mon-a/keyring` with the saved keyring, preserving only the `[mon.]` section and removing the `[client.admin]` section.
- Get the address of your `rook-ceph-mon-a` with `kubectl get cm/rook-ceph-mon-endpoints -o yaml` in your new Kubernetes cluster.
- Run `docker run -it --rm -v /var/lib/rook:/var/lib/rook ceph/ceph:v14.2.1-20190430 bash` (note the Docker image version; it should match your deployment):
```
container# cd /var/lib/rook
container# ceph-mon --extract-monmap m --mon-data ./mon-a/data
container# monmaptool --print m
container# monmaptool --rm a m # repeat this until all the old ceph-mons are removed
container# monmaptool --add a 10.77.2.216:6789 m # replace with your own rook-ceph-mon address!
container# ceph-mon --inject-monmap m --mon-data ./mon-a/data
container# rm m
container# exit
```
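Two steps on the mon host are easy to get wrong by hand: trimming the saved keyring down to a single section, and finding every old mon name to remove from the monmap. The helpers below are a hypothetical sketch using plain `awk`. `old_mon_names` assumes that `monmaptool --print` lists each mon on a line of the form `0: <addresses> mon.a`; the exact address format differs between Ceph releases, so check it against your own output first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the mon-host steps.
set -euo pipefail

# Print only one section of a Ceph keyring file, e.g. "[mon.]" or "[client.admin]".
# Section headers are compared with all whitespace removed.
keyring_section() {
  # $1: section header including brackets, $2: keyring file
  awk -v want="$1" '
    /^[[:space:]]*\[/ {
      header = $0
      gsub(/[[:space:]]/, "", header)
      keep = (header == want)        # start or stop printing at each header
    }
    keep' "$2"
}

# Read `monmaptool --print` output on stdin and print the bare mon names.
# Assumption: mon lines look like "N: <addrs> mon.NAME".
old_mon_names() {
  awk '/^[0-9]+: / { name = $NF; sub(/^mon\./, "", name); print name }'
}
```

For example, `keyring_section '[mon.]' saved-keyring > /var/lib/rook/mon-a/keyring` on the host (`saved-keyring` being a hypothetical file holding the keyring you saved from the mon pod), and, with the function pasted into the container shell, `monmaptool --print m | old_mon_names | while read -r n; do monmaptool --rm "$n" m; done` before the `--add` step. `keyring_section '[client.admin]' saved-keyring` gives the content to import in the tools pod.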
Now back to your local machine.
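- Run `kubectl edit secret/rook-ceph-mon` and change `fsid` to your original `fsid`. Secret values under `data` are base64-encoded, so encode the fsid first (for example `echo -n <fsid> | base64`).
- Run `kubectl edit cm/rook-config-override` and add the content below: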
```yaml
data:
  config: |
    [global]
    auth cluster required = none
    auth service required = none
    auth client required = none
    auth supported = none
```
- Run `kubectl edit deploy/rook-ceph-operator` and set `replicas` to 1.
- Run `kubectl logs -f rook-ceph-operator-xxxxxxxxxx` and wait until everything has settled.
- STATE: Now your rook/ceph cluster should be up and running, with authentication disabled.
- Run `kubectl exec -it rook-ceph-tools-XXXXXXX -- bash` to enter the tools pod:
```
tools# vi key
[paste the keyring content saved before, keeping only the [client.admin] section]
tools# ceph auth import -i key
tools# rm key
```
- Run `kubectl edit cm/rook-config-override` and remove the configuration added earlier.
- Run `kubectl edit deploy/rook-ceph-operator` and set `replicas` to 0.
- Run `kubectl delete deploy/X`, where X is every deployment in namespace `rook-ceph` except `rook-ceph-operator` and `rook-ceph-tools`, again. This time the OSD deployments are present and included.
- Run `kubectl edit deploy/rook-ceph-operator` and set `replicas` to 1.
- Run `kubectl logs -f rook-ceph-operator-xxxxxxxxxx` and wait until everything has settled.
HOORAY!
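The "stop the operator, delete every deployment except the operator and the toolbox, start the operator" cycle is performed twice in the steps above. A minimal sketch of it, assuming namespace `rook-ceph` and a kubeconfig that points at the new cluster, uses `kubectl scale` in place of `kubectl edit` to set `replicas`; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the operator stop/clean/start cycle.
# Assumptions: namespace rook-ceph (override with NS), kubectl targets the new cluster.
set -euo pipefail

NS="${NS:-rook-ceph}"

# Read `kubectl get deploy -o name` output on stdin and print the deployments
# to delete, i.e. everything except the operator and the toolbox.
deployments_to_delete() {
  grep -v -E '/(rook-ceph-operator|rook-ceph-tools)$' || true
}

# Scale the operator to 0, then delete every other deployment in the namespace.
stop_operator_and_clean() {
  kubectl -n "$NS" scale deploy/rook-ceph-operator --replicas=0
  kubectl -n "$NS" get deploy -o name | deployments_to_delete |
    while read -r d; do
      kubectl -n "$NS" delete "$d"
    done
}

# Bring the operator back so it recreates the deleted deployments.
start_operator() {
  kubectl -n "$NS" scale deploy/rook-ceph-operator --replicas=1
}
```

Run `stop_operator_and_clean`, make the edits for that phase, then `start_operator` and follow the operator logs as before. It is worth checking the output of `kubectl -n rook-ceph get deploy -o name | deployments_to_delete` before letting anything be deleted.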