- Create and attach a volume to the masters at /dev/vdb via OpenStack
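The volume step could be scripted roughly as follows. This is only a sketch: the function name, the `-etcd` volume name suffix, and the 20 GB default size are assumptions for illustration, not values from this procedure. The hypervisor may treat the requested device name as a hint, so it is worth confirming with `lsblk` on the node that the disk really shows up as /dev/vdb.

```shell
# Hypothetical helper: create a volume and attach it to one master as /dev/vdb.
# The "-etcd" name suffix and the 20 GB default size are illustrative assumptions.
attach_etcd_volume() {
    server="$1"
    size_gb="${2:-20}"
    volume="${server}-etcd"

    openstack volume create --size "${size_gb}" "${volume}" || return 1

    # Wait until the new volume is reported as available before attaching it.
    until [ "$(openstack volume show -f value -c status "${volume}")" = "available" ]; do
        sleep 2
    done

    openstack server add volume --device /dev/vdb "${server}" "${volume}"
}
```

Usage would be one call per master, for example `attach_etcd_volume <SERVER NAME>`.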
- Create test machine-config
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master-test
  name: 98-var-etcd
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
      - name: var-lib-etcd.mount
        enabled: true
        contents: |
          [Unit]
          After=mkfs.xfs_vdb.service
          Requires=mkfs.xfs_vdb.service
          [Mount]
          What=/dev/vdb
          Where=/var/lib/etcd
          Type=xfs
          Options=defaults
          [Install]
          WantedBy=local-fs.target
      - name: mkfs.xfs_vdb.service
        enabled: true
        contents: |
          [Unit]
          Description=oneshot systemd service to XFS format /dev/vdb device
          After=dev-vdb.device
          Requires=dev-vdb.device
          [Service]
          Type=oneshot
          # Note the leading "-" in ExecStart. In systemd exec directives this means ignore non-zero exit code.
          ExecStart=-/usr/sbin/mkfs.xfs /dev/vdb
          [Install]
          WantedBy=local-fs.target
- Create test machine-config-pool
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: master-test
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values:
      - master
      - master-test
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/master-test: ""
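The annotation step further down reads `.spec.configuration.name` from the master-test pool, and that field stays empty until a rendered config has been generated for the pool. A minimal sketch of a check for this, assuming a hypothetical helper name and retry count:

```shell
# Hypothetical helper: print the rendered config name of a pool, retrying
# until one has been generated. Retry count and delay are assumptions.
rendered_config() {
    pool="$1"
    tries="${2:-30}"
    while [ "${tries}" -gt 0 ]; do
        name="$(oc get mcp "${pool}" -o jsonpath='{.spec.configuration.name}')"
        if [ -n "${name}" ]; then
            echo "${name}"
            return 0
        fi
        tries=$((tries - 1))
        sleep 2
    done
    echo "no rendered config for pool ${pool}" >&2
    return 1
}
```

Running `rendered_config master-test` before annotating a node avoids writing an empty desiredConfig value.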
- Pause master machine config pool
# oc patch --type=merge --patch='{"spec":{"paused":true}}' machineconfigpool/master
- Annotate node with the test machine config
NODE=<NODE NAME>
# oc annotate node ${NODE} machineconfiguration.openshift.io/desiredConfig=`oc get mcp master-test -o go-template='{{ index .spec.configuration.name }}'` --overwrite
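- Delete the Machine Config Daemon pod on the node: list the pods on the node, then delete the machine-config-daemon pod shown.
# oc get pods -o wide -n openshift-machine-config-operator | grep ${NODE}
# oc delete pod <MACHINE-CONFIG-DAEMON POD NAME> -n openshift-machine-config-operator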
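The lookup and the delete can be combined so the pod name does not have to be copied by hand. This is a sketch under assumptions: the function names are hypothetical, and the `k8s-app=machine-config-daemon` label should be verified against the daemon pods in your cluster before relying on it.

```shell
# Hypothetical helper: print the machine-config-daemon pod running on a node.
# The label selector is an assumption to verify against your cluster.
mcd_pod_on_node() {
    oc get pods -n openshift-machine-config-operator \
        -l k8s-app=machine-config-daemon \
        --field-selector "spec.nodeName=$1" -o name
}

# Hypothetical helper: delete that pod so the daemon set recreates it.
restart_mcd_on_node() {
    pod="$(mcd_pod_on_node "$1")"
    if [ -z "${pod}" ]; then
        echo "no machine-config-daemon pod found on $1" >&2
        return 1
    fi
    oc delete -n openshift-machine-config-operator "${pod}"
}
```

Usage: `restart_mcd_on_node "${NODE}"`.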
- Follow docs to replace etcd: https://docs.openshift.com/container-platform/4.8/backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.html#restore-replace-crashlooping-etcd-member_replacing-unhealthy-etcd-member
# mkdir /var/lib/etcd-backup
# mv /etc/kubernetes/manifests/etcd-pod.yaml /var/lib/etcd-backup/
# etcdctl member list -w table
# etcdctl member remove xxxxxxxx
# oc delete -n openshift-etcd secrets etcd-peer-${NODE} etcd-serving-metrics-${NODE} etcd-serving-${NODE}
# oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "single-master-recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
- Continue on other masters after checking etcd health.
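Before moving to the next master, it may help to confirm both that the converted node has /dev/vdb mounted at /var/lib/etcd and what the etcd operator reports about its members. The helper names below are hypothetical, and the sketch assumes that `oc debug` writes its progress messages to stderr so that only the `findmnt` output is captured.

```shell
# Hypothetical helper: print the device mounted at /var/lib/etcd on a node.
etcd_mount_source() {
    oc debug "node/$1" -- chroot /host findmnt -n -o SOURCE /var/lib/etcd 2>/dev/null
}

# Hypothetical helper: print the etcd operator's member availability message.
etcd_members_message() {
    oc get etcd cluster \
        -o jsonpath='{.status.conditions[?(@.type=="EtcdMembersAvailable")].message}'
}

# Print the member message, then succeed only if /var/lib/etcd is on /dev/vdb.
check_master() {
    etcd_members_message
    echo
    [ "$(etcd_mount_source "$1")" = "/dev/vdb" ]
}
```

`check_master "${NODE}"` returns non-zero when the mount is not on /dev/vdb; the printed message still needs to be read by a person, since this sketch does not interpret it.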
- Once the cluster is healthy and all masters have been changed over, move the machine config to the master MCP.
# oc label mc 98-var-etcd "machineconfiguration.openshift.io/role=master" --overwrite
- Unpause master mcp
# oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/master
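After unpausing, the pool has to reconcile. A small sketch for blocking until it reports the Updated condition; the function name and the 30 minute default timeout are assumptions, and the API may be briefly unreachable while masters restart, so the command may need to be rerun.

```shell
# Hypothetical helper: block until the master pool reports Updated.
# The default timeout is an illustrative assumption.
wait_for_master_pool() {
    timeout="${1:-30m}"
    oc wait machineconfigpool/master --for=condition=Updated --timeout="${timeout}"
}
```

Usage: `wait_for_master_pool` or, with a longer limit, `wait_for_master_pool 60m`.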