@rjhowe
Last active August 31, 2022 21:06

ROUGH, NOT FULLY TESTED

  1. Create a volume and attach it to each master at /dev/vdb via OpenStack

  2. Create test machine-config

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master-test
  name: 98-var-etcd
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - name: var-lib-etcd.mount 
          enabled: true
          contents: |
            [Unit]
            After=mkfs.xfs_vdb.service
            Requires=mkfs.xfs_vdb.service
            [Mount]
            What=/dev/vdb
            Where=/var/lib/etcd
            Type=xfs
            Options=defaults
            [Install]
            WantedBy=local-fs.target
        - name: mkfs.xfs_vdb.service 
          enabled: true
          contents: |
            [Unit]
            Description=oneshot systemd service to XFS format /dev/vdb device
            After=dev-vdb.device
            Requires=dev-vdb.device            
            [Service]
            Type=oneshot
            #Note the leading "-" in ExecStart. In systemd exec directives this means ignore non-zero exit code.
            ExecStart=-/usr/sbin/mkfs.xfs /dev/vdb            
            [Install]
            WantedBy=local-fs.target
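systemd ties a mount unit's file name to its mount point, so var-lib-etcd.mount is only valid for Where=/var/lib/etcd. If the mount path changes, the unit name has to change with it. On a host with systemd, `systemd-escape -p --suffix=mount /var/lib/etcd` prints the correct name. The sketch below is a hypothetical helper (`mount_unit_name` is not part of any tool) that handles only simple paths made of letters, digits, underscores and slashes; anything else needs systemd-escape.

```shell
# Hypothetical helper: derive a systemd mount unit name from a simple absolute path.
# Assumption: the path contains only letters, digits, "_" and "/".
# Paths with "-" or other special characters need systemd-escape instead.
mount_unit_name() {
  path="${1#/}"      # drop the leading slash
  path="${path%/}"   # drop a single trailing slash, if present
  printf '%s.mount\n' "$(printf '%s' "$path" | tr '/' '-')"
}

# Usage:
#   mount_unit_name /var/lib/etcd    # prints var-lib-etcd.mount
```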
  3. Create test machine-config-pool
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: master-test
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values:
      - master
      - master-test 
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/master-test: ""
  4. Pause master machine config pool
# oc patch --type=merge --patch='{"spec":{"paused":true}}' machineconfigpool/master
  5. Annotate the node with the test machine config
NODE=<NODE NAME>
# oc annotate node ${NODE} machineconfiguration.openshift.io/desiredConfig=$(oc get mcp master-test -o go-template='{{ .spec.configuration.name }}') --overwrite
  6. Delete the Machine Config Daemon pod on the node.
# oc get pods -o wide -n openshift-machine-config-operator | grep ${NODE}
# oc delete pod -n openshift-machine-config-operator <MACHINE-CONFIG-DAEMON POD NAME>
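Before touching etcd, it may help to confirm that the node has actually picked up the test config. The Machine Config Daemon records its progress in the node annotations `machineconfiguration.openshift.io/currentConfig`, `desiredConfig` and `state`. The following is a minimal sketch; the function names `node_annotation` and `node_config_applied` are hypothetical, and it assumes `oc` is already logged in to the cluster.

```shell
# Hypothetical helper: read one machineconfiguration.openshift.io/* annotation.
# $1 = node name, $2 = annotation suffix (currentConfig, desiredConfig or state)
node_annotation() {
  oc get node "$1" \
    -o "jsonpath={.metadata.annotations.machineconfiguration\.openshift\.io/$2}"
}

# Succeeds only when the node reports the desired config as current and state Done.
node_config_applied() {
  current="$(node_annotation "$1" currentConfig)"
  desired="$(node_annotation "$1" desiredConfig)"
  state="$(node_annotation "$1" state)"
  [ -n "$current" ] && [ "$current" = "$desired" ] && [ "$state" = "Done" ]
}

# Usage:
#   node_config_applied "${NODE}" && echo "config applied on ${NODE}"
```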
  7. Follow the docs to replace the etcd member: https://docs.openshift.com/container-platform/4.8/backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.html#restore-replace-crashlooping-etcd-member_replacing-unhealthy-etcd-member
# mkdir /var/lib/etcd-backup
# mv /etc/kubernetes/manifests/etcd-pod.yaml /var/lib/etcd-backup/
# etcdctl member list -w table
# etcdctl member remove xxxxxxxx
oc delete -n openshift-etcd secrets etcd-peer-${NODE} etcd-serving-metrics-${NODE} etcd-serving-${NODE} 
oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "single-master-recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge 
  8. Continue on the other masters after checking etcd health.
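One way to check etcd health between masters is the `EtcdMembersAvailable` condition on the etcd operator resource, which the linked docs also query. The sketch below is an assumption-laden example: the function names are hypothetical, and it relies on the condition message starting with "<N> members are available" when all members are healthy, which is the wording shown in the docs but could differ between versions.

```shell
# Hypothetical helper: print the message of the EtcdMembersAvailable condition.
etcd_members_message() {
  oc get etcd cluster \
    -o=jsonpath='{range .status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{end}'
}

# Succeeds when the message reports that all expected members are available.
# $1 = expected member count (defaults to 3)
# Assumption: a healthy cluster reports "<N> members are available".
etcd_all_members_available() {
  expected="${1:-3}"
  case "$(etcd_members_message)" in
    "$expected members are available"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   until etcd_all_members_available 3; do sleep 30; done
```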

  9. Once the cluster is healthy and all masters have been changed over, move the machine config to the master mcp.

# oc label mc 98-var-etcd "machineconfiguration.openshift.io/role=master" --overwrite
  10. Unpause master mcp
# oc patch --type=merge --patch='{"spec":{"paused":false}}' machineconfigpool/master