@beekhof
Last active February 11, 2020 07:56

// LeaseSpec: https://github.com/kubernetes/api/blob/master/coordination/v1/types.go#L40

leasePadding = 30 seconds

Unless otherwise specified, when updating the following fields, always set:

  • LeaseDurationSeconds = lifecycle.metal3.io/maintenance + leasePadding,
  • AcquireTime = now,
  • RenewTime = now,
  • HolderIdentity = machine-remediation

A valid lease means: now < LeaseDurationSeconds + AcquireTime
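The defaults and the validity rule above can be sketched as follows. This is a minimal illustration, not the controller's actual code: the `Lease` class, `stamp`, `is_valid`, and `time_remaining` are hypothetical names, and it assumes the lifecycle.metal3.io/maintenance annotation holds a whole number of seconds.

```python
from dataclasses import dataclass

LEASE_PADDING = 30  # seconds, as defined above
HOLDER = "machine-remediation"


@dataclass
class Lease:
    # Hypothetical stand-in for the relevant LeaseSpec fields.
    holder_identity: str
    lease_duration_seconds: int
    acquire_time: float  # seconds since some epoch
    renew_time: float


def stamp(maintenance_seconds: int, now: float) -> Lease:
    """Build a lease using the default field values listed above."""
    return Lease(
        holder_identity=HOLDER,
        lease_duration_seconds=maintenance_seconds + LEASE_PADDING,
        acquire_time=now,
        renew_time=now,
    )


def is_valid(lease: Lease, now: float) -> bool:
    """A valid lease means: now < LeaseDurationSeconds + AcquireTime."""
    return now < lease.lease_duration_seconds + lease.acquire_time


def time_remaining(lease: Lease, now: float) -> float:
    """Seconds left before the lease expires, never negative."""
    return max(0.0, lease.acquire_time + lease.lease_duration_seconds - now)
```

For example, a 600 second maintenance request stamped at t=1000 yields a 630 second lease that stops being valid at t=1630.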

Controller, if lifecycle.metal3.io/maintenance exists:

  • if valid lease AND not the current owner:
    • update lifecycle.metal3.io/maintenance-status = waiting
    • exit
  • if no lease:
    • create with HolderIdentity, LeaseTransitions = 1, AcquireTime, LeaseDurationSeconds
    • Set ownerRef to refer to the Node
    • update lifecycle.metal3.io/maintenance-status = new, create
  • if HolderIdentity doesn’t match:
    • update HolderIdentity, LeaseTransitions + 1, AcquireTime, LeaseDurationSeconds
    • update lifecycle.metal3.io/maintenance-status = new, acquired
    • Ensure lease ownerRef refers to the Node
  • if LeaseDurationSeconds == 0:
    • update LeaseTransitions + 1, AcquireTime, LeaseDurationSeconds
    • update lifecycle.metal3.io/maintenance-status = new
  • if LeaseDurationSeconds != lifecycle.metal3.io/maintenance + leasePadding:
    • // lease interval changed
    • update LeaseDurationSeconds, and RenewTime
    • update lifecycle.metal3.io/maintenance-status = updated
  • else if now > (LeaseDurationSeconds + AcquireTime):
    • // The lease records a previous maintenance window
    • // Unlikely but could happen if:
    • // 1. we get API errors preventing the annotation from being deleted prior to the lease expiring
    • // 2. we get API errors preventing LeaseDurationSeconds from being unset and someone recreated the annotation after we deleted it
    • // 3. Reconcile() does not get called within leasePadding of the lease expiring
    • // The second is more likely, so treat as a new request
    • update LeaseDurationSeconds, AcquireTime, and RenewTime
    • update lifecycle.metal3.io/maintenance-status = new, stale
  • if lifecycle.metal3.io/maintenance-status == ended:
    • // Someone re-created the annotation, with the same interval, after we deleted it, but before we unset LeaseDurationSeconds, and before the old lease expired
    • // Unlikely but could happen if we get API errors preventing LeaseDurationSeconds from being unset
    • // Treat as a request for a new lease starting now
    • update LeaseDurationSeconds, AcquireTime, and RenewTime
    • update lifecycle.metal3.io/maintenance-status = new, recreate
  • if (now + leasePadding) > (LeaseDurationSeconds + AcquireTime):
    • delete annotation
    • update lifecycle.metal3.io/maintenance-status = ended
    • if lease time remaining > 0, use a retry loop to uncordon for up to “lease time remaining” seconds
    • if lease time remaining still > 0, use a retry loop to cancel drain for up to “lease time remaining” seconds
    • if lease time remaining still > 0, use a retry loop to set LeaseDurationSeconds = 0 for up to “lease time remaining” seconds
    • exit
  • cordon
  • drain
  • update lifecycle.metal3.io/maintenance-status = active
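One way to read the branch order above is as a classifier that reports which maintenance-status value the first matching branch would set. This is a simplified sketch under stated assumptions: `classify` and the `Lease` class are hypothetical, the annotation is assumed to be a number of seconds, and the function only names the first branch that fires. In the list above, the branches that do not exit fall through to the later checks and to cordon/drain, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_PADDING = 30  # seconds
HOLDER = "machine-remediation"


@dataclass
class Lease:
    # Hypothetical stand-in for the relevant LeaseSpec fields.
    holder_identity: str
    lease_duration_seconds: int
    acquire_time: float


def classify(lease: Optional[Lease], maintenance_seconds: int,
             status: Optional[str], now: float, me: str = HOLDER) -> str:
    """Return the status label of the first branch that matches."""
    wanted = maintenance_seconds + LEASE_PADDING
    if lease is None:
        return "new, create"
    expiry = lease.acquire_time + lease.lease_duration_seconds
    if lease.holder_identity != me:
        # A valid lease held by someone else means we wait;
        # an expired one can be taken over.
        return "waiting" if now < expiry else "new, acquired"
    if lease.lease_duration_seconds == 0:
        return "new"
    if lease.lease_duration_seconds != wanted:
        return "updated"  # lease interval changed
    if now > expiry:
        return "new, stale"  # lease records a previous window
    if status == "ended":
        return "new, recreate"
    if now + LEASE_PADDING > expiry:
        return "ended"  # maintenance window is over, tear down
    return "active"  # cordon and drain
```

For instance, `classify(Lease(HOLDER, 630, 0.0), 600, None, 610.0)` returns `"ended"`, because 610 + 30 exceeds the expiry at 630.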

Controller, if lifecycle.metal3.io/maintenance does not exist:

  • if valid lease AND we are the current owner:
    • update lifecycle.metal3.io/maintenance-status = ended
    • if lease time remaining > 0, use a retry loop to uncordon for up to “lease time remaining” seconds
    • if lease time remaining still > 0, use a retry loop to cancel drain for up to “lease time remaining” seconds
    • if lease time remaining still > 0, use a retry loop to set LeaseDurationSeconds = 0 for up to “lease time remaining” seconds
    • on any errors: requeue
  • else if LeaseDurationSeconds > 0 and we are the current owner:
    • // We lost the lease prior to tearing down maintenance mode, probably due to API errors
    • // This could result in manual drains/cordons being undone and may not be a good idea to implement
    • // On the other hand, the node will be useless if it remains cordoned
    • update LeaseDurationSeconds = leasePadding, and RenewTime = now
    • if lease time remaining > 0, use a retry loop to uncordon for up to “lease time remaining” seconds
    • if lease time remaining still > 0, use a retry loop to set LeaseDurationSeconds = 0 for up to “lease time remaining” seconds
    • on any errors: requeue
@MoserMichael

MoserMichael commented Feb 11, 2020

I think you will never get to one of the checks:
if lifecycle.metal3.io/maintenance exists:
...
if lifecycle.metal3.io/maintenance-status == ended
because when you transition to the ended state you also delete the lifecycle.metal3.io/maintenance annotation, so a future reconcile call won't reach the second nested check.