// LeaseSpec: https://github.com/kubernetes/api/blob/master/coordination/v1/types.go#L40
leasePadding = 30 seconds
Unless otherwise specified, when updating the following fields, always set:
- LeaseDurationSeconds = lifecycle.metal3.io/maintenance + leasePadding
- AcquireTime = now
- RenewTime = now
- HolderIdentity = machine-remediation
A valid lease means: now < LeaseDurationSeconds + AcquireTime
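The field defaults and the validity rule above can be sketched as follows. This is only an illustration: the `LeaseSpec` dataclass is a hypothetical stand-in for the real coordination/v1 type, the helper names are made up, and it assumes the `lifecycle.metal3.io/maintenance` annotation holds a duration in seconds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

LEASE_PADDING_SECONDS = 30      # leasePadding from the text
HOLDER = "machine-remediation"  # HolderIdentity from the text


@dataclass
class LeaseSpec:
    """Hypothetical stand-in for the coordination/v1 LeaseSpec fields used here."""
    holder_identity: Optional[str] = None
    lease_duration_seconds: int = 0
    acquire_time: Optional[datetime] = None
    renew_time: Optional[datetime] = None
    lease_transitions: int = 0


def default_fields(maintenance_seconds: int, now: datetime) -> dict:
    """Values to use whenever these fields are updated, unless stated otherwise."""
    return {
        "lease_duration_seconds": maintenance_seconds + LEASE_PADDING_SECONDS,
        "acquire_time": now,
        "renew_time": now,
        "holder_identity": HOLDER,
    }


def is_valid(lease: LeaseSpec, now: datetime) -> bool:
    """A valid lease means: now < LeaseDurationSeconds + AcquireTime."""
    if lease.acquire_time is None:
        return False  # assumption: a lease that was never acquired is not valid
    expiry = lease.acquire_time + timedelta(seconds=lease.lease_duration_seconds)
    return now < expiry
```

Note that the comparison is strict, so a lease is already invalid at the exact instant it expires.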
Controller, if lifecycle.metal3.io/maintenance exists:
- if valid lease AND not the current owner:
  - update lifecycle.metal3.io/maintenance-status = waiting
  - exit
- update
- if no lease:
  - create with HolderIdentity, LeaseTransitions = 1, AcquireTime, LeaseDurationSeconds
  - Set ownerRef to refer to the Node
  - update lifecycle.metal3.io/maintenance-status = new, create
- if HolderIdentity doesn’t match:
  - update HolderIdentity, LeaseTransitions + 1, AcquireTime, LeaseDurationSeconds
  - update lifecycle.metal3.io/maintenance-status = new, acquired
- Ensure lease ownerRef refers to the Node
- if LeaseDurationSeconds == 0:
  - update LeaseTransitions + 1, AcquireTime, LeaseDurationSeconds
  - update lifecycle.metal3.io/maintenance-status = new
- if LeaseDurationSeconds != lifecycle.metal3.io/maintenance + leasePadding:
  - // lease interval changed
  - update LeaseDurationSeconds and RenewTime
  - update lifecycle.metal3.io/maintenance-status = updated
- else if now > (LeaseDurationSeconds + AcquireTime):
  - // The lease records a previous maintenance window
  - // Unlikely but could happen if:
  - // 1. we get API errors preventing the annotation from being deleted prior to the lease expiring
  - // 2. we get API errors preventing LeaseDurationSeconds from being unset and someone recreated the annotation after we deleted it
  - // 3. Reconcile() does not get called within leasePadding of the lease expiring
  - // The second is more likely, so treat as a new request
  - update LeaseDurationSeconds, AcquireTime, and RenewTime
  - update lifecycle.metal3.io/maintenance-status = new, stale
- if lifecycle.metal3.io/maintenance-status == ended:
  - // Someone re-created the annotation, with the same interval, after we deleted it, but before we unset LeaseDurationSeconds, and before the old lease expired
  - // Unlikely but could happen if we get API errors preventing LeaseDurationSeconds from being unset
  - // Treat as a request for a new lease starting now
  - update LeaseDurationSeconds, AcquireTime, and RenewTime
  - update lifecycle.metal3.io/maintenance-status = new, recreate
- if (now + leasePadding) > (LeaseDurationSeconds + AcquireTime):
  - delete annotation
  - update lifecycle.metal3.io/maintenance-status = ended
  - if lease time remaining > 0, use a retry loop to uncordon for up to “lease time remaining” seconds
  - if lease time remaining still > 0, use a retry loop to cancel drain for up to “lease time remaining” seconds
  - if lease time remaining still > 0, use a retry loop to set LeaseDurationSeconds = 0 for up to “lease time remaining” seconds
  - exit
- cordon
- drain
- update lifecycle.metal3.io/maintenance-status = active
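The decision order above is easier to check when written as a single function. The sketch below is a rough model under several assumptions, not the controller itself: times are plain epoch seconds, the annotation value is a number of seconds, `Lease` and `reconcile` are hypothetical names, and the function only records which status values would be written and which action follows. It leaves out the ownerRef handling and the bare "update" step, since the text does not say what that step updates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LEASE_PADDING = 30          # leasePadding, in seconds
ME = "machine-remediation"  # HolderIdentity used by this controller


@dataclass
class Lease:
    holder: str
    duration: int                  # LeaseDurationSeconds
    acquire: float                 # AcquireTime (epoch seconds, an assumption)
    renew: Optional[float] = None  # RenewTime
    transitions: int = 1           # LeaseTransitions


def reconcile(lease: Optional[Lease], maintenance: int, status: str,
              now: float) -> Tuple[Lease, List[str], str]:
    """Return (lease, statuses written in order, next action)."""
    want = maintenance + LEASE_PADDING
    written: List[str] = []

    def set_status(value: str) -> None:
        nonlocal status
        status = value
        written.append(value)

    # A valid lease held by someone else: wait for it.
    if lease is not None and lease.holder != ME and now < lease.duration + lease.acquire:
        set_status("waiting")
        return lease, written, "exit"
    if lease is None:
        lease = Lease(holder=ME, duration=want, acquire=now)
        set_status("new, create")
    if lease.holder != ME:  # expired lease from another holder: take it over
        lease.holder, lease.transitions = ME, lease.transitions + 1
        lease.acquire, lease.duration = now, want
        set_status("new, acquired")
    if lease.duration == 0:  # lease was released earlier
        lease.transitions += 1
        lease.acquire, lease.duration = now, want
        set_status("new")
    if lease.duration != want:  # lease interval changed
        lease.duration, lease.renew = want, now
        set_status("updated")
    elif now > lease.duration + lease.acquire:  # previous maintenance window
        lease.duration, lease.acquire, lease.renew = want, now, now
        set_status("new, stale")
    if status == "ended":  # annotation re-created before the lease was cleared
        lease.duration, lease.acquire, lease.renew = want, now, now
        set_status("new, recreate")
    if now + LEASE_PADDING > lease.duration + lease.acquire:
        set_status("ended")  # the caller would also delete the annotation
        return lease, written, "teardown"
    set_status("active")
    return lease, written, "cordon and drain"
```

For example, `reconcile(None, 600, "", 1000.0)` creates a 630-second lease and ends with the status `active`, while the same call against a still-valid lease held by another identity stops at `waiting`.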
Controller, if lifecycle.metal3.io/maintenance does not exist:
- if valid lease AND we are the current owner:
  - update lifecycle.metal3.io/maintenance-status = ended
  - if lease time remaining > 0, use a retry loop to uncordon for up to “lease time remaining” seconds
  - if lease time remaining still > 0, use a retry loop to cancel drain for up to “lease time remaining” seconds
  - if lease time remaining still > 0, use a retry loop to set LeaseDurationSeconds = 0 for up to “lease time remaining” seconds
  - on any errors: requeue
- update
- else if LeaseDurationSeconds > 0 and we are the current owner:
  - // We lost the lease prior to tearing down maintenance mode, probably due to API errors
  - // This could result in manual drains/cordons being undone and may not be a good idea to implement
  - // On the other hand, the node will be useless if it remains cordoned
  - update LeaseDurationSeconds = leasePadding and RenewTime = now
  - if lease time remaining > 0, use a retry loop to uncordon for up to “lease time remaining” seconds
  - if lease time remaining still > 0, use a retry loop to set LeaseDurationSeconds = 0 for up to “lease time remaining” seconds
  - on any errors: requeue
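The "retry loop for up to lease time remaining" steps in the teardown paths could look roughly like this. It is a sketch under assumptions: `retry_until` and `run_steps` are hypothetical helpers, each step is a callable returning True on success, and every step shares one deadline (the lease expiry), so a step that uses up the budget causes a requeue instead of running the later steps.

```python
import time
from typing import Callable, Sequence


def retry_until(deadline: float, op: Callable[[], bool],
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep,
                interval: float = 1.0) -> bool:
    """Retry op until it succeeds or the deadline passes."""
    while clock() < deadline:
        if op():
            return True
        # Never sleep past the deadline.
        sleep(min(interval, max(0.0, deadline - clock())))
    return False


def run_steps(deadline: float, steps: Sequence[Callable[[], bool]], **kw) -> str:
    """Run teardown steps in order while lease time remains."""
    for step in steps:
        if not retry_until(deadline, step, **kw):
            return "requeue"  # on any errors: requeue
    return "done"
```

The first teardown path would pass three steps (uncordon, cancel drain, set LeaseDurationSeconds = 0), and the lost-lease path would pass only uncordon and the lease update. The clock and sleep functions are injectable so the loop can be exercised without real waiting.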
Yes, I'm not happy about it either - it's this or create a CRD, which we are still trying to avoid.
For now I'm telling myself it's OK because it's only relevant if an unlikely series of events occurs.
Any time you're updating AcquireTime and RenewTime, you should set them to now.
LeaseDurationSeconds is always lifecycle.metal3.io/maintenance + leasePadding.