// LeaseSpec: https://github.com/kubernetes/api/blob/master/coordination/v1/types.go#L40
leasePadding = 30 seconds
Unless otherwise specified, when updating the following fields, always set:
- LeaseDurationSeconds = lifecycle.metal3.io/maintenance + leasePadding
- AcquireTime = now
- RenewTime = now
- HolderIdentity = machine-remediation
A valid lease means: now < LeaseDurationSeconds + AcquireTime
Controller, if the lifecycle.metal3.io/maintenance annotation exists:
- if valid lease AND not the current owner:
  - update lifecycle.metal3.io/maintenance-status = waiting
  - exit
- update
- if no lease:
  - create with HolderIdentity, LeaseTransitions = 1, AcquireTime, LeaseDurationSeconds
  - set ownerRef to refer to the Node
  - update lifecycle.metal3.io/maintenance-status = new, create
- if HolderIdentity doesn’t match:
  - update HolderIdentity, LeaseTransitions + 1, AcquireTime, LeaseDurationSeconds
  - update lifecycle.metal3.io/maintenance-status = new, acquired
  - ensure the lease ownerRef refers to the Node
- if LeaseDurationSeconds == 0:
  - update LeaseTransitions + 1, AcquireTime, LeaseDurationSeconds
  - update lifecycle.metal3.io/maintenance-status = new
- if LeaseDurationSeconds != lifecycle.metal3.io/maintenance + leasePadding:
  - // lease interval changed
  - update LeaseDurationSeconds and RenewTime
  - update lifecycle.metal3.io/maintenance-status = updated
- else if now > (LeaseDurationSeconds + AcquireTime):
  - // The lease records a previous maintenance window
  - // Unlikely, but could happen if:
  - //   1. API errors prevent the annotation from being deleted before the lease expires
  - //   2. API errors prevent LeaseDurationSeconds from being unset, and someone re-creates the annotation after we deleted it
  - //   3. Reconcile() is not called within leasePadding of the lease expiring
  - // The second is the most likely, so treat this as a new request
  - update LeaseDurationSeconds, AcquireTime, and RenewTime
  - update lifecycle.metal3.io/maintenance-status = new, stale
- if lifecycle.metal3.io/maintenance-status == ended:
  - // Someone re-created the annotation, with the same interval, after we deleted it, but before we unset LeaseDurationSeconds and before the old lease expired
  - // Unlikely, but could happen if API errors prevent LeaseDurationSeconds from being unset
  - // Treat this as a request for a new lease starting now
  - update LeaseDurationSeconds, AcquireTime, and RenewTime
  - update lifecycle.metal3.io/maintenance-status = new, recreate
- if (now + leasePadding) > (LeaseDurationSeconds + AcquireTime):
  - delete the annotation
  - update lifecycle.metal3.io/maintenance-status = ended
  - if lease time remaining > 0, use a retry loop to uncordon for up to “lease time remaining” seconds
  - if lease time remaining is still > 0, use a retry loop to cancel the drain for up to “lease time remaining” seconds
  - if lease time remaining is still > 0, use a retry loop to set LeaseDurationSeconds = 0 for up to “lease time remaining” seconds
  - exit
- cordon
- drain
- update lifecycle.metal3.io/maintenance-status = active
Controller, if the lifecycle.metal3.io/maintenance annotation does not exist:
- if valid lease AND we are the current owner:
  - update lifecycle.metal3.io/maintenance-status = ended
  - if lease time remaining > 0, use a retry loop to uncordon for up to “lease time remaining” seconds
  - if lease time remaining is still > 0, use a retry loop to cancel the drain for up to “lease time remaining” seconds
  - if lease time remaining is still > 0, use a retry loop to set LeaseDurationSeconds = 0 for up to “lease time remaining” seconds
  - on any errors: requeue
- update
- else if LeaseDurationSeconds > 0 and we are the current owner:
  - // We lost the lease before tearing down maintenance mode, probably due to API errors
  - // This could undo manual drains/cordons, so it may not be a good idea to implement
  - // On the other hand, the node will be useless if it remains cordoned
  - update LeaseDurationSeconds = leasePadding and RenewTime = now
  - if lease time remaining > 0, use a retry loop to uncordon for up to “lease time remaining” seconds
  - if lease time remaining is still > 0, use a retry loop to set LeaseDurationSeconds = 0 for up to “lease time remaining” seconds
  - on any errors: requeue
I think you will never reach one of the checks:
  if lifecycle.metal3.io/maintenance exists:
      ...
      if lifecycle.metal3.io/maintenance-status == ended
because when you transition to the ended state, you also delete the lifecycle.metal3.io/maintenance annotation, so a future reconcile call won't reach the second, nested check.
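The reviewer's point can be checked with a tiny model that keeps only the two annotations. Ending maintenance deletes lifecycle.metal3.io/maintenance in the same step that writes ended, so the nested check can only observe ended if something re-creates the annotation afterwards, which is the scenario the comment on that check describes. The helpers below are hypothetical and ignore the lease entirely.

```go
package main

import "fmt"

const (
	maintenanceKey = "lifecycle.metal3.io/maintenance"
	statusKey      = "lifecycle.metal3.io/maintenance-status"
)

// endMaintenance mirrors the "ended" transition: write the status and delete
// the maintenance annotation in the same step.
func endMaintenance(a map[string]string) {
	a[statusKey] = "ended"
	delete(a, maintenanceKey)
}

// endedCheckFires reports whether a reconcile would enter the "exists" branch
// and then find maintenance-status == ended.
func endedCheckFires(a map[string]string) bool {
	_, exists := a[maintenanceKey]
	return exists && a[statusKey] == "ended"
}

func main() {
	a := map[string]string{maintenanceKey: "600", statusKey: "active"}
	endMaintenance(a)
	fmt.Println(endedCheckFires(a)) // false: the outer "exists" check fails first
	a[maintenanceKey] = "600"       // someone re-creates the annotation
	fmt.Println(endedCheckFires(a)) // true
}
```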