@yifan-gu
Last active July 5, 2017 23:58
Notes for Tectonic 1.6.6 upgrade

(Original issue: coreos/tectonic-installer#347)

When upgrading to Tectonic 1.6.6, we will make two additional changes to the kube-scheduler and kube-controller-manager manifests besides bumping their image versions (a sketch of the resulting manifest follows the list):

  • Change the pod anti-affinity from preferredDuringSchedulingIgnoredDuringExecution to requiredDuringSchedulingIgnoredDuringExecution.
  • Set the deployment replica count equal to the number of master nodes.
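
For illustration, here is a minimal sketch of what the resulting kube-controller-manager Deployment could look like. The namespace, labels, and API version are assumptions for the example, not copied from the actual Tectonic manifests:

```yaml
# Hypothetical excerpt of a kube-controller-manager Deployment manifest.
apiVersion: apps/v1beta1          # Deployment API group around Kubernetes 1.6
kind: Deployment
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  replicas: 5                     # set equal to the number of master nodes
  template:
    metadata:
      labels:
        k8s-app: kube-controller-manager
    spec:
      affinity:
        podAntiAffinity:
          # Previously preferredDuringSchedulingIgnoredDuringExecution;
          # "required" means two replicas can never share a node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: kube-controller-manager
            topologyKey: kubernetes.io/hostname
```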

These changes imply that if any master node goes down and never comes back during the upgrade, the upgrade won't complete, because there are not enough nodes left to schedule the pods.

For example, if there are 5 master nodes and the kube-controller-manager (KCM) deployment has 2 replicas, then during the upgrade the KCM deployment will be scaled up to 5 replicas. On a healthy cluster, those replicas will be distributed across all master nodes, with exactly 1 of them running on each master node.

However, if a master node goes down for some reason (it will show up as NotReady in kubectl get nodes), then 1 pod cannot be scheduled because of the pod anti-affinity requirement, so it will get stuck in the Pending state and prevent the upgrade from proceeding.
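
A rough way to spot this condition from the command line. The label selector is an assumption to match the sketch above; the actual Tectonic manifests may label the pods differently:

```sh
# Check for master nodes that are NotReady.
kubectl get nodes

# List the KCM pods and see which one is stuck in Pending.
kubectl -n kube-system get pods -l k8s-app=kube-controller-manager -o wide

# The Pending pod's events should show a FailedScheduling reason
# referencing the pod anti-affinity rule.
kubectl -n kube-system describe pod <pending-kcm-pod>
```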

Luckily, this doesn't mean upgrading to Tectonic 1.6.6 is more fragile than before: in previous versions, the daemonset rolling upgrade faces the same issue when a node goes down. For more information and questions, please contact team-kube-lifecycle.
