@yifan-gu
Last active July 5, 2017 23:58
Notes for Tectonic 1.6.6 upgrade

(Original issue: coreos/tectonic-installer#347)

When upgrading to Tectonic 1.6.6, we will make two additional changes to the kube-scheduler and kube-controller-manager manifests besides bumping their image versions (a sketch of the resulting manifest follows the list):

  • Change the pod anti-affinity from preferredDuringSchedulingIgnoredDuringExecution to requiredDuringSchedulingIgnoredDuringExecution.
  • Set the deployment replica count equal to the number of master nodes.
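
For illustration, here is a minimal sketch of what the resulting kube-controller-manager Deployment could look like. The namespace, labels, and API version are assumptions for the example, not copied from the actual Tectonic manifests:

```yaml
# Hypothetical excerpt of a kube-controller-manager Deployment manifest.
apiVersion: apps/v1beta1          # Deployment API group around Kubernetes 1.6
kind: Deployment
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  replicas: 5                     # set equal to the number of master nodes
  template:
    metadata:
      labels:
        k8s-app: kube-controller-manager
    spec:
      affinity:
        podAntiAffinity:
          # Previously preferredDuringSchedulingIgnoredDuringExecution;
          # "required" means two replicas can never share a node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: kube-controller-manager
            topologyKey: kubernetes.io/hostname
```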

These changes imply that if any master node goes down and never comes back during the upgrade, the upgrade won't complete, because there are not enough nodes left to schedule the pods.

For example, if there are 5 master nodes and the kube-controller-manager (KCM) deployment has 2 replicas, then during the upgrade the KCM deployment will be scaled up to 5 replicas. On a healthy cluster, those replicas will be distributed across all master nodes, with exactly 1 of them running on each master node.

However, if a master node goes down for some reason (it will show up as NotReady in kubectl get nodes), then 1 pod cannot be scheduled because of the pod anti-affinity requirement, so it will get stuck in the Pending state and prevent the upgrade from proceeding.
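
A rough way to spot this condition from the command line. The label selector is an assumption to match the sketch above; the actual Tectonic manifests may label the pods differently:

```sh
# Check for master nodes that are NotReady.
kubectl get nodes

# List the KCM pods and see which one is stuck in Pending.
kubectl -n kube-system get pods -l k8s-app=kube-controller-manager -o wide

# The Pending pod's events should show a FailedScheduling reason
# referencing the pod anti-affinity rule.
kubectl -n kube-system describe pod <pending-kcm-pod>
```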

Luckily, this doesn't mean upgrading to Tectonic 1.6.6 is more fragile than before: in previous versions, the daemonset rolling upgrade faces the same issue when a node goes down. For more information and questions, please contact team-kube-lifecycle.
