K8s cluster running on GCP that currently consists entirely of preemtible nodes. We're experiencing issues where kube-dns becomes unavailable (presumably because a node has been preempted). We'd like to improve the resilience of DNS by moving kube-dns pods to more stable nodes.
Objective:
Aim to to schedule system cluster critical pods like kube-dns (or all pods in the kube-system namespace) on a node pool of only non-preemptible nodes.
The solution was to use taints and tolerations in conjunction with node affinity. We created a second node pool, and added a taint to the preemptible pool.
Terraform config:
resource "google_container_node_pool" "preemptible_worker_pool" {
node_config {
...
preemptible = true
labels {
preemptible = "true"
dedicated = "preemptible-worker-pool"
}
taint {
key = "dedicated"
value = "preemptible-worker-pool"
effect = "NO_SCHEDULE"
}
}
}
We then used a toleration and nodeAffinity to allow our existing workloads to run on the tainted node pool, effectively forcing the cluster-critical pods to run on the untainted (non-preemtible) node pool.
Kubernetes config:
spec:
template:
spec:
# The affinity + tolerations sections together allow and enforce that the workers are
# run on dedicated nodes tainted with "dedicated=preemptible-worker-pool:NoSchedule".
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: dedicated
operator: In
values:
- preemptible-worker-pool
tolerations:
- key: dedicated
operator: "Equal"
value: preemptible-worker-pool
effect: "NoSchedule"