Last active
June 11, 2023 22:37
Author: derrickburns (gist b2d5c884edeb82b72269c35a256bda2a)
Identify K8s Deployments without HA enabled
I then ran a rather simple script to identify whether each Deployment resource has the configuration necessary (though not sufficient!) to support a high-availability deployment, by which I mean:

- more than one replica,
- an anti-affinity rule,
- a liveness probe, and
- a readiness probe.
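A minimal sketch of such a check in Python follows. It is not the script used for this audit; `ha_checks` and `is_ha_candidate` are hypothetical names, each Deployment is assumed to be already parsed into a dict (for example, one element of the `items` list printed by `kubectl get deployments -A -o json`), and a probe is assumed to count only if every container defines it. Like the audit, it checks only that the fields exist, not that their values are sensible.

```python
from typing import Any, Dict


def ha_checks(deployment: Dict[str, Any]) -> Dict[str, bool]:
    """Report which necessary (not sufficient) HA settings a Deployment has.

    `deployment` is one parsed Deployment object, e.g. an element of the
    `items` list printed by `kubectl get deployments -A -o json`.
    """
    spec = deployment.get("spec", {})
    pod_spec = spec.get("template", {}).get("spec", {})
    containers = pod_spec.get("containers", [])
    anti_affinity = (pod_spec.get("affinity") or {}).get("podAntiAffinity") or {}

    replicas = spec.get("replicas")
    if replicas is None:
        replicas = 1  # Kubernetes defaults spec.replicas to 1 when it is omitted

    return {
        "replicas": replicas > 1,
        # Only checks that some anti-affinity term exists, not that it is correct.
        "anti_affinity": bool(
            anti_affinity.get("requiredDuringSchedulingIgnoredDuringExecution")
            or anti_affinity.get("preferredDuringSchedulingIgnoredDuringExecution")
        ),
        # Assumption: a probe counts only if every container defines it.
        "liveness": bool(containers) and all("livenessProbe" in c for c in containers),
        "readiness": bool(containers) and all("readinessProbe" in c for c in containers),
    }


def is_ha_candidate(deployment: Dict[str, Any]) -> bool:
    """True only if all four necessary conditions hold."""
    return all(ha_checks(deployment).values())


# Hypothetical Deployment that passes all four checks.
example = {
    "metadata": {"name": "example"},
    "spec": {
        "replicas": 3,
        "template": {
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": [
                            {"labelSelector": {"matchLabels": {"app": "example"}},
                             "topologyKey": "kubernetes.io/hostname"}
                        ]
                    }
                },
                "containers": [
                    {"name": "app", "livenessProbe": {}, "readinessProbe": {}}
                ],
            }
        },
    },
}

print(example["metadata"]["name"], ha_checks(example))
```

Counting how many Deployments in a listing pass each check then reduces to iterating over the `items` list and tallying the returned flags.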
I did not inspect the values of these fields, so a Deployment that passes may still have too few replicas, an incorrect anti-affinity rule, or a misconfigured probe.
However, this cursory audit reveals that of the 270 total Deployment resources, only 20 have anti-affinity rules. Of these 20, only 18 have more than one replica. Of these 18, 16 have readiness probes, 15 have liveness probes, and 14 have both readiness and liveness probes.
At best, only 5.19% of our production Deployments (14 of 270, the ones meeting all four criteria) are configured to withstand a single node going down.
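The funnel can be recomputed from the counts reported above. This short Python sketch only restates those numbers; it shows that the headline share depends on which criteria are required: 14 Deployments (about 5.19%) meet all four, while 15 (about 5.56%) meet the replica, anti-affinity, and liveness criteria.

```python
TOTAL = 270  # Deployment resources audited

# Counts reported in the audit; each stage is a subset of the previous one.
funnel = {
    "anti-affinity rule": 20,
    "+ more than one replica": 18,
    "+ liveness probe": 15,
    "+ liveness and readiness probes": 14,
}


def share(count: int, total: int = TOTAL) -> float:
    """Percentage of all audited Deployments, rounded to two decimals."""
    return round(count / total * 100, 2)


for stage, count in funnel.items():
    print(f"{stage:35} {count:3d} / {TOTAL} = {share(count):.2f}%")
```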
Further inspection (using this script) reveals that our anti-affinity rules are insufficient to spread pods across availability zones.
For example, here is the anti-affinity rule for anypoint-mq:
```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - {{ $nameOverride }}
        topologyKey: "kubernetes.io/hostname"
```
This uses the topology key `kubernetes.io/hostname`, which keeps matching pods on different nodes (hosts) but says nothing about which zone those nodes are in.
By contrast, to spread pods across availability zones, one would need a zone-based topology key such as `topology.kubernetes.io/zone` (or its deprecated predecessor, `failure-domain.beta.kubernetes.io/zone`).
None of the Deployments is configured to use such a topology key, so none is configured to withstand the loss of an entire availability zone.
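Checking this across all Deployments amounts to collecting the `topologyKey` of every pod anti-affinity term. The Python sketch below is hypothetical (it is not the script referred to above): `anti_affinity_topology_keys`, `spreads_across_zones`, and `zone_anti_affinity` are illustrative names, Deployments are assumed to be parsed dicts as in a `kubectl get deployments -o json` listing, and `{{ $nameOverride }}` is assumed to render to `anypoint-mq`. The zone rule it builds is a `preferredDuringSchedulingIgnoredDuringExecution` term on purpose: a *required* anti-affinity on the zone key allows at most one matching pod per zone, so any replicas beyond the number of zones could never be scheduled.

```python
import copy
from typing import Any, Dict, List, Optional

# Current well-known zone label, plus the deprecated beta label seen on older clusters.
ZONE_KEYS = {"topology.kubernetes.io/zone", "failure-domain.beta.kubernetes.io/zone"}


def anti_affinity_topology_keys(deployment: Dict[str, Any]) -> List[Optional[str]]:
    """Return the topologyKey of every podAntiAffinity term in a Deployment."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    anti = (pod_spec.get("affinity") or {}).get("podAntiAffinity") or {}
    keys: List[Optional[str]] = []
    # Required terms are PodAffinityTerm objects.
    for term in anti.get("requiredDuringSchedulingIgnoredDuringExecution", []):
        keys.append(term.get("topologyKey"))
    # Preferred terms wrap a PodAffinityTerm together with a weight.
    for weighted in anti.get("preferredDuringSchedulingIgnoredDuringExecution", []):
        keys.append(weighted.get("podAffinityTerm", {}).get("topologyKey"))
    return keys


def spreads_across_zones(deployment: Dict[str, Any]) -> bool:
    """True if any anti-affinity term uses a zone topology key."""
    return any(key in ZONE_KEYS for key in anti_affinity_topology_keys(deployment))


def zone_anti_affinity(app: str, weight: int = 100) -> Dict[str, Any]:
    """Hypothetical helper: soft rule asking the scheduler to put app=<app> pods in different zones."""
    return {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,  # must be between 1 and 100
                    "podAffinityTerm": {
                        "labelSelector": {
                            "matchExpressions": [
                                {"key": "app", "operator": "In", "values": [app]}
                            ]
                        },
                        "topologyKey": "topology.kubernetes.io/zone",
                    },
                }
            ]
        }
    }


# The anypoint-mq rule, assuming {{ $nameOverride }} renders to "anypoint-mq".
hostname_only: Dict[str, Any] = {
    "spec": {
        "template": {
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "labelSelector": {
                                    "matchExpressions": [
                                        {"key": "app", "operator": "In", "values": ["anypoint-mq"]}
                                    ]
                                },
                                "topologyKey": "kubernetes.io/hostname",
                            }
                        ]
                    }
                }
            }
        }
    }
}

# Same Deployment with the zone-level soft rule in place of the hostname rule.
zone_aware = copy.deepcopy(hostname_only)
zone_aware["spec"]["template"]["spec"]["affinity"] = zone_anti_affinity("anypoint-mq")

for name, d in [("hostname_only", hostname_only), ("zone_aware", zone_aware)]:
    print(name, anti_affinity_topology_keys(d), spreads_across_zones(d))
```

The example swaps the hostname rule for the zone rule only to keep the comparison simple; a Deployment can carry both terms. Kubernetes also provides `topologySpreadConstraints` in the pod spec for spreading pods across zones.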