(not always bizarre; sometimes just everyday design that backfires)
- `HorizontalPodAutoscaler` takes `Terminating` pods into account when calculating metrics. The consequence is that if a pod has a bug in its termination procedure, terminating pods as part of a scale-down triggers a scale-up.
- It once did the same for `Pending` pods.
- You can't omit `clusterIP` if you ever set it (even if to `""`): https://gitlab.com/gitlab-org/charts/gitlab/issues/1710#note_262760589
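  A minimal sketch of the trap; the Service name, selector, and ports are made up:

  ```yaml
  apiVersion: v1
  kind: Service
  metadata:
    name: my-svc            # illustrative name
  spec:
    # Once this key has been applied (even as ""), the cluster allocates an IP
    # and later applies must keep sending it: dropping the field makes
    # `kubectl apply` try to clear it, which the API server rejects as a
    # change to an immutable field.
    clusterIP: ""
    selector:
      app: my-app
    ports:
      - port: 80
        targetPort: 8080
  ```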
- How do you force HTTPS redirection on Ambassador? You use an nginx sidecar.
- How do you force static HTTP responses (a 401, for example) on Ambassador? You use an nginx sidecar, too.
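  A sketch of the sidecar workaround both items point at, assuming the app container listens on port 3000 and Ambassador routes to the sidecar's port 8080; every name here is illustrative:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: redirect-nginx
  data:
    nginx.conf: |
      events {}
      http {
        server {
          listen 8080;
          # Force HTTPS: Ambassador forwards the original scheme here.
          if ($http_x_forwarded_proto != "https") {
            return 301 https://$host$request_uri;
          }
          # Static response: always answer 401 on a given path.
          location /admin {
            return 401;
          }
          # Everything else goes to the app container in the same pod.
          location / {
            proxy_pass http://127.0.0.1:3000;
          }
        }
      }
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: my-app
  spec:
    selector:
      matchLabels: {app: my-app}
    template:
      metadata:
        labels: {app: my-app}
      spec:
        containers:
          - name: app
            image: my-app:latest      # assumed to listen on 3000
          - name: nginx
            image: nginx:1.19
            ports:
              - containerPort: 8080   # point the Service/Mapping here
            volumeMounts:
              - name: nginx-conf
                mountPath: /etc/nginx/nginx.conf
                subPath: nginx.conf
        volumes:
          - name: nginx-conf
            configMap:
              name: redirect-nginx
  ```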
- A removed namespace gets stuck in the `Terminating` state when an `apiservice` is unavailable, even if the deletion of the namespace is what made it unavailable. See: kubernetes/kubernetes#60807
- `kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/0.15.1/crds.yaml` brings your CPU to 100% in `jsonmergepatch.CreateThreeWayJSONMergePatch`.
- UDP on a node IP breaks when the node is restarted: containernetworking/plugins#123 (comment)
- Resource `limits` are intended for power saving, not QoS; CFS is broken and may throttle your workload before it reaches the limit:
- `aws eks update-kubeconfig` fails with `Cluster status not active` during a version upgrade. So if you use that in your CI, you're toast. Reference: aws/aws-cli#3914
- Even though some charts use secrets for storing configuration (e.g. `kube-prometheus`), `helm` doesn't roll back or version secrets: helm/helm#2196
- "Teams quickly discover they need to customize, validate, audit and re-publish their forked/generated bundles for their environment. Most packaging solutions to date are tightly coupled to some format written as code (e.g. templates, DSLs, etc). This introduces a number of challenges when trying to extend, build on top of, or integrate them with other systems. For example, how does one update a forked template from upstream, or how does one apply custom validation?" https://opensource.googleblog.com/2020/03/kpt-packaging-up-your-kubernetes.html
- Installed and removed `operator-lifecycle-manager` via its Helm chart; the `v1.packages.operators.coreos.com` APIService got stuck while deleting the namespace.
- If a chart creates a namespace, you can't store the Helm release in that same namespace. Example: `operator-lifecycle-manager`.
- You have to `{{ ... | quote }}` everything, otherwise you might confuse Helm and receive errors like `Error: YAML parse error on foo.yaml: error converting YAML to JSON: yaml: line XY: did not find expected alphabetic or numeric character` when the value is something like `*`.
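  A sketch of the failure mode with made-up template and value names, assuming `values.yaml` contains `schedule: "*"`:

  ```yaml
  # templates/configmap.yaml (illustrative)
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: {{ .Release.Name }}-config
  data:
    # BAD: renders as `bad: *` — YAML parses the bare `*` as the start of an
    # alias and fails with "did not find expected alphabetic or numeric
    # character".
    bad: {{ .Values.schedule }}
    # GOOD: renders as `good: "*"`.
    good: {{ .Values.schedule | quote }}
  ```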
- Helm will always restart all your pods on an upgrade, even if nothing in that pod's template really changed, because there's a `helm.sh/chart: ...` label containing the chart version.
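  What that looks like in a rendered manifest (chart name and version are made up); because the label sits in the pod template, bumping the chart version alone changes the template and rolls the pods:

  ```yaml
  # Fragment of a rendered Deployment (illustrative values)
  apiVersion: apps/v1
  kind: Deployment
  spec:
    template:
      metadata:
        labels:
          helm.sh/chart: my-chart-1.2.3  # bumps with every chart release, so
                                         # the pod template changes on upgrade
  ```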
- There's a max gRPC message size that puts an upper bound on what one can do with Helm:

  ```
  $ helm history <foo>
  Error: grpc: trying to send message larger than max (30637757 vs. 20971520)
  ```
- The centralized `stable` chart repo being shut down: https://github.com/helm/charts/issues/21103. A few person-weeks of hunting those chart references across the org follow.
- Helm's 3-way merge means templates are not the authoritative specification of a resource, which causes problems with "immutable resources" (those containing immutable fields):
- Abstracting shared functionality by extracting a local module couples all configurations: you can't upgrade one configuration from 0.12 to 0.13 without also upgrading the module, and once the module is upgraded you're forced to upgrade every configuration that uses it to 0.13.
- Using `data "terraform_remote_state"` couples one configuration to another's state. Upgrading the referenced state from 0.12 to 0.13 breaks the depending configuration with `Error: state snapshot was created by Terraform v0.13.2, which is newer than current v0.12.29; upgrade to Terraform v0.13.2 or greater to work with this state`.
- Changing a resource's provider can lead to weird errors. For example: the state has an AWS provider in one region, and you point the resource at another provider in the same account but a different region.
- Istio assumes clients (in the mesh) also have a Service. See: istio/istio#14367 (comment)