(not always bizarre; sometimes just everyday design that backfires)
- `HorizontalPodAutoscaler` takes `Terminating` pods into account when calculating metrics. The consequence is that if a pod has a bug in its termination procedure, terminating pods as part of a scale-down triggers a scale-up.
- It once did the same for `Pending` pods.
- You can't omit `clusterIP` if you ever set it (even if to `""`): https://gitlab.com/gitlab-org/charts/gitlab/issues/1710#note_262760589
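  A minimal sketch of the trap; the Service name, selector, and ports are made up:

  ```yaml
  apiVersion: v1
  kind: Service
  metadata:
    name: my-svc            # illustrative name
  spec:
    # Once this key has been applied (even as ""), the cluster allocates an IP
    # and later applies must keep sending it: dropping the field makes
    # `kubectl apply` try to clear it, which the API server rejects as a
    # change to an immutable field.
    clusterIP: ""
    selector:
      app: my-app
    ports:
      - port: 80
        targetPort: 8080
  ```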
- How do you force HTTPS redirection on Ambassador? You use an nginx sidecar.
- How do you force static HTTP responses (a 401, for example) on Ambassador? You use an nginx sidecar, too.
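  A sketch of the sidecar workaround both items point at, assuming the app container listens on port 3000 and Ambassador routes to the sidecar's port 8080; every name here is illustrative:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: redirect-nginx
  data:
    nginx.conf: |
      events {}
      http {
        server {
          listen 8080;
          # Force HTTPS: Ambassador forwards the original scheme here.
          if ($http_x_forwarded_proto != "https") {
            return 301 https://$host$request_uri;
          }
          # Static response: always answer 401 on a given path.
          location /admin {
            return 401;
          }
          # Everything else goes to the app container in the same pod.
          location / {
            proxy_pass http://127.0.0.1:3000;
          }
        }
      }
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: my-app
  spec:
    selector:
      matchLabels: {app: my-app}
    template:
      metadata:
        labels: {app: my-app}
      spec:
        containers:
          - name: app
            image: my-app:latest      # assumed to listen on 3000
          - name: nginx
            image: nginx:1.19
            ports:
              - containerPort: 8080   # point the Service/Mapping here
            volumeMounts:
              - name: nginx-conf
                mountPath: /etc/nginx/nginx.conf
                subPath: nginx.conf
        volumes:
          - name: nginx-conf
            configMap:
              name: redirect-nginx
  ```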
- A removed namespace gets stuck in the `Terminating` state when an `apiservice` is unavailable, even if the deletion of the namespace is what made it unavailable. See: kubernetes/kubernetes#60807
- `kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/0.15.1/crds.yaml` brings your CPU to 100% in `jsonmergepatch.CreateThreeWayJSONMergePatch`.
- UDP on a node IP breaks when the node is restarted: containernetworking/plugins#123 (comment)
- Resource `limits` are intended for power saving, not QoS; CFS is broken and may throttle your workload before it reaches the limit:
- `aws eks update-kubeconfig` fails with `Cluster status not active` during a version upgrade. So if you use that in your CI, you're toast. Reference: aws/aws-cli#3914
- Even though some charts use secrets for storing configuration (e.g. `kube-prometheus`), `helm` doesn't roll back or version secrets: helm/helm#2196
- "Teams quickly discover they need to customize, validate, audit and re-publish their forked/generated bundles for their environment. Most packaging solutions to date are tightly coupled to some format written as code (e.g. templates, DSLs, etc). This introduces a number of challenges when trying to extend, build on top of, or integrate them with other systems. For example, how does one update a forked template from upstream, or how does one apply custom validation?" https://opensource.googleblog.com/2020/03/kpt-packaging-up-your-kubernetes.html
- Installed and removed `operator-lifecycle-manager` via its Helm chart; the `v1.packages.operators.coreos.com` APIService got stuck while deleting the namespace.
- If a chart creates a namespace, you can't store the Helm release in that same namespace. Example: `operator-lifecycle-manager`.
- You have to `{{ ... | quote }}` everything, otherwise you might confuse Helm and receive errors like `Error: YAML parse error on foo.yaml: error converting YAML to JSON: yaml: line XY: did not find expected alphabetic or numeric character` when the value is something like `*`.
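  A sketch of the failure mode with made-up template and value names, assuming `values.yaml` contains `schedule: "*"`:

  ```yaml
  # templates/configmap.yaml (illustrative)
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: {{ .Release.Name }}-config
  data:
    # BAD: renders as `bad: *` — YAML parses the bare `*` as the start of an
    # alias and fails with "did not find expected alphabetic or numeric
    # character".
    bad: {{ .Values.schedule }}
    # GOOD: renders as `good: "*"`.
    good: {{ .Values.schedule | quote }}
  ```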
- Helm will always restart all your pods on an upgrade, even if nothing in that pod's template really changed, because there's a `helm.sh/chart: ...` label containing the chart version.
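  What that looks like in a rendered manifest (chart name and version are made up); because the label sits in the pod template, bumping the chart version alone changes the template and rolls the pods:

  ```yaml
  # Fragment of a rendered Deployment (illustrative values)
  apiVersion: apps/v1
  kind: Deployment
  spec:
    template:
      metadata:
        labels:
          helm.sh/chart: my-chart-1.2.3  # bumps with every chart release, so
                                         # the pod template changes on upgrade
  ```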
- There's a max gRPC message size that puts an upper bound on what one can do with Helm:

  ```
  $ helm history <foo>
  Error: grpc: trying to send message larger than max (30637757 vs. 20971520)
  ```
- The centralized `stable` chart repo being shut down: https://github.com/helm/charts/issues/21103. A few person-weeks of hunting those chart references across the org follow.
- Helm's 3-way merge means templates are not the authoritative specification of a resource, which causes problems with "immutable resources" (those containing immutable fields):
- Abstracting shared functionality by extracting a local module couples all configurations: you can't upgrade one configuration from 0.12 to 0.13 without also upgrading the module, and once the module is upgraded you're forced to upgrade every configuration that uses it to 0.13.
- Using `data "terraform_remote_state"` couples one configuration to another's state. Upgrading the referenced state from 0.12 to 0.13 breaks the depending configuration with `Error: state snapshot was created by Terraform v0.13.2, which is newer than current v0.12.29; upgrade to Terraform v0.13.2 or greater to work with this state`.
- Changing a resource's provider can lead to weird errors. For example: the state has an AWS provider in one region, and you point the resource at another provider in the same account but a different region.
- Istio assumes clients (in the mesh) also have a Service. See: istio/istio#14367 (comment)