jeesmon’s gists

jeesmon / controller-manager-startup-steps.md

Created February 28, 2022 13:38

Ever wondered what happens when the controller manager starts in your operator?

1. Start the metrics endpoint
2. Start the health probes
3. Start the webhook servers
4. Start the caches and wait for them to sync
5. Start the non-leader-election runnables
6. Start leader election and all runnables that require it

Note that caches are synced in both leader and non-leader operator pods, which makes the transition quick if the leader pod exits and another operator pod becomes leader (assuming you run multiple pods of your operator for HA).
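The failover point above can be sketched as a toy model. This is only an illustration, not controller-runtime code: the `pod` type, the phase names, and the `start`/`becomeLeader` helpers are all hypothetical. It shows that a standby replica has already run every phase except the leader-only one by the time it takes over.

```go
package main

import "fmt"

// pod is a hypothetical model of one operator replica; it is not a
// controller-runtime type.
type pod struct {
	name    string
	started []string // phases this replica has run, in order
}

// commonPhases run in every replica, whether or not it is the leader.
var commonPhases = []string{
	"metrics endpoint",
	"health probes",
	"webhook servers",
	"caches synced",
	"non-leader-election runnables",
}

// start runs the phases every replica goes through before it holds the lease.
func (p *pod) start() {
	p.started = append(p.started, commonPhases...)
}

// becomeLeader runs the only phase left once this replica wins the election.
func (p *pod) becomeLeader() {
	p.started = append(p.started, "leader-election runnables")
}

func main() {
	a, b := &pod{name: "operator-a"}, &pod{name: "operator-b"}
	a.start()
	b.start()
	a.becomeLeader() // operator-a wins the election first

	// operator-a exits; operator-b takes over with its caches already warm.
	b.becomeLeader()
	fmt.Println(b.name, b.started)
}
```

In this model the standby only has one step left at failover, which is the reason the transition is quick.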

jeesmon / controller-scale-up-decision.md

Created February 28, 2022 13:45

Why don’t we need to scale up the controller of a CRD (that is, run multiple instances of the operator)?

All core controllers in Kubernetes run as a single instance. If Kubernetes can do it, our operator can too, as long as we write our code carefully. Anything that would overload a well-written controller is also very likely to overload etcd itself. Controllers should never do heavy work themselves; they just orchestrate and control. The control part is usually quite fast: complicated, but not CPU intensive. The operator controls things, but the actual heavy lifting (for example, a DB migration driven by the operator) should happen elsewhere, in a Job or similar. Some heavy lifting can also be split up. For example, instead of “call a create API and then wait for it to finish”, return immediately after calling the API with RequeueAfter set to a delay, so the controller is not sitting there blocking on something slow.

Credit: @coderanger in kubebuilder slack channel

jeesmon / controller-reconciliation.md

Created February 28, 2022 13:49

Reconciliation is level-based, meaning action isn’t driven off changes in individual Events, but instead is driven by actual cluster state read from the apiserver or a local cache. For example, if responding to a Pod Delete Event, the Request won’t say that a Pod was deleted; instead, the reconcile function observes this when it reads the cluster state and sees the Pod as missing.
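A small sketch of that behavior, assuming an in-memory map in place of the apiserver or cache. `Request` and `cluster` are local stand-in types and the returned strings are made up for illustration. The request carries only a name, so the reconcile function has to look at current state to learn that the Pod is gone.

```go
package main

import "fmt"

// Request names an object and says nothing about what happened to it.
// It is a local stand-in, not the controller-runtime type.
type Request struct {
	Namespace, Name string
}

// cluster is a hypothetical in-memory stand-in for state read from the
// apiserver or a local cache: the set of Pods that currently exist.
type cluster map[Request]bool

// reconcile decides what to do from current state only.
func reconcile(c cluster, req Request) string {
	if !c[req] {
		return "pod missing: recreate or clean up"
	}
	return "pod present: nothing to do"
}

func main() {
	req := Request{Namespace: "default", Name: "web-0"}
	c := cluster{req: true}
	fmt.Println(reconcile(c, req))

	delete(c, req) // the Pod is deleted; the request looks exactly the same
	fmt.Println(reconcile(c, req))
}
```

Because the decision depends only on observed state, running the same request twice, or missing an event entirely, still leads to the same outcome.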

More on what level-based means, from an external Slack (the best explanation I have seen so far):

coderanger:

It's a bit of jargon from electronics that got carried over into computer stuff 🙂

If you want to make a reactive system there's two main approaches, you can watch for changes and then do something in response to those changes. For example when you click a button, the app does something. That is called "edge based" because in electronics you would see a rising or falling voltage change as the thing you are paying attention to

jeesmon / kubernetes-operator-links.md

Last active August 18, 2022 17:39

Kubebuilder Book: https://book.kubebuilder.io/
Operator Whitepaper: https://github.com/cncf/tag-app-delivery/blob/main/operator-wg/whitepaper/Operator-WhitePaper_v1-0.md
Controller internals: https://engineering.bitnami.com/articles/a-deep-dive-into-kubernetes-controllers.html
Best practices: https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-building-kubernetes-operators-and-stateful-apps
Operator Ebook: https://developers.redhat.com/books/kubernetes-operators
IBM DataPower Operator: https://ibm.github.io/datapower-operator-doc/
Watch API in kubernetes: https://www.youtube.com/watch?v=PLSDvFjR9HY
Making the case for Kubernetes operators: https://practicalkubernetes.blogspot.com/2022/01/making-case-for-kubernetes-operators.html
https://www.youtube.com/watch?v=KBTXBUVNF2I

jeesmon / go-links.md

Created February 28, 2022 14:07

Generics: https://github.com/akutz/go-generics-the-hard-way
Pointer package: https://pkg.go.dev/k8s.io/utils/pointer
Diff kube objects: https://pkg.go.dev/github.com/google/go-cmp/cmp

jeesmon / kube-tools.md

Created February 28, 2022 14:07

Kubescape: https://github.com/armosec/kubescape
g: https://github.com/stefanmaric/g
Reloader: https://github.com/stakater/Reloader
Kyverno: https://kyverno.io/
Kubeplus: https://github.com/cloud-ark/kubeplus
stern: https://github.com/wercker/stern

jeesmon / kube-commands.md

Last active February 28, 2022 14:16

Show container runtime

oc get no -o custom-columns=NAME:.metadata.name,CONTAINER-RUNTIME:.status.nodeInfo.containerRuntimeVersion

OR

oc get no -o wide

jeesmon / kubectl-with-sa-token.md

Last active July 19, 2022 19:27

Ever needed to run an oc or kubectl command from within a pod in the cluster, with proper permissions and without hard-coding your (short-lived) token? With the right RBAC, you can authenticate oc/kubectl using your service account token. This token is automatically mounted in the pod together with the CA cert, and you can log in like this:

oc login --token="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  --server='https://kubernetes.default' \
  --certificate-authority='/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'

Another option:

jeesmon / operator-subscription-config.md

Created February 28, 2022 14:13

You can configure an operator deployed by OLM using the Subscription config. This is handy for cluster administrators who need to tweak the operator deployment based on cluster state. For example, you can update the resource requests/limits of an OOMKilled operator pod, add a nodeSelector for the operator, etc.

More details are here: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/doc/design/subscription-config.md

jeesmon / openshift-docs.md

Created February 28, 2022 14:17

Install specific version of an operator: https://docs.openshift.com/container-platform/4.8/operators/admin/olm-adding-operators-to-cluster.html#olm-installing-specific-version-cli_olm-adding-operators-to-a-cluster
Graceful shutdown/restart of OCP cluster: https://docs.openshift.com/container-platform/4.7/backup_and_restore/graceful-cluster-shutdown.html https://docs.openshift.com/container-platform/4.7/backup_and_restore/graceful-cluster-restart.html
Disaster recovery steps: https://docs.openshift.com/container-platform/4.7/backup_and_restore/disaster_recovery/about-disaster-recovery.html
Recovering from expired control plane certificates: https://docs.openshift.com/container-platform/4.7/backup_and_restore/disaster_recovery/scenario-3-expired-certs.html
Debugging OpenShift web console, OperatorHub, internal registry, and other components

Jeesmon Jacob jeesmon