- Generics: https://github.com/akutz/go-generics-the-hard-way
- Pointer package: https://pkg.go.dev/k8s.io/utils/pointer
- Diff kube objects: https://pkg.go.dev/github.com/google/go-cmp/cmp
- Kubebuilder Book: https://book.kubebuilder.io/
- Operator Whitepaper: https://github.com/cncf/tag-app-delivery/blob/main/operator-wg/whitepaper/Operator-WhitePaper_v1-0.md
- Controller internals: https://engineering.bitnami.com/articles/a-deep-dive-into-kubernetes-controllers.html
- Best practices: https://cloud.google.com/blog/products/containers-kubernetes/best-practices-for-building-kubernetes-operators-and-stateful-apps
- Operator Ebook: https://developers.redhat.com/books/kubernetes-operators
- IBM DataPower Operator: https://ibm.github.io/datapower-operator-doc/
- Watch API in kubernetes: https://www.youtube.com/watch?v=PLSDvFjR9HY
- Making the case for Kubernetes operators: https://practicalkubernetes.blogspot.com/2022/01/making-case-for-kubernetes-operators.html
Reconciliation is level-based, meaning action isn’t driven off changes in individual Events, but instead is driven by actual cluster state read from the apiserver or a local cache. For example, if responding to a Pod Delete Event, the Request won’t say that a Pod was deleted; instead, the reconcile function observes this when it reads the cluster state and sees that the Pod is missing.
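A minimal sketch of that idea, using only the standard library rather than controller-runtime. The `Request`, `Cluster`, and `ReconcilePod` names are hypothetical stand-ins made up for illustration: the request only names an object, and the function works out what to do from the state it reads.

```go
package main

import "fmt"

// Request is a hypothetical stand-in for a reconcile request. It only names
// an object; it says nothing about which event (create, update, delete)
// caused it to be queued.
type Request struct {
	Namespace string
	Name      string
}

// Cluster is a hypothetical stand-in for state read from the apiserver or a
// local cache. Keys are "namespace/name".
type Cluster struct {
	Pods map[string]bool
}

// ReconcilePod looks at the current state and converges it toward the desired
// state (the Pod exists). It never asks "what changed?".
func ReconcilePod(c *Cluster, req Request) string {
	key := req.Namespace + "/" + req.Name
	if !c.Pods[key] {
		// The Pod is missing. Whether it was deleted or never created does
		// not matter: the action is the same.
		c.Pods[key] = true
		return "created"
	}
	return "unchanged"
}

func main() {
	c := &Cluster{Pods: map[string]bool{}}
	req := Request{Namespace: "default", Name: "web-0"}

	fmt.Println(ReconcilePod(c, req)) // Pod missing, so it gets created
	fmt.Println(ReconcilePod(c, req)) // state already matches, nothing to do

	delete(c.Pods, "default/web-0")   // simulate a Pod deletion
	fmt.Println(ReconcilePod(c, req)) // same request, Pod is recreated
}
```

Note that the same request is handled three times and the result depends only on the state at the time of the call, which is also why missed or duplicated events are harmless.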
More on what level-based means, from an external Slack (the best explanation that I have seen so far):
coderanger:
It's a bit of jargon from electronics that got carried over into computer stuff 🙂
If you want to make a reactive system there's two main approaches, you can watch for changes and then do something in response to those changes. For example when you click a button, the app does something. That is called "edge based" because in electronics you would see a rising or falling voltage change as the thing you are paying attention to
Why don’t we need to scale up the controller of a CRD (add multiple instances of the operator)?
All core controllers in Kubernetes run as a single instance. If Kubernetes can do it, our operator can too, as long as we write our code carefully. Anything that would overload a well-written controller is also very likely to overload etcd itself. Controllers should never do heavy work themselves; they just orchestrate and control. The control part is usually quite fast: complicated, but not CPU intensive. The operator controls things, but the actual heavy lifting (for example, a DB migration driven by the operator) should happen elsewhere, in a Job or similar. Some heavy lifting can also be split up. For example, for “call a create API and then wait for it to finish”, return immediately after calling the API with RequeueAfter set to a delay, so the controller is not sitting there blocking on something slow.
Credit: @coderanger in kubebuilder slack channel
Ever wondered what happens when the controller manager starts in your operator?
- Start metrics endpoint
- Start health probes
- Start webhook servers
- Start and wait for caches to sync
- Start non-leader election runnables
- Start leader election and all required runnables
You can see that caches are synced in both leader and non-leader operator pods, which makes the transition quick if the leader pod exits and another operator pod becomes leader (if you run multiple pods of your operator for HA).
This is a more advanced use case, but if you ever want to write operators that manage the lifecycle of multiple kube clusters, cluster-api is something you should look into: https://cluster-api.sigs.k8s.io
Goals:
- To manage the lifecycle (create, scale, upgrade, destroy) of Kubernetes-conformant clusters using a declarative API.
- To work in different environments, both on-premises and in the cloud.
- To define common operations, provide a default implementation, and provide the ability to swap out implementations for alternative ones.
- To reuse and integrate existing ecosystem components rather than duplicating their functionality (e.g. node-problem-detector, cluster autoscaler, SIG-Multi-cluster).
- To provide a transition path for Kubernetes lifecycle products to adopt Cluster API incrementally. Specifically, existing cluster lifecycle management tools should be able to adopt Cluster API in a staged manner, over the course of multiple releases, or even adopting a subset of Cluster API.
If you set go 1.17 in go.mod and run go mod tidy, you will see a second require block that lists indirect dependencies. A few things changed in Go 1.17 in how the module dependency graph is handled. More details here: https://go.dev/doc/go1.17#go-command
Also, a few useful commands for managing dependencies:
go mod tidy # fix up modules
go mod why -m <module> # shows where this module is used from main module
go mod graph # shows module requirement graph
go list -m all # list all dependencies
go list -m -u -versions -json <module> # shows versions of a module
You can easily enable monitoring for your projects in OpenShift if your workload exposes a metrics endpoint that can be scraped by Prometheus. For example, an operator you develop using operator-sdk adds an endpoint with lots of useful metrics by default. You can query them in the OpenShift web console if you enable monitoring for user-defined projects.
To enable it, edit the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace and add the following under data:
data:
  config.yaml: |
    enableUserWorkload: true
If you are creating a standalone Go library for your operator and it has types that are used in your operator's CRD spec/status, you will need code generation to produce DeepCopy and DeepCopyInto methods for those types. For API types in your operator, make generate does that for you automatically using controller-tools. For a library project that does not use controller-tools, you can do the same code generation with github.com/kubernetes/code-generator.
An example is here: https://github.com/jeesmon/operator-utils/blob/main/Makefile#L14-L20
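To make it clearer what the generator is producing, here is a hand-written sketch of the two methods for a made-up type. `BackupSpec` and its fields are hypothetical, and real generated code may differ in detail; the point is only that slices and maps have to be copied element by element so the copy shares no memory with the original.

```go
package main

import "fmt"

// BackupSpec is a hypothetical library type that might be embedded in a CRD spec.
type BackupSpec struct {
	Schedule string
	Targets  []string
	Labels   map[string]string
}

// DeepCopyInto copies the receiver into out. Slices and maps are reallocated
// so that out does not alias the receiver's backing storage.
func (in *BackupSpec) DeepCopyInto(out *BackupSpec) {
	*out = *in // copies plain value fields such as Schedule
	if in.Targets != nil {
		out.Targets = make([]string, len(in.Targets))
		copy(out.Targets, in.Targets)
	}
	if in.Labels != nil {
		out.Labels = make(map[string]string, len(in.Labels))
		for k, v := range in.Labels {
			out.Labels[k] = v
		}
	}
}

// DeepCopy returns a new, independent copy of the receiver (nil stays nil).
func (in *BackupSpec) DeepCopy() *BackupSpec {
	if in == nil {
		return nil
	}
	out := new(BackupSpec)
	in.DeepCopyInto(out)
	return out
}

func main() {
	orig := &BackupSpec{
		Schedule: "@daily",
		Targets:  []string{"s3"},
		Labels:   map[string]string{"tier": "gold"},
	}
	cp := orig.DeepCopy()
	cp.Targets[0] = "gcs"
	cp.Labels["tier"] = "silver"

	// The original is untouched by changes to the copy.
	fmt.Println(orig.Targets[0], orig.Labels["tier"])
}
```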
Steps:
kubectl api-resources --no-headers | \
  awk '{print $1}' | \
  xargs -I '{}' kubectl get '{}' --all-namespaces -o json 2>/dev/null | \
  jq '.items[]
      | select((.metadata.ownerReferences // [])[]
               | select(.kind == "YourCRD"))'
Credits: Oleg Matskiv
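The jq filter in that pipeline can also be sketched in Go with only the standard library, which may be handy if you want the same check in a test or a small tool. It assumes the input has the shape of `kubectl get ... -o json` list output (an `items` array whose entries carry `metadata.ownerReferences`). The `OwnedBy` function and the struct names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ownerRef holds the owner reference fields this sketch cares about.
type ownerRef struct {
	Kind string `json:"kind"`
	Name string `json:"name"`
}

// object holds the fields of a listed item that this sketch cares about.
type object struct {
	Kind     string `json:"kind"`
	Metadata struct {
		Name            string     `json:"name"`
		OwnerReferences []ownerRef `json:"ownerReferences"`
	} `json:"metadata"`
}

// list matches the top-level shape of a JSON list with an "items" array.
type list struct {
	Items []object `json:"items"`
}

// OwnedBy returns the names of items that have at least one owner reference
// of the given kind. Each item is reported once, even with several matches.
func OwnedBy(listJSON []byte, ownerKind string) ([]string, error) {
	var l list
	if err := json.Unmarshal(listJSON, &l); err != nil {
		return nil, err
	}
	var names []string
	for _, item := range l.Items {
		for _, ref := range item.Metadata.OwnerReferences {
			if ref.Kind == ownerKind {
				names = append(names, item.Metadata.Name)
				break
			}
		}
	}
	return names, nil
}

func main() {
	// Sample input made up for illustration.
	data := []byte(`{"items":[
		{"kind":"Pod","metadata":{"name":"a","ownerReferences":[{"kind":"YourCRD","name":"x"}]}},
		{"kind":"Pod","metadata":{"name":"b"}}
	]}`)
	names, err := OwnedBy(data, "YourCRD")
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```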