For simplicity, we are looking at three types of resources that need to be controlled:
- CPU cycles
- RAM capacity
- Disk capacity (PersistentVolumes)
There are four main concepts that work together to facilitate resource usage controls in Kubernetes:
- ResourceQuota
- LimitRange
- Limits
- Requests
- (and default values for limits and requests)
This is my attempt at explaining how these things work. Note: I am looking at this from the perspective of a future OpenShift cluster admin, which means that I am (going to be) tasked partly with protecting the different projects / namespaces from each other, too.
ResourceQuota states at the namespace level how much CPU, RAM, or disk the pods/containers may use, combined. If nothing else, setting this is a must. In a multi-tenant cluster like OpenShift, project/namespace admins should NOT have the ability to rewrite a project's ResourceQuota. But the story of resource usage limiting does not end there!
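As a rough sketch (the namespace name and the numbers below are made up), a ResourceQuota covering CPU, RAM, and PersistentVolumeClaim storage could look something like this:

```yaml
# Hypothetical ResourceQuota: caps the combined resource usage of all
# pods and PVCs in the "team-a" namespace. Numbers are made up.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"        # sum of CPU requests across the namespace
    requests.memory: 8Gi     # sum of RAM requests
    limits.cpu: "8"          # sum of CPU limits
    limits.memory: 16Gi      # sum of RAM limits
    requests.storage: 100Gi  # total PersistentVolumeClaim capacity
```

With such a quota in effect, a new pod (or PVC) that would push the namespace over any of these sums gets rejected at creation time.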
Request and Limit values should be attached to every single pod, at the very least for RAM and CPU, in order to allow Kubernetes to keep the system running smoothly.
- Request: This is the amount of CPU/RAM that Kubernetes guarantees every container in the pod may get. If Kubernetes sees that the cluster does not have the resources to honor this guarantee, the pod will not be scheduled. IOW, this is the lower bound of resources for the containers.
- Limit: This is the amount of CPU/RAM that Kubernetes may give to every container in the pod. This value is useful in many ways: the pod may claim this amount as a self-restriction mechanism to protect the cluster from itself (bugs or surprise loads for server software), and the Kubernetes cluster gets some usable information about how much the program running inside the container may actually need in order to be useful, while still being able to restrict huge spikes.
- A usable strategy might be to have absolutely minimal Request values for CPU / RAM, just enough for the pods to barely run, and use the Limit values to cap the container-level resource usage. Maybe? (See the sketch below.)
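A minimal sketch of what that strategy looks like on a single container (the image name and the numbers are made up):

```yaml
# Hypothetical pod: tiny requests (the guaranteed lower bound) and
# limits that cap how far the container may spike.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0   # made-up image
    resources:
      requests:
        cpu: 50m        # 0.05 CPU guaranteed for scheduling
        memory: 64Mi
      limits:
        cpu: 500m       # CPU usage above this is throttled
        memory: 256Mi   # going above this gets the container OOM-killed
```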
Lastly, let's look at the LimitRange object, which is used by cluster administrators to streamline the management of resource usage limitations: a LimitRange is applied to a project / namespace and it, too, should not be rewritable by project/namespace admins in a proper multi-tenant cluster. With a LimitRange, cluster administration can force minimum & maximum amounts for pod/container-level requests and limits, as well as set some sensible defaults that let project/namespace admins and developers just use the cluster.
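A sketch of such a LimitRange (the numbers are arbitrary): min/max bound what developers may ask for, while defaultRequest/default are injected into containers that do not state their own values.

```yaml
# Hypothetical LimitRange for the "team-a" namespace.
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-limits
  namespace: team-a
spec:
  limits:
  - type: Container
    min:                # smallest request a container may state
      cpu: 10m
      memory: 32Mi
    max:                # largest limit a container may state
      cpu: "2"
      memory: 2Gi
    defaultRequest:     # injected when a container states no request
      cpu: 100m
      memory: 128Mi
    default:            # injected when a container states no limit
      cpu: 500m
      memory: 512Mi
```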
It is up to the cluster administration to make sure the ResourceQuota and LimitRange are sensibly coherent together. For example, do not set default limit and default request values that are farther from each other than the allowed maxLimitRequestRatio: that mistake means developers will have to state at least one of those values themselves, or Kubernetes will refuse to schedule their pods because the applied default values violate the max ratio. Also, setting a higher maximum per-container Limit than the ResourceQuota in effect for the namespace is somewhat nonsensical, if still doable, as long as the Request falls below the ResourceQuota. And so on.
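As a made-up illustration of that first mistake: with the LimitRange below, a container that states neither value gets a CPU limit/request ratio of 1000m / 100m = 10, which violates the allowed ratio of 4, so the defaulted pod is rejected.

```yaml
# Hypothetical, deliberately incoherent LimitRange: the injected defaults
# violate the very ratio this object enforces.
apiVersion: v1
kind: LimitRange
metadata:
  name: incoherent-limits
spec:
  limits:
  - type: Container
    maxLimitRequestRatio:
      cpu: "4"          # limit may be at most 4x the request
    defaultRequest:
      cpu: 100m
    default:
      cpu: "1"          # 1000m / 100m = 10 > 4 -> rejection
```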
If the cluster starts to experience CPU / memory shortage, Kubernetes needs a way to prioritise the pods. Pods that do not have requests / limits attached to them are, from the scheduler's point of view, unreasonable: there is no way to make any sensible decisions about what to put where. Therefore those pods will always be the first ones to get evicted from the cluster.
With CPU, Kubernetes (the kubelet, via cgroups) can throttle the containers and the processes will simply run slower. With RAM this is not possible for obvious reasons, and evictions will happen.
The full story and details about how Kubernetes selects pods for eviction in case of resource shortage will have to wait for a later time.
Then there are limits on how many configmaps, secrets, pods, and other Kubernetes objects a namespace can hold. There is no Kubernetes-side hard limit on how big a configmap or secret can be, but the etcd backend datastore can only hold objects up to roughly 1 MB, and other parts of the plumbing in the apiserver might impose limits of their own. Word on the street seems to be that around 1 MB is the current upper limit for a single configmap/secret/other object. Some more information here: https://stackoverflow.com/a/53015758/1889463
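For reference, those object counts can be capped with the same ResourceQuota mechanism; a made-up sketch:

```yaml
# Hypothetical object-count quota for the "team-a" namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-object-counts
  namespace: team-a
spec:
  hard:
    configmaps: "50"
    secrets: "50"
    pods: "100"
    count/deployments.apps: "20"   # generic count/<resource>.<group> form
```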
We are postponing / skipping this side of resource limiting for now.