The goal of this document is to cover all aspects of Kubernetes resource management: how resources are expressed, constrained, and accounted for. It started as a way to ensure that alternate container runtime implementations like Kata Containers behave, from a resource accounting and consumption point of view, in the same manner as runc.
Location of the latest version of this document: https://gist.github.com/mcastelino/b8ce9a70b00ee56036dadd70ded53e9f
If you do not understand cgroups, refer to the quick primer at the bottom of this document. It will help you understand how resource enforcement actually works.
There are two things to consider.
- What is enforced
- What is scheduled/allocatable
Note: Scheduling is based on requests, not limits. Limits are enforced (using cgroups). Requests are used at scheduling time (and are sometimes also enforced by cgroups). This will be important throughout the discussion that follows.
Note: Also, not all pods have limits and requests, and not all requests are guaranteed (see memory requests).
Let us say we start with a node with 8 CPUs, 8GB of memory and 40GB of node-local storage.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Memory block size: 128M
Total online memory: 8G
Total offline memory: 0B
We want to set aside some resources for the node itself to function. This keeps the node stable, and protects/isolates the system and user services, which are managed and scheduled outside of Kubernetes, from the pods launched by Kubernetes.
Kubernetes classifies these into two categories: `kube` and `system`. When we launch Kubernetes we set aside resources for both. Here:
- `kube`: Kubernetes-associated components (kubelet, containerd/cri-o, ?shims?, ...) which are not explicitly placed in pods.
- `system`: all other system daemons and user processes.
This can be done at configuration time using kubeadm configuration.
$ sudo -E kubeadm init --config=./kubeadm.yaml
where kubeadm.yaml:
apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Allowing for CPU pinning and isolation in case of guaranteed QoS class
cpuManagerPolicy: static
systemReserved:
  cpu: 500m
  memory: 256M
kubeReserved:
  cpu: 500m
  memory: 256M
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
Here we set aside 512MB of memory across kube and system (256M each). We have also set aside 1 CPU (500m each).
mrcastel@bored-pelinor:~$ kubectl describe node
Name: bored-pelinor
Roles: master
...
Capacity:
cpu: 8
ephemeral-storage: 40470732Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8167148Ki
pods: 110
Allocatable:
cpu: 7
ephemeral-storage: 37297826550
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7564748Ki
pods: 110
...
Here you will notice a few things. Comparing Capacity to Allocatable:
Capacity:
  cpu:                8
  ephemeral-storage:  40470732Ki
  memory:             8167148Ki
Allocatable:
  cpu:                7
  ephemeral-storage:  37297826550
  memory:             7564748Ki
- The Allocatable numbers already exclude 1 CPU and ~512MB of memory, plus the default eviction threshold (the arithmetic is sketched below).
- The Kubernetes scheduler will not schedule a pod on a node if `total pod requests >= allocatable`.
- Limits are not considered when scheduling. They are only used to limit the resource consumption of a pod.
- The Kubernetes scheduler will allow a pod to land on a node if its `requests` fit within `allocatable`.
- Requests sometimes decide the minimum amount of resources the container is guaranteed.
- Limits are always enforced, so a container will never exceed its limit.
- Limits and requests are both optional. Their presence or absence defines the QoS class into which the pod falls.
- Even though resources are expressed at the container level, QoS is set up at the pod level.
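As a sanity check, the allocatable memory works out exactly if you add the kubelet's default hard eviction threshold (memory.available<100Mi) to the reservations above. A minimal sketch of the arithmetic; the 100Mi default is the only value assumed here that is not visible in the config:
capacity_kib=8167148
kube_reserved_kib=$((256 * 1000000 / 1024))    # 256M (decimal) = 250000Ki
system_reserved_kib=$((256 * 1000000 / 1024))  # 256M (decimal) = 250000Ki
eviction_kib=$((100 * 1024))                   # assumed default evictionHard memory.available<100Mi
echo $((capacity_kib - kube_reserved_kib - system_reserved_kib - eviction_kib))
7564748
This yields 7564748, matching the Allocatable memory reported above.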
mrcastel@bored-pelinor:~$ kubectl describe node
...
Non-terminated Pods: (9 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default burst-2 300m (4%) 600m (8%) 400Mi (5%) 600Mi (8%) 155m
kube-system coredns-fb8b8dccf-669b2 100m (1%) 0 (0%) 70Mi (0%) 170Mi (2%) 4h5m
kube-system coredns-fb8b8dccf-pjskz 100m (1%) 0 (0%) 70Mi (0%) 170Mi (2%) 4h5m
kube-system etcd-bored-pelinor 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4h4m
kube-system kube-apiserver-bored-pelinor 250m (3%) 0 (0%) 0 (0%) 0 (0%) 4h4m
kube-system kube-controller-manager-bored-pelinor 200m (2%) 0 (0%) 0 (0%) 0 (0%) 4h4m
kube-system kube-flannel-ds-amd64-q9cnc 100m (1%) 100m (1%) 50Mi (0%) 50Mi (0%) 3h48m
kube-system kube-proxy-csxkh 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4h5m
kube-system kube-scheduler-bored-pelinor 100m (1%) 0 (0%) 0 (0%) 0 (0%) 4h4m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1150m (16%) 700m (10%)
memory 590Mi (7%) 990Mi (13%)
ephemeral-storage 0 (0%) 0 (0%)
- Here we see the current status of the node w.r.t. scheduling.
- As the status indicates, "Total limits may be over 100 percent, i.e., overcommitted." This has significance later.
Containers in a pod express resources via requests and limits. For example (a quick check of the resulting QoS class follows):
apiVersion: v1
kind: Pod
metadata:
  name: burst
spec:
  containers:
  - name: busybee
    image: busybox
    resources:
      limits:
        cpu: 500m
        memory: "400Mi"
      requests:
        cpu: 250m
        memory: "300Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]
- The kubernetes documentation suggests that system and kube should be placed under their own cgroup hierarchies: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/node-allocatable.md#recommended-cgroups-setup
- However this is hard to do on a system where systemd is already in play and creating the user and system slices.
- We found that nobody today actually enforces this, as they do not create the requisite cgroups, and OpenShift says do not do it!!! - https://docs.openshift.com/container-platform/3.3/admin_guide/allocating_node_resources.html
- cpusets: We have no idea how the cpuset will be determined in the case of static pods.
  - For cpusets to be handled correctly, kubepods should form a pool that does not include CPUs assigned to `kube` and `system`.
  - We can assume for now that `kube` and `system` will not request cpusets.
- The cluster admin is supposed to create separate cgroups if they truly want system daemons and kube components (kubelet, crio, containerd, ...) to not impact pods and vice versa.
From openshift documentation
Optionally, the node can be made to enforce kube-reserved and system-reserved by specifying those tokens in the enforce-node-allocatable flag.
If specified, the corresponding --kube-reserved-cgroup or --system-reserved-cgroup needs to be provided. In future releases, the node and container
runtime will be packaged in a common cgroup separate from system.slice. Until that time, we do not recommend users change the default value of enforce-node-allocatable flag.
Administrators should treat system daemons similar to Guaranteed pods. System daemons can burst within their bounding control groups and this behavior
needs to be managed as part of cluster deployments. Enforcing system-reserved limits can lead to critical system services being CPU starved or OOM killed
on the node. The recommendation is to enforce system-reserved only if operators have profiled their nodes exhaustively to determine precise estimates and
are confident in their ability to recover if any process in that group is OOM killed.
As a result, we strongly recommended that users only enforce node allocatable for pods by default, and set aside appropriate reservations for system
daemons to maintain overall node reliability.
- If this is not done, there is potential for `kube` and `system` to impact guaranteed cpusets.
From the Linux Kernel: Documentation/cgroups/cpu.txt
- cpu.shares: The weight of each group living in the same hierarchy, that
translates into the amount of CPU it is expected to get. Upon cgroup creation,
each group gets assigned a default of 1024. The percentage of CPU assigned to
the cgroup is the value of shares divided by the sum of all shares in all
cgroups in the same level.
- cpu.cfs_period_us: The duration in microseconds of each scheduler period, for
bandwidth decisions. This defaults to 100000us or 100ms. Larger periods will
improve throughput at the expense of latency, since the scheduler will be able
to sustain a cpu-bound workload for longer. The opposite of true for smaller
periods. Note that this only affects non-RT tasks that are scheduled by the
CFS scheduler.
- cpu.cfs_quota_us: The maximum time in microseconds during each cfs_period_us
in for the current group will be allowed to run. For instance, if it is set to
half of cpu_period_us, the cgroup will only be able to peak run for 50 % of
the time. One should note that this represents aggregate time over all CPUs
in the system. Therefore, in order to allow full usage of two CPUs, for
instance, one should set this value to twice the value of cfs_period_us.
Two knobs matter here. cpu.shares controls the minimum amount of CPU a group is assured: kubepods gets 7168 / (7168 + 1024 + 1024), i.e. ~77% of the CPU on the system (Open: this is not quite the 7/8 we expected). All pods are under kubepods and hence fit within that 77%. cpu.cfs_quota_us controls the upper bound of CPU usage: at these top levels it is set to -1, which is basically unbounded, so all upper bounds are enforced at the pod level or, in some cases, at the container level.
- Our assumption is that, for kubepods not to be impacted by kubeReserved and systemReserved, the kubepods hierarchy should be admin-created, ensuring no overlap of cpusets. If not, guaranteed pods can be impacted by system daemons and kube components.
# cat /sys/fs/cgroup/cpu/cpu.shares
1024
# cat /sys/fs/cgroup/cpu/kubepods/cpu.shares
7168
# cat /sys/fs/cgroup/cpu/user.slice/cpu.shares
1024
# cat /sys/fs/cgroup/cpu/system.slice/cpu.shares
1024
- This means that under heavy load the kubepods are assured of getting CPU time. Hence pods will not starve.
- This also means that kubepods get roughly 7/8 of the CPU scheduling time when the node is overcommitted (the conversion from milliCPU to shares is sketched below).
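The 7168 is not arbitrary: the kubelet converts milliCPU to shares as milliCPU * 1024 / 1000. A minimal sketch reproducing the values seen throughout this document:
millicpu_to_shares() { echo $(( $1 * 1024 / 1000 )); }   # cpu.shares = milliCPU * 1024 / 1000
millicpu_to_shares 7000   # 7 allocatable CPUs -> 7168 (kubepods)
millicpu_to_shares 250    # a 250m request -> 256
millicpu_to_shares 50     # a 50m request -> 51
millicpu_to_shares 2000   # a 2 CPU limit -> 2048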
root@bored-pelinor:/sys/fs/cgroup/cpu# cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
-1
# cat /sys/fs/cgroup/cpu/kubepods/cpu.cfs_quota_us
-1
# cat /sys/fs/cgroup/cpu/system.slice/cpu.cfs_quota_us
-1
# cat /sys/fs/cgroup/cpu/user.slice/cpu.cfs_quota_us
-1
- This means that the kubepods, system, and user slices are set up with no upper bounds.
- Hence each can consume as much CPU as needed when the system is not overcommitted (unless individual processes are confined using child cgroups).
#cat /sys/fs/cgroup/memory/memory.limit_in_bytes
9223372036854771712
#cat /sys/fs/cgroup/memory/user.slice/memory.limit_in_bytes
9223372036854771712
#cat /sys/fs/cgroup/memory/system.slice/memory.limit_in_bytes
9223372036854771712
#cat /sys/fs/cgroup/memory/kubepods/memory.limit_in_bytes
7851159552
- kubepods is limited to ~7487MB, which excludes the 512MB we set aside for kube and system (the arithmetic is sketched after this list).
- This means that even though memory `requests` are not enforced at the cgroup level (as you will soon see), this top-level limit ensures that pods do not exceed what is allocated at a system level for pods.
- This also means that pods are free to grow beyond their requests, until they hit their limits, as long as the total memory consumption of all pods does not exceed the total memory allocated to kubepods.
- This ensures that pods will never exceed the maximum allocation.
- This is why Kubernetes memory management works fine even though soft limits are not set up.
- However, it does mean that control plane pods may get killed indiscriminately if workload pods live beyond their requests.
- Open: How can this be handled properly?
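Checking the arithmetic, the kubepods limit is exactly capacity minus the two reservations; note that, unlike Allocatable, the hard eviction threshold is not subtracted from the cgroup limit:
capacity_bytes=$((8167148 * 1024))       # 8363159552
reserved_bytes=$((2 * 256 * 1000000))    # kubeReserved + systemReserved = 512000000
echo $((capacity_bytes - reserved_bytes))
7851159552
which matches /sys/fs/cgroup/memory/kubepods/memory.limit_in_bytes above.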
Note: Kubernetes does not use memory soft limits. Hence the requests are used strictly by Kubernetes to decide scheduling. If the user creates a pod whose actual usage exceeds its requests (i.e. is closer to the limit than the request), the actual memory usage on the node can exceed what was scheduled. This will result in the kernel OOM-killing some process.
Open: All things being equal, which process will the kernel OOM kill? The kernel documentation says it will kill the bulkiest task???
From the Kernel documentation:
2.5 Reclaim
Each cgroup maintains a per cgroup LRU which has the same structure as
global VM. When a cgroup goes over its limit, we first try
to reclaim memory from the cgroup so as to make space for the new
pages that the cgroup has touched. If the reclaim is unsuccessful,
an OOM routine is invoked to select and kill the bulkiest task in the
cgroup. (See 10. OOM Control below.)
2.3 Shared Page Accounting
Shared pages are accounted on the basis of the first touch approach. The
cgroup that first touches a page is accounted for the page. The principle
behind this approach is that a cgroup that aggressively uses a shared
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).
#cat /sys/fs/cgroup/cpuset/cpuset.cpus
0-7
#cat /sys/fs/cgroup/cpuset/kubepods/cpuset.cpus
0-7
- Even though we have requested a static CPU policy, kubepods today includes the full set of CPUs.
- Note: This is true even when specifying integer CPU requests for kube and system.
- Open: If we created separate cpu hierarchies for kube and system, we would need to figure out how to set up the cpuset properly for kubepods.
PODs can be of three QoS classes, with variants:
- Best Effort
- Burstable
  - No limit
  - With limit
- Guaranteed
  - static policy
  - non static

Guaranteed (static policy):
- may have a cpuset
- will have cpu shares set
- will have quota set (which is == the cpu shares, upconverted)

Hence they have an upper bound on performance, and the upper bound == the lower bound; the upper bound is also guaranteed. Even though the limits may seem smaller than the resources assigned to other types of pods, in reality, since they do not contend with any other processes on the CPUs dedicated to them, they perform better than pods in the common pool.
- The only sources of interference for these pods are `kube` and `system` components.
- If this interference is large it can potentially impact the performance of these pods.

Burstable (with limit):
- may have a cpuset (the shared kubepods pool, which excludes any CPUs given to guaranteed pods)
- will have cpu shares set (which determines the guaranteed lower bound)
- will have quota set (upper bound > lower bound; the upper bound is not guaranteed due to the CFS implementation)

Burstable (no limit):
- may have a cpuset (the shared kubepods pool, which excludes any CPUs given to guaranteed pods)
- will have cpu shares set (which determines the guaranteed lower bound)
- quota set to -1 (no upper bound)

These are a good choice for any workload, as they give you a degree of assured CPU performance while remaining unlimited when capacity is available.

Best Effort:
- may have a cpuset (the shared kubepods pool, which excludes any CPUs given to guaranteed pods)
- will have cpu shares set to `2`, so a minimal lower bound of performance
- quota set to -1 (no upper bound)

Let us look at a guaranteed pod with static cpusets (guar-2s):
apiVersion: v1
kind: Pod
metadata:
  name: guar-2s
spec:
  containers:
  - name: busybee
    image: busybox
    resources:
      limits:
        cpu: 2
        memory: "400Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]
  - name: busybum
    image: busybox
    resources:
      limits:
        cpu: 1
        memory: "200Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]
$kubectl get pod -o=custom-columns=NAME:.metadata.name,UID:.metadata.uid
NAME UID
guar-2s 99b66879-565f-11e9-9de2-525400123456
#cat /sys/fs/cgroup/cpuset/kubepods/cpuset.cpus
0-7
#cat /sys/fs/cgroup/cpuset/kubepods/pod99b66879-565f-11e9-9de2-525400123456/cpuset.cpus
0-7
#cat /sys/fs/cgroup/cpuset/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2882ebb5acb99f5b09ee41954720101a954847151951ff56e0e3a919a2044a5a/cpuset.cpus
0-7
#cat /sys/fs/cgroup/cpuset/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2af8973b96abe2e023eb343cb87989a33c08b61f1bf2eead306da8f15eb74026/cpuset.cpus
3
#cat /sys/fs/cgroup/cpuset/kubepods/pod99b66879-565f-11e9-9de2-525400123456/a547d6c57f8b08a493156febb9b7071320ce1d87631e2e7bea142b7ac9351f23/cpuset.cpus
1-2
#cat /sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/cpu.shares
3072
#cat /sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2882ebb5acb99f5b09ee41954720101a954847151951ff56e0e3a919a2044a5a/cpu.shares
2
#cat /sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2af8973b96abe2e023eb343cb87989a33c08b61f1bf2eead306da8f15eb74026/cpu.shares
1024
#cat /sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/a547d6c57f8b08a493156febb9b7071320ce1d87631e2e7bea142b7ac9351f23/cpu.shares
2048
#cat /sys/fs/cgroup/cpu/kubepods/cpu.cfs_quota_us
-1
#cat /sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/cpu.cfs_quota_us
300000
#cat /sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2882ebb5acb99f5b09ee41954720101a954847151951ff56e0e3a919a2044a5a/cpu.cfs_quota_us
-1
#cat /sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2af8973b96abe2e023eb343cb87989a33c08b61f1bf2eead306da8f15eb74026/cpu.cfs_quota_us
100000
#cat /sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/a547d6c57f8b08a493156febb9b7071320ce1d87631e2e7bea142b7ac9351f23/cpu.cfs_quota_us
200000
Here you will see the following:
- the pod itself is not pinned (this is very important to note)
- the pause container is not pinned
- the pause container is assured only 2 cpu shares, but is unbounded
- container1 is pinned to CPU 3 and assured 1024 cpu shares (1024/(2+1024+2048); yes, the shares do not add up to exactly 3072, but close). It is also bounded to 1 CPU's worth of scheduling time.
- container2 is pinned to CPUs 1-2, assured 2048 cpu shares, and bounded to 2 CPUs' worth of scheduling time.
- hence the pause (i.e. sandbox) container is upper-bounded by the total cfs_quota; however, the sandbox/pause itself is not pinned.
- given that kubepods is effectively unbounded, the quota is enforced at the pod level (`300000`, i.e. net 3 CPUs' worth of time; the derivation is sketched below).
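The quota values follow directly from the CPU limits: quota = milliCPU * period / 1000, with the default period of 100000us. A minimal sketch reproducing the values above:
period_us=100000
millicpu_to_quota() { echo $(( $1 * period_us / 1000 )); }   # cpu.cfs_quota_us
millicpu_to_quota 1000   # cpu: 1 -> 100000
millicpu_to_quota 2000   # cpu: 2 -> 200000
millicpu_to_quota 3000   # pod total (1 + 2) -> 300000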
Let us create pods of each type
apiVersion: v1
kind: Pod
metadata:
  name: beff-2
spec:
  containers:
  - name: busybee
    image: busybox
    command: ["md5sum"]
    args: ["/dev/urandom"]
  - name: busybum
    image: busybox
    command: ["md5sum"]
    args: ["/dev/urandom"]
apiVersion: v1
kind: Pod
metadata:
  name: burst-2
spec:
  containers:
  - name: busybee
    image: busybox
    resources:
      limits:
        cpu: 500m
        memory: "400Mi"
      requests:
        cpu: 250m
        memory: "300Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]
  - name: busybum
    image: busybox
    resources:
      limits:
        cpu: 100m
        memory: "200Mi"
      requests:
        cpu: 50m
        memory: "100Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]
apiVersion: v1
kind: Pod
metadata:
  name: guar-2
spec:
  containers:
  - name: busybee
    image: busybox
    resources:
      limits:
        cpu: 400m
        memory: "400Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]
  - name: busybum
    image: busybox
    resources:
      limits:
        cpu: 200m
        memory: "200Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]
apiVersion: v1
kind: Pod
metadata:
  name: guar-2s
spec:
  containers:
  - name: busybee
    image: busybox
    resources:
      limits:
        cpu: 2
        memory: "400Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]
  - name: busybum
    image: busybox
    resources:
      limits:
        cpu: 1
        memory: "200Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]
mrcastel@bored-pelinor:~$ kubectl get po
NAME READY STATUS RESTARTS AGE
beff-2 2/2 Running 0 4m50s
burst-2 2/2 Running 0 12m
guar-2 2/2 Running 0 4m38s
guar-2s 2/2 Running 0 53m
mrcastel@bored-pelinor:~$ kubectl get pod --all-namespaces -o=custom-columns=NAME:.metadata.name,UID:.metadata.uid
NAME UID
beff-2 55148103-5666-11e9-9de2-525400123456
burst-2 41a37c57-5665-11e9-9de2-525400123456
guar-2 5c061c35-5666-11e9-9de2-525400123456
guar-2s 99b66879-565f-11e9-9de2-525400123456
coredns-fb8b8dccf-ng6st f3f5916c-565e-11e9-9de2-525400123456
coredns-fb8b8dccf-tctwp f3f524b6-565e-11e9-9de2-525400123456
etcd-bored-pelinor 16588327-565f-11e9-9de2-525400123456
kube-apiserver-bored-pelinor 16f110e8-565f-11e9-9de2-525400123456
kube-controller-manager-bored-pelinor 11943a2d-565f-11e9-9de2-525400123456
kube-flannel-ds-amd64-gtwrh 45e519f1-565f-11e9-9de2-525400123456
kube-proxy-vk6j9 f3f04c5b-565e-11e9-9de2-525400123456
kube-scheduler-bored-pelinor 1b1d33da-565f-11e9-9de2-525400123456
We see that:
- Only the `Guaranteed` containers promise not to trample on each other, as evidenced by the containers in those pods being pinned.
  - The other guaranteed pod is set to `0,4-7`, which excludes `3` and `1-2`.
- Best effort and burstable containers are excluded from the pinned sets.
- However, kube components can still land on the pinned CPUs (though the scheduling quota should mitigate that effect); one way to check is sketched below.
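A quick way to confirm that a kube component is not confined to a cpuset is to check its CPU affinity (a sketch; assumes the kubelet runs directly on the host, with the PID seen in the top output later):
# taskset -cp $(pidof kubelet)
pid 32405's current affinity list: 0-7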
#for i in `ls /sys/fs/cgroup/cpuset/kubepods/**/cpuset.cpus`; do echo $i && cat $i; done
/sys/fs/cgroup/cpuset/kubepods/besteffort/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/02f5cf17015d31beb2462857e1773754221712389d17df5a3a1e636bc04daaac/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/1dad0913e3a373e1742ef9dbc707fbcaab65fa1d394d5dc5c7df6c6d36569db7/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/6d50e7ca6b84fbd4195155e812e04e7b5976666818fe69e6d28832e63fae639f/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/6770f6eccae402706a68b71d9a593cc9f64aa2961419f2bfad4f57b265ded453/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/bab92c486dd6626714666e28e41672dc2250302699d687d7850b60550b8f03ad/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/b4c11a95aa1b5ab85649e23a2905cb1b893c4aa3b0201e285e4fcef647bfe584/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/e9fef3c19e2966fb2870e99fcc682ac5a350fdf847ee415f6f3aaf88bbfbc17a/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/e9fef3c19e2966fb2870e99fcc682ac5a350fdf847ee415f6f3aaf88bbfbc17a/kube-proxy/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/31e7f4b8bfe5a23919d507c8d683ea97a83698f3e5f4abc13e4f5b401a26f3f1/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/62c5bf39f9bbd6dacb93b0a34e78f6c8db2ff0962bb90ad9ee564fb50b9c5554/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/f876aa7433c71aec840afc385a39e1f3c1541c4521e148c3771e8679f040b788/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/1c3e8063c7bba7cf0e4b6e777704e56e5bdcbb15a794d36423a4734e6d5cb751/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/a0ff4137e61acf00ed6dc94212dbfba5161a7cb5fe20217e2cfaa1ba91474ae8/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/55770fabdef138519c8e013a2150630a60bf7dc0e73d5899ca99e72124f23434/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/cd1698af7a1024920e12633350d98d3f45f83d775eb9ada5c76319cfe0fb9573/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/22b075e3e9b9e8ad3bc7b5f2fb359de6ccbc751988e9b9cdb82c77a284de9847/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/ea20f04b19b1ce96cdd48f88a6a8a588a4352cd8a2fd7c6490d56b139f6d7f39/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/4db061942925e99a1c687c5195fca2114e39347b22bc7cfcba51438de4efa31f/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/fde98ded39d10e37a883c8e08fadc8e541b1793f44548f0b3e028b6c1ddd9034/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/c419fde416d40bf1d89c8dbe374b229ef3a48dbce5cf808c7caf5282254a4ece/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/cddf60599b04a5ff107254047bceea0608cf59a82de9626f6f4b9c2778705be6/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/75f8d2d2358fe8b7516f29472bb11378b108d0567dc820d0fc757469c2c9ca0f/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/e1a54b680bdef8d6ee1acf1fde4a2a35db950cb5ad106d11f694fc10841bdcf0/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/12d48c76f025ea5b0a2f3cbab6765b75c5666ef74528da4ef7d8c4260a075faa/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/9a9a45ea9730ef8a313070103f2701a4d1942c90eb985938d77b968f5cac460e/cpuset.cpus
0,4-7
/sys/fs/cgroup/cpuset/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/de7559571058b943737889052c3ebd48cba281b7ab3b0b2327825d8256e570da/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2882ebb5acb99f5b09ee41954720101a954847151951ff56e0e3a919a2044a5a/cpuset.cpus
0-7
/sys/fs/cgroup/cpuset/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2af8973b96abe2e023eb343cb87989a33c08b61f1bf2eead306da8f15eb74026/cpuset.cpus
3
/sys/fs/cgroup/cpuset/kubepods/pod99b66879-565f-11e9-9de2-525400123456/a547d6c57f8b08a493156febb9b7071320ce1d87631e2e7bea142b7ac9351f23/cpuset.cpus
1-2
/sys/fs/cgroup/cpuset/kubepods/pod99b66879-565f-11e9-9de2-525400123456/cpuset.cpus
0-7
Here we see that:
- Best effort pods are all assured only 2 CPU shares, and this is set at the pod level.
- Burstable and Guaranteed pods are assured their minimum cpu shares, set at the pod level.
- Containers within them further split this share. For example, burst-2 below has containers at 256 (250m) and 51 (50m), and its pod level is the sum, 307.
#shopt -s globstar
#for i in `ls /sys/fs/cgroup/cpu/kubepods/**/cpu.shares`; do echo $i && cat $i; done
/sys/fs/cgroup/cpu/kubepods/besteffort/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/02f5cf17015d31beb2462857e1773754221712389d17df5a3a1e636bc04daaac/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/1dad0913e3a373e1742ef9dbc707fbcaab65fa1d394d5dc5c7df6c6d36569db7/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/6d50e7ca6b84fbd4195155e812e04e7b5976666818fe69e6d28832e63fae639f/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/6770f6eccae402706a68b71d9a593cc9f64aa2961419f2bfad4f57b265ded453/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/bab92c486dd6626714666e28e41672dc2250302699d687d7850b60550b8f03ad/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/b4c11a95aa1b5ab85649e23a2905cb1b893c4aa3b0201e285e4fcef647bfe584/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/e9fef3c19e2966fb2870e99fcc682ac5a350fdf847ee415f6f3aaf88bbfbc17a/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/e9fef3c19e2966fb2870e99fcc682ac5a350fdf847ee415f6f3aaf88bbfbc17a/kube-proxy/cpu.shares
1024
/sys/fs/cgroup/cpu/kubepods/burstable/cpu.shares
1075
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/31e7f4b8bfe5a23919d507c8d683ea97a83698f3e5f4abc13e4f5b401a26f3f1/cpu.shares
256
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/62c5bf39f9bbd6dacb93b0a34e78f6c8db2ff0962bb90ad9ee564fb50b9c5554/cpu.shares
51
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/cpu.shares
307
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/f876aa7433c71aec840afc385a39e1f3c1541c4521e148c3771e8679f040b788/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/1c3e8063c7bba7cf0e4b6e777704e56e5bdcbb15a794d36423a4734e6d5cb751/cpu.shares
256
/sys/fs/cgroup/cpu/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/a0ff4137e61acf00ed6dc94212dbfba5161a7cb5fe20217e2cfaa1ba91474ae8/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/cpu.shares
256
/sys/fs/cgroup/cpu/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/55770fabdef138519c8e013a2150630a60bf7dc0e73d5899ca99e72124f23434/cpu.shares
204
/sys/fs/cgroup/cpu/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/cd1698af7a1024920e12633350d98d3f45f83d775eb9ada5c76319cfe0fb9573/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/cpu.shares
204
/sys/fs/cgroup/cpu/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/22b075e3e9b9e8ad3bc7b5f2fb359de6ccbc751988e9b9cdb82c77a284de9847/cpu.shares
102
/sys/fs/cgroup/cpu/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/cpu.shares
102
/sys/fs/cgroup/cpu/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/ea20f04b19b1ce96cdd48f88a6a8a588a4352cd8a2fd7c6490d56b139f6d7f39/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/4db061942925e99a1c687c5195fca2114e39347b22bc7cfcba51438de4efa31f/cpu.shares
102
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/cpu.shares
102
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/fde98ded39d10e37a883c8e08fadc8e541b1793f44548f0b3e028b6c1ddd9034/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/c419fde416d40bf1d89c8dbe374b229ef3a48dbce5cf808c7caf5282254a4ece/cpu.shares
102
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/cddf60599b04a5ff107254047bceea0608cf59a82de9626f6f4b9c2778705be6/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/cpu.shares
102
/sys/fs/cgroup/cpu/kubepods/cpu.shares
7168
/sys/fs/cgroup/cpu/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/75f8d2d2358fe8b7516f29472bb11378b108d0567dc820d0fc757469c2c9ca0f/cpu.shares
102
/sys/fs/cgroup/cpu/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/cpu.shares
102
/sys/fs/cgroup/cpu/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/e1a54b680bdef8d6ee1acf1fde4a2a35db950cb5ad106d11f694fc10841bdcf0/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/12d48c76f025ea5b0a2f3cbab6765b75c5666ef74528da4ef7d8c4260a075faa/cpu.shares
204
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/9a9a45ea9730ef8a313070103f2701a4d1942c90eb985938d77b968f5cac460e/cpu.shares
409
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/cpu.shares
614
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/de7559571058b943737889052c3ebd48cba281b7ab3b0b2327825d8256e570da/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2882ebb5acb99f5b09ee41954720101a954847151951ff56e0e3a919a2044a5a/cpu.shares
2
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2af8973b96abe2e023eb343cb87989a33c08b61f1bf2eead306da8f15eb74026/cpu.shares
1024
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/a547d6c57f8b08a493156febb9b7071320ce1d87631e2e7bea142b7ac9351f23/cpu.shares
2048
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/cpu.shares
3072
Here we see that:
- Some burstable pods have a quota (i.e. an upper bound set).
- The quota is set at the pod level and then split across the containers; the period is the same across all of them (as seen below).
- The pause containers of guaranteed pods are limited by the parent (assumption to validate).
#shopt -s globstar
#for i in `ls /sys/fs/cgroup/cpu/kubepods/**/cpu.cfs_quota_us`; do echo $i && cat $i; done
/sys/fs/cgroup/cpu/kubepods/besteffort/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/02f5cf17015d31beb2462857e1773754221712389d17df5a3a1e636bc04daaac/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/1dad0913e3a373e1742ef9dbc707fbcaab65fa1d394d5dc5c7df6c6d36569db7/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/6d50e7ca6b84fbd4195155e812e04e7b5976666818fe69e6d28832e63fae639f/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/6770f6eccae402706a68b71d9a593cc9f64aa2961419f2bfad4f57b265ded453/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/bab92c486dd6626714666e28e41672dc2250302699d687d7850b60550b8f03ad/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/b4c11a95aa1b5ab85649e23a2905cb1b893c4aa3b0201e285e4fcef647bfe584/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/e9fef3c19e2966fb2870e99fcc682ac5a350fdf847ee415f6f3aaf88bbfbc17a/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/e9fef3c19e2966fb2870e99fcc682ac5a350fdf847ee415f6f3aaf88bbfbc17a/kube-proxy/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/31e7f4b8bfe5a23919d507c8d683ea97a83698f3e5f4abc13e4f5b401a26f3f1/cpu.cfs_quota_us
50000
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/62c5bf39f9bbd6dacb93b0a34e78f6c8db2ff0962bb90ad9ee564fb50b9c5554/cpu.cfs_quota_us
10000
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/cpu.cfs_quota_us
60000
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/f876aa7433c71aec840afc385a39e1f3c1541c4521e148c3771e8679f040b788/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/1c3e8063c7bba7cf0e4b6e777704e56e5bdcbb15a794d36423a4734e6d5cb751/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/a0ff4137e61acf00ed6dc94212dbfba5161a7cb5fe20217e2cfaa1ba91474ae8/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/55770fabdef138519c8e013a2150630a60bf7dc0e73d5899ca99e72124f23434/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/cd1698af7a1024920e12633350d98d3f45f83d775eb9ada5c76319cfe0fb9573/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/22b075e3e9b9e8ad3bc7b5f2fb359de6ccbc751988e9b9cdb82c77a284de9847/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/ea20f04b19b1ce96cdd48f88a6a8a588a4352cd8a2fd7c6490d56b139f6d7f39/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/4db061942925e99a1c687c5195fca2114e39347b22bc7cfcba51438de4efa31f/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/fde98ded39d10e37a883c8e08fadc8e541b1793f44548f0b3e028b6c1ddd9034/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/c419fde416d40bf1d89c8dbe374b229ef3a48dbce5cf808c7caf5282254a4ece/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/cddf60599b04a5ff107254047bceea0608cf59a82de9626f6f4b9c2778705be6/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/75f8d2d2358fe8b7516f29472bb11378b108d0567dc820d0fc757469c2c9ca0f/cpu.cfs_quota_us
10000
/sys/fs/cgroup/cpu/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/cpu.cfs_quota_us
10000
/sys/fs/cgroup/cpu/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/e1a54b680bdef8d6ee1acf1fde4a2a35db950cb5ad106d11f694fc10841bdcf0/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/12d48c76f025ea5b0a2f3cbab6765b75c5666ef74528da4ef7d8c4260a075faa/cpu.cfs_quota_us
20000
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/9a9a45ea9730ef8a313070103f2701a4d1942c90eb985938d77b968f5cac460e/cpu.cfs_quota_us
40000
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/cpu.cfs_quota_us
60000
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/de7559571058b943737889052c3ebd48cba281b7ab3b0b2327825d8256e570da/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2882ebb5acb99f5b09ee41954720101a954847151951ff56e0e3a919a2044a5a/cpu.cfs_quota_us
-1
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2af8973b96abe2e023eb343cb87989a33c08b61f1bf2eead306da8f15eb74026/cpu.cfs_quota_us
100000
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/a547d6c57f8b08a493156febb9b7071320ce1d87631e2e7bea142b7ac9351f23/cpu.cfs_quota_us
200000
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/cpu.cfs_quota_us
300000
#shopt -s globstar
#for i in `ls /sys/fs/cgroup/cpu/kubepods/**/cpu.cfs_period_us`; do echo $i && cat $i; done
/sys/fs/cgroup/cpu/kubepods/besteffort/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/02f5cf17015d31beb2462857e1773754221712389d17df5a3a1e636bc04daaac/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/1dad0913e3a373e1742ef9dbc707fbcaab65fa1d394d5dc5c7df6c6d36569db7/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/6d50e7ca6b84fbd4195155e812e04e7b5976666818fe69e6d28832e63fae639f/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/6770f6eccae402706a68b71d9a593cc9f64aa2961419f2bfad4f57b265ded453/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/bab92c486dd6626714666e28e41672dc2250302699d687d7850b60550b8f03ad/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/b4c11a95aa1b5ab85649e23a2905cb1b893c4aa3b0201e285e4fcef647bfe584/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/e9fef3c19e2966fb2870e99fcc682ac5a350fdf847ee415f6f3aaf88bbfbc17a/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/e9fef3c19e2966fb2870e99fcc682ac5a350fdf847ee415f6f3aaf88bbfbc17a/kube-proxy/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/31e7f4b8bfe5a23919d507c8d683ea97a83698f3e5f4abc13e4f5b401a26f3f1/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/62c5bf39f9bbd6dacb93b0a34e78f6c8db2ff0962bb90ad9ee564fb50b9c5554/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/f876aa7433c71aec840afc385a39e1f3c1541c4521e148c3771e8679f040b788/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/1c3e8063c7bba7cf0e4b6e777704e56e5bdcbb15a794d36423a4734e6d5cb751/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/a0ff4137e61acf00ed6dc94212dbfba5161a7cb5fe20217e2cfaa1ba91474ae8/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/55770fabdef138519c8e013a2150630a60bf7dc0e73d5899ca99e72124f23434/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/cd1698af7a1024920e12633350d98d3f45f83d775eb9ada5c76319cfe0fb9573/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/22b075e3e9b9e8ad3bc7b5f2fb359de6ccbc751988e9b9cdb82c77a284de9847/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/ea20f04b19b1ce96cdd48f88a6a8a588a4352cd8a2fd7c6490d56b139f6d7f39/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/4db061942925e99a1c687c5195fca2114e39347b22bc7cfcba51438de4efa31f/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/fde98ded39d10e37a883c8e08fadc8e541b1793f44548f0b3e028b6c1ddd9034/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/c419fde416d40bf1d89c8dbe374b229ef3a48dbce5cf808c7caf5282254a4ece/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/cddf60599b04a5ff107254047bceea0608cf59a82de9626f6f4b9c2778705be6/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/75f8d2d2358fe8b7516f29472bb11378b108d0567dc820d0fc757469c2c9ca0f/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/e1a54b680bdef8d6ee1acf1fde4a2a35db950cb5ad106d11f694fc10841bdcf0/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/12d48c76f025ea5b0a2f3cbab6765b75c5666ef74528da4ef7d8c4260a075faa/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/9a9a45ea9730ef8a313070103f2701a4d1942c90eb985938d77b968f5cac460e/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/de7559571058b943737889052c3ebd48cba281b7ab3b0b2327825d8256e570da/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2882ebb5acb99f5b09ee41954720101a954847151951ff56e0e3a919a2044a5a/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2af8973b96abe2e023eb343cb87989a33c08b61f1bf2eead306da8f15eb74026/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/a547d6c57f8b08a493156febb9b7071320ce1d87631e2e7bea142b7ac9351f23/cpu.cfs_period_us
100000
/sys/fs/cgroup/cpu/kubepods/pod99b66879-565f-11e9-9de2-525400123456/cpu.cfs_period_us
100000
- Best effort pods have no memory limits.
- Some burstable pods have pod-level memory limits set (and container-level limits).
  - Pause is limited by the pod-level memory limit.
- Guaranteed pods have pod-level limits (the sum of the container limits, as sketched below).
  - Pause is limited by the pod-level memory limit.
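The pod-level memory limit is simply the sum of the container limits. For guar-2 (400Mi + 200Mi):
echo $(( (400 + 200) * 1024 * 1024 ))
629145600
which matches guar-2's pod-level memory.limit_in_bytes below.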
#for i in `ls /sys/fs/cgroup/memory/kubepods/**/memory.limit_in_bytes`; do echo $i && cat $i; done
/sys/fs/cgroup/memory/kubepods/besteffort/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/02f5cf17015d31beb2462857e1773754221712389d17df5a3a1e636bc04daaac/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/1dad0913e3a373e1742ef9dbc707fbcaab65fa1d394d5dc5c7df6c6d36569db7/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/6d50e7ca6b84fbd4195155e812e04e7b5976666818fe69e6d28832e63fae639f/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/besteffort/pod55148103-5666-11e9-9de2-525400123456/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/6770f6eccae402706a68b71d9a593cc9f64aa2961419f2bfad4f57b265ded453/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/bab92c486dd6626714666e28e41672dc2250302699d687d7850b60550b8f03ad/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/besteffort/pod69b754681cf0cf1bf12010694a10f2cb/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/b4c11a95aa1b5ab85649e23a2905cb1b893c4aa3b0201e285e4fcef647bfe584/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/e9fef3c19e2966fb2870e99fcc682ac5a350fdf847ee415f6f3aaf88bbfbc17a/kube-proxy/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/e9fef3c19e2966fb2870e99fcc682ac5a350fdf847ee415f6f3aaf88bbfbc17a/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/besteffort/podf3f04c5b-565e-11e9-9de2-525400123456/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/31e7f4b8bfe5a23919d507c8d683ea97a83698f3e5f4abc13e4f5b401a26f3f1/memory.limit_in_bytes
419430400
/sys/fs/cgroup/memory/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/62c5bf39f9bbd6dacb93b0a34e78f6c8db2ff0962bb90ad9ee564fb50b9c5554/memory.limit_in_bytes
209715200
/sys/fs/cgroup/memory/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/f876aa7433c71aec840afc385a39e1f3c1541c4521e148c3771e8679f040b788/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/pod41a37c57-5665-11e9-9de2-525400123456/memory.limit_in_bytes
629145600
/sys/fs/cgroup/memory/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/1c3e8063c7bba7cf0e4b6e777704e56e5bdcbb15a794d36423a4734e6d5cb751/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/a0ff4137e61acf00ed6dc94212dbfba5161a7cb5fe20217e2cfaa1ba91474ae8/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/pod439651677ca7971bec7b2a9a0df5a512/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/55770fabdef138519c8e013a2150630a60bf7dc0e73d5899ca99e72124f23434/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/cd1698af7a1024920e12633350d98d3f45f83d775eb9ada5c76319cfe0fb9573/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/pod54146492ed90bfa147f56609eee8005a/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/22b075e3e9b9e8ad3bc7b5f2fb359de6ccbc751988e9b9cdb82c77a284de9847/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/ea20f04b19b1ce96cdd48f88a6a8a588a4352cd8a2fd7c6490d56b139f6d7f39/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/pod58272442e226c838b193bbba4c44091e/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/4db061942925e99a1c687c5195fca2114e39347b22bc7cfcba51438de4efa31f/memory.limit_in_bytes
178257920
/sys/fs/cgroup/memory/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/fde98ded39d10e37a883c8e08fadc8e541b1793f44548f0b3e028b6c1ddd9034/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/podf3f524b6-565e-11e9-9de2-525400123456/memory.limit_in_bytes
178257920
/sys/fs/cgroup/memory/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/c419fde416d40bf1d89c8dbe374b229ef3a48dbce5cf808c7caf5282254a4ece/memory.limit_in_bytes
178257920
/sys/fs/cgroup/memory/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/cddf60599b04a5ff107254047bceea0608cf59a82de9626f6f4b9c2778705be6/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/burstable/podf3f5916c-565e-11e9-9de2-525400123456/memory.limit_in_bytes
178257920
/sys/fs/cgroup/memory/kubepods/memory.limit_in_bytes
7851159552
/sys/fs/cgroup/memory/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/75f8d2d2358fe8b7516f29472bb11378b108d0567dc820d0fc757469c2c9ca0f/memory.limit_in_bytes
52428800
/sys/fs/cgroup/memory/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/e1a54b680bdef8d6ee1acf1fde4a2a35db950cb5ad106d11f694fc10841bdcf0/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/pod45e519f1-565f-11e9-9de2-525400123456/memory.limit_in_bytes
52428800
/sys/fs/cgroup/memory/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/12d48c76f025ea5b0a2f3cbab6765b75c5666ef74528da4ef7d8c4260a075faa/memory.limit_in_bytes
209715200
/sys/fs/cgroup/memory/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/9a9a45ea9730ef8a313070103f2701a4d1942c90eb985938d77b968f5cac460e/memory.limit_in_bytes
419430400
/sys/fs/cgroup/memory/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/de7559571058b943737889052c3ebd48cba281b7ab3b0b2327825d8256e570da/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/pod5c061c35-5666-11e9-9de2-525400123456/memory.limit_in_bytes
629145600
/sys/fs/cgroup/memory/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2882ebb5acb99f5b09ee41954720101a954847151951ff56e0e3a919a2044a5a/memory.limit_in_bytes
9223372036854771712
/sys/fs/cgroup/memory/kubepods/pod99b66879-565f-11e9-9de2-525400123456/2af8973b96abe2e023eb343cb87989a33c08b61f1bf2eead306da8f15eb74026/memory.limit_in_bytes
209715200
/sys/fs/cgroup/memory/kubepods/pod99b66879-565f-11e9-9de2-525400123456/a547d6c57f8b08a493156febb9b7071320ce1d87631e2e7bea142b7ac9351f23/memory.limit_in_bytes
419430400
/sys/fs/cgroup/memory/kubepods/pod99b66879-565f-11e9-9de2-525400123456/memory.limit_in_bytes
629145600
#top -b
top - 23:47:30 up 22:52, 4 users, load average: 9.59, 9.14, 8.63
Tasks: 206 total, 9 running, 133 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.1 us, 8.5 sy, 0.0 ni, 89.1 id, 0.1 wa, 0.0 hi, 0.1 si, 0.1 st
KiB Mem : 8167148 total, 4444140 free, 692316 used, 3030692 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 7184956 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8555 root 20 0 1288 4 0 R 100.0 0.0 80:27.66 md5sum
8624 root 20 0 1288 4 0 R 100.0 0.0 80:26.17 md5sum
25546 root 20 0 1288 4 0 R 100.0 0.0 28:16.58 md5sum
25691 root 20 0 1288 4 0 R 93.8 0.0 28:20.72 md5sum
10705 root 20 0 1288 4 0 R 37.5 0.0 19:53.62 md5sum
26277 root 20 0 1288 4 0 R 31.2 0.0 12:43.99 md5sum
26360 root 20 0 1288 4 0 R 12.5 0.0 6:24.16 md5sum
6409 root 20 0 44532 3932 3372 R 6.2 0.0 0:00.01 top
26252 root 20 0 11788 6076 4184 S 6.2 0.1 0:00.95 containerd-shim
32405 root 20 0 2163916 100936 65528 S 6.2 1.2 2:32.42 kubelet
1 root 20 0 225524 9308 6644 S 0.0 0.1 0:40.32 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
Non-terminated Pods: (12 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default beff-2 0 (0%) 0 (0%) 0 (0%) 0 (0%) 33m
default burst-2 300m (4%) 600m (8%) 400Mi (5%) 600Mi (8%) 41m
default guar-2 600m (8%) 600m (8%) 600Mi (8%) 600Mi (8%) 33m
default guar-2s 3 (42%) 3 (42%) 600Mi (8%) 600Mi (8%) 81m
kube-system coredns-fb8b8dccf-ng6st 100m (1%) 0 (0%) 70Mi (0%) 170Mi (2%) 86m
kube-system coredns-fb8b8dccf-tctwp 100m (1%) 0 (0%) 70Mi (0%) 170Mi (2%) 86m
kube-system etcd-bored-pelinor 0 (0%) 0 (0%) 0 (0%) 0 (0%) 85m
kube-system kube-apiserver-bored-pelinor 250m (3%) 0 (0%) 0 (0%) 0 (0%) 85m
kube-system kube-controller-manager-bored-pelinor 200m (2%) 0 (0%) 0 (0%) 0 (0%) 85m
kube-system kube-flannel-ds-amd64-gtwrh 100m (1%) 100m (1%) 50Mi (0%) 50Mi (0%) 84m
kube-system kube-proxy-vk6j9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 86m
kube-system kube-scheduler-bored-pelinor 100m (1%) 0 (0%) 0 (0%) 0 (0%) 85m
#kubectl top pod --all-namespaces
NAMESPACE NAME CPU(cores) MEMORY(bytes)
default beff-2 1929m 1Mi
default burst-2 601m 1Mi
default guar-2 600m 1Mi
default guar-2s 2000m 1Mi
kube-system coredns-fb8b8dccf-ng6st 2m 11Mi
kube-system coredns-fb8b8dccf-tctwp 2m 9Mi
kube-system etcd-bored-pelinor 14m 40Mi
kube-system kube-apiserver-bored-pelinor 21m 233Mi
kube-system kube-controller-manager-bored-pelinor 8m 40Mi
kube-system kube-flannel-ds-amd64-gtwrh 2m 11Mi
kube-system kube-proxy-vk6j9 1m 14Mi
kube-system kube-scheduler-bored-pelinor 1m 12Mi
kube-system metrics-server-78b6bc9ddf-z8fvw 1m 13Mi
Summarizing what this means for Kata:
- cpusets: set at the container level (NOT the pod level)
  - the pause container is not pinned
  - this is not good for kata, as we cannot use the pod or pause cgroup (unless we muck with pause)
- cpu shares: set at the pod level (and container level)
  - this is good for kata, as QEMU can run in the pod cgroup
  - cpu is compressible, so shares being too low is not an issue
- cpu quota: set at the pod level
  - this is good for kata, as QEMU can run in the pod cgroup
  - most pods are unbounded, so this is good for kata in a way; pods that are bounded will need correctly passed upper limits (see the sandbox overhead proposal described later)
- memory limits: set at the pod level
  - this is good for kata, as QEMU can run in the pod cgroup
  - but the limit may be too small as defined, resulting in OOM
  - so we absolutely need a proportional pod overhead for bounded pods
- Once @egernst adds support for sandbox overhead, it should be additive to the pod-level limits, when limits actually apply (a rough sketch of the arithmetic follows this list).
- However, the sandbox cannot be the pause container. It needs to be the pod itself (and that is what is right).
- However, that leaves out the issue of cpusets.
- Given that cpusets should be applicable to the pod cgroup but are currently not applied to it, it may make sense to modify the upstream logic to set them up at the pod level.
  - Note: However, when the reconciliation loop runs it should first open up the pod cpuset, apply the container sets, and then close the pod cpuset again. This will make the loop a little more complex than the current implementation.
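A rough sketch of what additive overhead would mean for the pod-level memory limit; the 160Mi overhead figure here is purely hypothetical:
container_limits_mib=$((400 + 200))   # e.g. guar-2s container limits
overhead_mib=160                      # hypothetical VM/agent overhead
echo $(( (container_limits_mib + overhead_mib) * 1024 * 1024 ))
796917760
i.e. the pod cgroup would be sized at 796917760 bytes instead of 629145600.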
Using the raw Kubernetes metrics API, you can see the breakdown of the metrics per container within a pod.
mrcastel@bored-pelinor:~$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/guar-2" | jq -C .
{
"kind": "PodMetrics",
"apiVersion": "metrics.k8s.io/v1beta1",
"metadata": {
"name": "guar-2",
"namespace": "default",
"selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/guar-2",
"creationTimestamp": "2019-04-04T23:52:59Z"
},
"timestamp": "2019-04-04T23:51:54Z",
"window": "30s",
"containers": [
{
"name": "busybee",
"usage": {
"cpu": "398739040n",
"memory": "744Ki"
}
},
{
"name": "busybum",
"usage": {
"cpu": "200848120n",
"memory": "852Ki"
}
}
]
}
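The per-container cpu values are reported in nanocores; converting and summing them lines up with the pod level number from kubectl top:
398739040n ≈ 399m (busybee)
200848120n ≈ 201m (busybum)
399m + 201m ≈ 600m, matching the 600m reported for guar-2 above.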
mrcastel@bored-pelinor:~$ kubectl get pod --all-namespaces -o=custom-columns=NAME:.metadata.name,UID:.metadata.uid | grep guar-2
guar-2 839f41ff-5728-11e9-9de2-525400123456
Looking at the cgroups
#for i in `ls /sys/fs/cgroup/memory/kubepods/pod839f41ff-5728-11e9-9de2-525400123456/**/memory.usage_in_bytes`; do echo $i && cat $i; done
/sys/fs/cgroup/memory/kubepods/pod839f41ff-5728-11e9-9de2-525400123456/2de4cc4497c0d69aefc9e11dc64536a9dcd4d6a3cc51fc4dfd138028ef9b6314/memory.usage_in_bytes
761856
/sys/fs/cgroup/memory/kubepods/pod839f41ff-5728-11e9-9de2-525400123456/66be758085f37040c49d94d04129ec03a05b8f65fe8419f6665d365a0eb14625/memory.usage_in_bytes
872448
/sys/fs/cgroup/memory/kubepods/pod839f41ff-5728-11e9-9de2-525400123456/ab621c92b945758ee275315251c7a491467aabbc0d8fdaf02158c50f98e14ddc/memory.usage_in_bytes
614400
/sys/fs/cgroup/memory/kubepods/pod839f41ff-5728-11e9-9de2-525400123456/memory.usage_in_bytes
2248704
Note: The pause container has memory usage, but the metrics API does not report or account for it.
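In fact the container cgroups sum exactly to the pod level cgroup, which shows that the pause container's usage is charged to the pod even though PodMetrics omits it:
761856 (744Ki, busybee) + 872448 (852Ki, busybum) + 614400 (600Ki, pause) = 2248704 bytes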
mrcastel@bored-pelinor:~$ kubectl top pod
NAME CPU(cores) MEMORY(bytes)
beff-2 1955m 1Mi
burst-2 599m 1Mi
guar-2 597m 1Mi
guar-2s 2001m 1Mi
Note: The memory totals here do not quite match the cgroup numbers; kubectl top sums the container usage (744Ki + 852Ki ≈ 1.56Mi for guar-2) and appears to round down to whole Mi.
/sys/fs/cgroup/cpu/kubepods/pod839f41ff-5728-11e9-9de2-525400123456/cpuacct.usage
3664938987887
#for i in `ls /sys/fs/cgroup/cpu/kubepods/pod839f41ff-5728-11e9-9de2-525400123456/**/cpuacct.usage`; do echo $i && cat $i; done
/sys/fs/cgroup/cpu/kubepods/pod839f41ff-5728-11e9-9de2-525400123456/2de4cc4497c0d69aefc9e11dc64536a9dcd4d6a3cc51fc4dfd138028ef9b6314/cpuacct.usage
2436732506834
/sys/fs/cgroup/cpu/kubepods/pod839f41ff-5728-11e9-9de2-525400123456/66be758085f37040c49d94d04129ec03a05b8f65fe8419f6665d365a0eb14625/cpuacct.usage
1228524887574
/sys/fs/cgroup/cpu/kubepods/pod839f41ff-5728-11e9-9de2-525400123456/ab621c92b945758ee275315251c7a491467aabbc0d8fdaf02158c50f98e14ddc/cpuacct.usage
42275543
/sys/fs/cgroup/cpu/kubepods/pod839f41ff-5728-11e9-9de2-525400123456/cpuacct.usage
3665299669951
Note: The cpu usage (cpuacct.usage) is cumulative, in nanoseconds, so it is a bit harder to correlate with the rate-based numbers above.
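Here too the container cgroups sum exactly to the pod level value: 2436732506834 + 1228524887574 + 42275543 = 3665299669951 ns (the third, tiny value of ~42ms is the pause container). To derive a kubectl-top-style millicore figure from cpuacct.usage you have to sample it twice and divide by the window; a minimal sketch, assuming the same pod UID path as above:
#P=/sys/fs/cgroup/cpu/kubepods/pod839f41ff-5728-11e9-9de2-525400123456
#U1=$(cat $P/cpuacct.usage); sleep 30; U2=$(cat $P/cpuacct.usage)
#echo "$(( (U2 - U1) / (30 * 1000000) ))m"   # delta in ns over a 30s window -> millicores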
Open: How can Kata provide these numbers at a container level?
References:
- https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-memory-6b41e9a955f9
- https://medium.com/@betz.mark/understanding-resource-limits-in-kubernetes-cpu-time-9eff74d3161b
- https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/node-allocatable.md#recommended-cgroups-setup
- rancher/rancher#17177
- https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/
- https://libvirt.org/formatdomain.html
- https://github.com/torvalds/linux/blob/master/Documentation/cgroup-v1/cpuacct.txt
- https://godoc.org/k8s.io/kubelet/config/v1beta1#KubeletConfiguration
- https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/
- https://docs.openshift.com/container-platform/3.3/admin_guide/allocating_node_resources.html#node-enforcement
- https://www.certdepot.net/rhel7-get-started-cgroups/
Let us start two jobs (using exec -a to give each stress-ng instance a name we can tell apart in htop)
#bash -c "exec -a jobmore stress-ng --cpu 3 --timeout 120m" &
#bash -c "exec -a jobless stress-ng --cpu 3 --timeout 120m" &
and then examine the resource usage
#htop
...
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 68.6 0.1 3:43.63 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 68.0 0.1 3:43.67 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 67.3 0.1 3:43.67 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 100. 0.1 4:08.10 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 100. 0.1 4:07.73 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 100. 0.1 4:08.31 jobmore --cpu 3 --timeout 120m
At this point the jobs are using all of the CPUs on the system as best they can
#mkdir /sys/fs/cgroup/cpu/testcg
#mkdir /sys/fs/cgroup/cpu/testcg/jobless
#mkdir /sys/fs/cgroup/cpu/testcg/jobmore
#echo "24036" > /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24037" >> /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24038" >> /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24039" >> /sys/fs/cgroup/cpu/testcg/jobless/tasks
#echo "24137" > /sys/fs/cgroup/cpu/testcg/jobmore/tasks
#echo "24138" >> /sys/fs/cgroup/cpu/testcg/jobmore/tasks
#echo "24139" >> /sys/fs/cgroup/cpu/testcg/jobmore/tasks
#echo "24140" >> /sys/fs/cgroup/cpu/testcg/jobmore/tasks
Now let us set an upper bound at the parent cgroup level and split the time amongst the children
#echo 300000 > /sys/fs/cgroup/cpu/testcg/cpu.cfs_quota_us
#echo 100000 > /sys/fs/cgroup/cpu/testcg/jobless/cpu.cfs_quota_us
#echo 200000 > /sys/fs/cgroup/cpu/testcg/jobmore/cpu.cfs_quota_us
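These quotas are relative to cpu.cfs_period_us, which defaults to 100000 (100ms), so quota/period gives the CPU count: 300000/100000 = 3 CPUs for testcg overall, split into 1 CPU for jobless and 2 CPUs for jobmore.
#cat /sys/fs/cgroup/cpu/testcg/cpu.cfs_period_us
100000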
Now look at the resource utilization
#htop
...
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 32.4 0.1 5:53.28 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 34.4 0.1 5:53.68 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 34.4 0.1 5:53.53 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 66.1 0.1 6:49.54 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 66.1 0.1 6:49.39 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 68.0 0.1 6:50.38 jobmore --cpu 3 --timeout 120m
So we see that the jobs together fit within 3 CPUs, and furthermore jobless gets only 1 CPU while jobmore gets 2.
Now let us give them the same upper bound
/sys/fs/cgroup/cpu/testcg/cpu.cfs_quota_us
200000
/sys/fs/cgroup/cpu/testcg/jobless/cpu.cfs_quota_us
100000
/sys/fs/cgroup/cpu/testcg/jobmore/cpu.cfs_quota_us
100000
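The values above correspond to writes like the following:
#echo 200000 > /sys/fs/cgroup/cpu/testcg/cpu.cfs_quota_us
#echo 100000 > /sys/fs/cgroup/cpu/testcg/jobless/cpu.cfs_quota_us
#echo 100000 > /sys/fs/cgroup/cpu/testcg/jobmore/cpu.cfs_quota_us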
We see the CPU time get split evenly between the two jobs
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.10 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.36 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 33.9 0.1 9:24.20 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:54.74 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 32.6 0.1 11:55.03 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:55.74 jobmore --cpu 3 --timeout 120m
But let us say jobmore is more important, so let us set up the shares accordingly
/sys/fs/cgroup/cpu/testcg/cpu.shares
1024
/sys/fs/cgroup/cpu/testcg/jobless/cpu.shares
24
/sys/fs/cgroup/cpu/testcg/jobmore/cpu.shares
1000
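With these shares, under contention jobmore should get 1000/1024 ≈ 98% of the available CPU time and jobless only 24/1024 ≈ 2%.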
We do not quite see what we expected; that is because there are enough free CPUs available, and shares only take effect under contention.
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.10 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 33.3 0.1 9:24.36 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 33.9 0.1 9:24.20 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:54.74 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 32.6 0.1 11:55.03 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 33.3 0.1 11:55.74 jobmore --cpu 3 --timeout 120m
So let us force all the tasks onto the same cpu (taskset takes a hexadecimal affinity mask, so 10 here is 0x10, i.e. CPU 4)
$ taskset -p 10 24036
$ taskset -p 10 24037
$ taskset -p 10 24038
$ taskset -p 10 24039
$ taskset -p 10 24137
$ taskset -p 10 24138
$ taskset -p 10 24139
$ taskset -p 10 24140
Now you see jobmore get the correct lower bound, even though both jobs have the same upper bound.
24036 mrcastel 20 0 53800 5712 5392 S 0.0 0.1 0:00.00 jobless --cpu 3 --timeout 120m
24037 mrcastel 20 0 54448 7184 4336 R 0.7 0.1 15:34.61 jobless --cpu 3 --timeout 120m
24038 mrcastel 20 0 54448 7184 4336 R 0.7 0.1 15:35.11 jobless --cpu 3 --timeout 120m
24039 mrcastel 20 0 54448 7184 4336 R 1.3 0.1 15:38.99 jobless --cpu 3 --timeout 120m
24137 mrcastel 20 0 53800 5808 5492 S 0.0 0.1 0:00.01 jobmore --cpu 3 --timeout 120m
24138 mrcastel 20 0 54444 7160 4308 R 32.9 0.1 18:15.75 jobmore --cpu 3 --timeout 120m
24139 mrcastel 20 0 54444 7160 4308 R 32.3 0.1 18:14.36 jobmore --cpu 3 --timeout 120m
24140 mrcastel 20 0 54444 7160 4308 R 32.3 0.1 18:15.22 jobmore --cpu 3 --timeout 120m
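The observed split (jobmore ≈ 97.5% vs jobless ≈ 2.7% of the single CPU) closely matches the 1000:24 share ratio.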
Open: Do we need separate cgroups created for multitenancy, assuming multitenancy here means each tenant having its own namespace?