@dejanu
Last active November 14, 2024 10:18
Kubernetes metrics to watch for capacity management

Nice and easy metrics

  • Information about Node:
# Gauge type metric

# CPU capacity <cores>
kube_node_status_capacity{resource="cpu"}

# Memory capacity <bytes>
kube_node_status_capacity{resource="memory"}
  • Information about Pod:
# Gauge type metric:
kube_pod_container_info
  • Resources requested by a container, and the limits set on it:
# Gauge type metric:
kube_pod_container_resource_requests
kube_pod_container_resource_limits
  • Current working set (the memory pages recently touched by the threads in the process) in bytes, i.e. the value to watch for OOM kills:
# Gauge type metric:
container_memory_working_set_bytes
  • Cumulative CPU time (user time + system time) consumed, in seconds:
# Counter type metric:
container_cpu_usage_seconds_total
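
Because container_cpu_usage_seconds_total is a counter, it only becomes useful for capacity work once it is turned into a per-second rate, which reads as "cores in use". The sketch below illustrates that idea in Python. It is a simplification and not how Prometheus implements rate(): it handles counter resets but does no extrapolation to the window edges. The function name approx_rate and the sample values are hypothetical.

```python
from typing import List, Tuple

# (unix timestamp in seconds, counter value in CPU-seconds)
Sample = Tuple[float, float]


def approx_rate(samples: List[Sample]) -> float:
    """Average per-second increase of a counter across the samples.

    Simplified model: a drop in value is treated as a counter reset
    (the post-reset value counts as the increase). Unlike Prometheus'
    rate(), nothing is extrapolated to the window boundaries.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed


# Hypothetical scrapes of container_cpu_usage_seconds_total, 60 s apart.
samples = [(0.0, 100.0), (60.0, 130.0), (120.0, 160.0)]
cores_used = approx_rate(samples)  # 60 CPU-seconds over 120 s
print(cores_used)
```

A result of 0.5 here would mean the container used half a core on average over the window, which is the quantity compared against kube_pod_container_resource_limits{resource="cpu"}.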

PromQL queries

  • Gauge/stat for the number of pods in each namespace:
sum(kube_pod_info) by (namespace)
  • Containers without Memory/CPU limits per namespace:
# without CPU limits
sum by (namespace)(count by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))

# without Memory limits
sum by (namespace)(count by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="memory"}))
  • Containers whose CPU usage is close to limits:
(sum by (namespace,pod,container)(rate(container_cpu_usage_seconds_total{container!=""}[5m])) / sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"})) > 0.8
  • Containers whose Memory usage is close to limits:
(sum by (namespace,pod,container)(container_memory_usage_bytes{container!=""}) / sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="memory"})) > 0.8
  • Top 10 containers without limits using CPU:
topk(10,sum by (namespace,pod,container)(rate(container_cpu_usage_seconds_total{container!=""}[5m])) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))
  • Top 10 containers without limits using Memory:
topk(10,sum by (namespace,pod,container)(container_memory_usage_bytes{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="memory"}))
  • Memory requests and limits:
# $pod is a Grafana dashboard variable populated with label_values(pod)
kube_pod_container_resource_requests{pod="$pod", resource="memory"}
kube_pod_container_resource_limits{pod="$pod", resource="memory"}
  • Memory usage:
# $pod is a Grafana dashboard variable populated with label_values(pod)
container_memory_working_set_bytes{container!="",container!="POD",pod="$pod"}
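
The queries above rely on two PromQL behaviours: `unless` keeps the left-hand series that have no matching label set on the right, and `(usage / limit) > 0.8` keeps only the matched pairs whose ratio exceeds the threshold. The Python sketch below models both on plain dictionaries keyed by (namespace, pod, container). It is an illustration of the matching logic only; the function names and the sample data are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (namespace, pod, container)


def without_limits(usage: Dict[Key, float], limits: Dict[Key, float]) -> Dict[Key, float]:
    """Model of `usage unless limits`: usage series with no matching limit."""
    return {key: value for key, value in usage.items() if key not in limits}


def close_to_limit(
    usage: Dict[Key, float], limits: Dict[Key, float], threshold: float = 0.8
) -> Dict[Key, float]:
    """Model of `(usage / limits) > threshold` for keys present on both sides.

    Zero limits are skipped here to avoid dividing by zero; PromQL would
    return +Inf for them instead.
    """
    ratios = {
        key: usage[key] / limits[key]
        for key in usage.keys() & limits.keys()
        if limits[key] > 0
    }
    return {key: ratio for key, ratio in ratios.items() if ratio > threshold}


# Hypothetical CPU usage (cores) and CPU limits (cores).
cpu_usage = {
    ("shop", "api-0", "api"): 0.45,
    ("shop", "api-0", "sidecar"): 0.10,
    ("batch", "job-1", "worker"): 0.90,
}
cpu_limits = {
    ("shop", "api-0", "api"): 0.5,
    ("batch", "job-1", "worker"): 2.0,
}

print(without_limits(cpu_usage, cpu_limits))   # only the sidecar has no limit
print(close_to_limit(cpu_usage, cpu_limits))   # only the api container is above 80%
```

This also shows why the label sets must line up: both sides of each query are aggregated with `sum by (namespace,pod,container)` so that the cAdvisor usage series and the kube-state-metrics limit series share exactly the same keys.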