
@emidoots
Last active November 2, 2021 11:16
cAdvisor's missing `container_restart_count` metric

Problem: cAdvisor's missing container_restart_count metric

cAdvisor doesn't export a metric for container restarts, but it does pass through Docker's restart count, which appears on its metrics as the label `container_label_restartcount`.

Unfortunately, because the count is a label value rather than a sample value, you can't graph it, do arithmetic on it, or alert on it directly. It also looks like cAdvisor doesn't plan to support this soon, because the issue has been closed.

Solution

I am not proud of this, and I hope cAdvisor will make it easier in the future, but it does work well. Here is a Prometheus recording rule that defines this metric:

Limitation: If the restart count exceeds 99,999, the behavior is undefined.

```yaml
groups:
- name: cadvisor-restart-count.rules
  rules:
  # See https://gist.github.com/slimsag/85e06781eb0d4d35beee12916aefac5f
  - record: container_restart_count
    expr: |-
      (
      (count by (name)(container_spec_cpu_shares{container_label_restartcount=~"|.*0"}) * 0)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*1"}) * 1)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*2"}) * 2)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*3"}) * 3)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*4"}) * 4)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*5"}) * 5)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*6"}) * 6)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*7"}) * 7)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*8"}) * 8)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*9"}) * 9)
      )
      +
      (
      (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".?|.*0."}) * 0)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*1."}) * 10)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*2."}) * 20)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*3."}) * 30)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*4."}) * 40)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*5."}) * 50)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*6."}) * 60)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*7."}) * 70)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*8."}) * 80)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*9."}) * 90)
      )
      +
      (
      (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".{0,2}|.*0.."}) * 0)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*1.."}) * 100)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*2.."}) * 200)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*3.."}) * 300)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*4.."}) * 400)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*5.."}) * 500)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*6.."}) * 600)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*7.."}) * 700)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*8.."}) * 800)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*9.."}) * 900)
      )
      +
      (
      (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".{0,3}|.*0..."}) * 0)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*1..."}) * 1000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*2..."}) * 2000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*3..."}) * 3000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*4..."}) * 4000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*5..."}) * 5000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*6..."}) * 6000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*7..."}) * 7000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*8..."}) * 8000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*9..."}) * 9000)
      )
      +
      (
      (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".{0,4}|.*0...."}) * 0)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*1...."}) * 10000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*2...."}) * 20000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*3...."}) * 30000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*4...."}) * 40000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*5...."}) * 50000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*6...."}) * 60000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*7...."}) * 70000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*8...."}) * 80000)
      or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*9...."}) * 90000)
      )
```

Explanation

Prometheus fully anchors label regex matchers, so `=~".*3"` only matches values that end in 3, and `=~".*3."` only matches values whose second-to-last character is 3. The rule reads the restartcount label one digit at a time and adds up the contributions.

The first section:

```
(
(count by (name)(container_spec_cpu_shares{container_label_restartcount=~"|.*0"}) * 0)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*1"}) * 1)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*2"}) * 2)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*3"}) * 3)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*4"}) * 4)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*5"}) * 5)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*6"}) * 6)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*7"}) * 7)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*8"}) * 8)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*9"}) * 9)
)
```

extracts the last digit of the restartcount label. The first case, `|.*0`, matches both a trailing 0 and an empty value, which is what you get when there is no restartcount label at all.

Then, we add the second-to-last digit:

```
+
(
(count by (name)(container_spec_cpu_shares{container_label_restartcount=~".?|.*0."}) * 0)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*1."}) * 10)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*2."}) * 20)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*3."}) * 30)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*4."}) * 40)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*5."}) * 50)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*6."}) * 60)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*7."}) * 70)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*8."}) * 80)
or (count by (name)(container_spec_cpu_shares{container_label_restartcount=~".*9."}) * 90)
)
```

Here the first case, `.?|.*0.`, contributes 0 when the value has fewer than two digits or its tens digit is 0.

This is repeated for the hundreds, thousands, and ten-thousands digits, so restart counts from 0 to 99,999 are handled. For example, a label value of 1234 contributes 4 + 30 + 200 + 1000 + 0 = 1234. Every group needs a zero case that can match any shorter value: `+` only keeps containers that have a matching series on both sides, so a container with no match in one group would disappear from the result entirely.
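Once the recording rule is loaded (from a file listed under `rule_files` in `prometheus.yml`), `container_restart_count` is an ordinary per-container series with a single `name` label, so it can drive alerts. Below is a minimal sketch of an alerting rule built on it. The group name `container-restarts.alerts`, the alert name `ContainerRestarted`, the 15-minute window, and the `severity` label are illustrative assumptions, not part of the original gist.

```yaml
groups:
- name: container-restarts.alerts   # hypothetical group name
  rules:
  # Hypothetical alert. changes() counts how many times the recorded value
  # changed inside the window, so a restart (e.g. 3 -> 4) makes it > 0.
  # Note that it also fires if the value drops, e.g. when a container is
  # recreated and its restart count starts over.
  - alert: ContainerRestarted
    expr: changes(container_restart_count[15m]) > 0
    labels:
      severity: warning   # assumed label; adjust to your Alertmanager routing
    annotations:
      summary: "Container {{ $labels.name }} restarted in the last 15 minutes"
```

`changes()` is used here instead of `increase()` because the recorded series is built from a label value rather than a real counter, so counting value changes is the simplest way to detect a restart.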

@genieai-vikas

It's not working: container_label_restartcount is not available.
