# Author: @tommeramber
# Last active: June 11, 2025 07:48
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pvc-over-used
  namespace: openshift-monitoring
spec:
  groups:
    - name: cron-job-monitoring
      rules:
        # Fires when a PVC in openshift-monitoring is more than 80% full.
        - alert: PVC-OVER-USED
          annotations:
            summary: 'PVC {{ $labels.persistentvolumeclaim }} is at {{ $value }}% usage'
          expr: >-
            max(round(100 * (kubelet_volume_stats_used_bytes{namespace=~"openshift-monitoring",persistentvolumeclaim=~".*"} / (kubelet_volume_stats_capacity_bytes{namespace=~"openshift-monitoring",persistentvolumeclaim=~".*"}))) > 80) by (persistentvolumeclaim, namespace)
          for: 10s
          labels:
            severity: custom
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-health-check
  namespace: openshift-monitoring
spec:
  groups:
    - name: sanity-apps
      rules:
        # Fires when a node-health-check job has fewer successful pods
        # than its desired completions for 15 minutes.
        - alert: NodeHealthJobCompletion
          annotations:
            description: >-
              Job {{ $labels.namespace }}/{{ $labels.job_name }} has not completed within 15 minutes. Drain and reboot the node.
            summary: node-health-check job did not complete in time
          expr: >
            kube_job_spec_completions{namespace=~"(openshift-monitoring)",job_name=~"node-health-check.*"}
            -
            kube_job_status_succeeded{namespace=~"(openshift-monitoring)",job_name=~"node-health-check.*"}
            > 0
          for: 15m
          labels:
            severity: custom
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: detect-nfs-stale
  namespace: openshift-monitoring
spec:
  groups:
    - name: detect-nfs-stale-apps
      rules:
        # Fires when a detect-stale-nfs job has reported failure for 1 minute.
        - alert: NFSDetectJobFailed
          annotations:
            description: >-
              Job {{ $labels.namespace }}/{{ $labels.job_name }} has failed. Drain and reboot the node.
            summary: detect-stale-nfs job failed
          expr: >
            kube_job_failed{job="kube-state-metrics",job_name=~"(detect-stale-nfs.+)",namespace=~"(openshift-monitoring)"} > 0
          for: 1m
          labels:
            severity: custom