# Gist by @david-bc, created July 27, 2018 14:54
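# Note: we add the environment label to all of our metrics. If you don’t have a similar label,
# remove every reference to it: the environment!="loadtest" matchers, the by(environment)
# clauses, and the {{ $labels.environment }} templates in the annotations.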
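#
# A minimal sketch of one rule with the environment label removed. It is kept commented
# out so it does not duplicate the active alert of the same name below. It assumes the
# same metric name and name="UnderReplicatedPartitions" label as the original rule;
# adjust it to whatever your exporter actually emits.
#
#   - alert: kafka_under_replicated_partitions
#     expr: sum(kafka_server_replicamanager_value{name="UnderReplicatedPartitions"}) > 0
#     for: 5m
#     labels:
#       severity: page
#     annotations:
#       description: 'kafka is reporting {{ $value }} under-replicated partitions.'
#       summary: 'kafka under-replicated partitions'
#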
groups:
- name: kafka_sla_rules
  rules:
  # Produce availability as reported by kafka-monitor, averaged over 1 minute.
  - alert: kafka_sla_produce_availability
    expr: kafka_monitor_produce_service_produce_availability_avg_1m{environment!="loadtest"} < 1
    for: 5m
    labels:
      severity: page
    annotations:
      description: '{{ $labels.environment }} kafka produce availability is {{ $value }}'
      summary: '{{ $labels.environment }} kafka produce availability is < 1 for more than 5 minutes'
  # Consume availability as reported by kafka-monitor, averaged over 1 minute.
  - alert: kafka_sla_consume_availability
    expr: kafka_monitor_consume_service_consume_availability_avg_1m{environment!="loadtest"} < 1
    for: 5m
    labels:
      severity: page
    annotations:
      description: '{{ $labels.environment }} kafka consume availability is {{ $value }}'
      summary: '{{ $labels.environment }} kafka consume availability is < 1 for more than 5 minutes'
  # Any under-replicated partition, summed across brokers per environment.
  - alert: kafka_under_replicated_partitions
    expr: sum(kafka_server_replicamanager_value{name="UnderReplicatedPartitions",environment!="loadtest"}) by(environment) > 0
    for: 5m
    labels:
      severity: page
    annotations:
      description: '{{ $labels.environment }} kafka is reporting {{ $value }} under-replicated partitions.'
      summary: '{{ $labels.environment }} kafka under-replicated partitions'
  # Any offline partition, summed across brokers per environment.
  - alert: kafka_offline_partitions
    expr: sum(kafka_controller_kafkacontroller_value{name="OfflinePartitionsCount",environment!="loadtest"}) by(environment) > 0
    for: 5m
    labels:
      severity: page
    annotations:
      description: '{{ $labels.environment }} kafka is reporting {{ $value }} offline partitions. Clients cannot produce to or consume from these partitions until they come back online.'
      summary: '{{ $labels.environment }} kafka offline partitions'
  # Exactly one broker per environment should be the active controller.
  # The expression fires on any count other than 1, so it covers both zero and multiple controllers.
  - alert: kafka_no_active_controller
    expr: sum(kafka_controller_kafkacontroller_value{job="kafka", name="ActiveControllerCount",environment!="loadtest"}) by(environment) != 1
    for: 5m
    labels:
      severity: page
    annotations:
      description: '{{ $labels.environment }} kafka is reporting {{ $value }} active controllers (expected exactly 1). Restarting one broker may resolve this.'
      summary: '{{ $labels.environment }} kafka active controller count is not 1'
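#
# A hedged sketch of a promtool unit test for kafka_under_replicated_partitions.
# Assumptions, none of which come from the original gist: the rules above are saved
# as kafka_sla_rules.yml under a top-level groups: key, the test lives in a separate
# file next to it, and the environment value "prod" and instance "broker-1" are made
# up for illustration. Uncomment it into its own file and run it with
# `promtool test rules <test file>`.
#
# rule_files:
#   - kafka_sla_rules.yml
# evaluation_interval: 1m
# tests:
#   - interval: 1m
#     input_series:
#       # One broker reports 2 under-replicated partitions for 10 minutes.
#       - series: 'kafka_server_replicamanager_value{name="UnderReplicatedPartitions",environment="prod",instance="broker-1"}'
#         values: '2+0x10'
#     alert_rule_test:
#       # The rule uses for: 5m, so the alert should be firing by the 10 minute mark.
#       - eval_time: 10m
#         alertname: kafka_under_replicated_partitions
#         exp_alerts:
#           - exp_labels:
#               severity: page
#               environment: prod
#             exp_annotations:
#               description: 'prod kafka is reporting 2 under-replicated partitions.'
#               summary: 'prod kafka under-replicated partitions'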