In this guide, we will deploy the blue app, which produces metrics. We will scrape the metrics with Prometheus, define alerts, send alerts to PagerDuty, stream metrics to a remote endpoint using remote write, and create a custom Grafana dashboard for our app.
- Deploy Blue App
- Install MonitoringStack
- Create Instance of MonitoringStack Operator
- Deploy Blue ServiceMonitor
- Setup Alerts
- Trigger alerts
- Send Alerts to Pagerduty
- Create Instance of Grafana Operator
- Create a GrafanaDataSource
- Create a GrafanaDashboard
- Read RemoteWrite Data Stream
- Clean Up
We use this app as a source of metrics to scrape and as a remote-write target.
kubectl apply -f -<<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blue
    version: v1
  name: blue
  namespace: default
spec:
  ports:
  - port: 9000
    name: http
    nodePort: 32062
  selector:
    app: blue
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: blue
    version: v1
  name: blue
  namespace: default
spec:
  selector:
    matchLabels:
      app: blue
      version: v1
  replicas: 1
  template:
    metadata:
      labels:
        app: blue
        version: v1
    spec:
      serviceAccountName: blue
      containers:
      - image: docker.io/cmwylie19/blue:openshift
        name: blue
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 10
          successThreshold: 1
          periodSeconds: 60
          httpGet:
            path: /
            port: 9000
        livenessProbe:
          initialDelaySeconds: 10
          periodSeconds: 60
          httpGet:
            path: /
            port: 9000
        ports:
        - containerPort: 9000
          name: http
        imagePullPolicy: Always
      restartPolicy: Always
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: blue
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    app: blue
    version: v1
  name: blue
spec:
  port:
    targetPort: http
  to:
    kind: Service
    name: blue
EOF
output
service/blue created
deployment.apps/blue created
serviceaccount/blue created
route.route.openshift.io/blue created
Wait for Blue
kubectl wait --for=condition=ready pod -l app=blue --timeout=120s
output
pod/blue-699b5f7f7d-npn7k condition met
Install the CatalogSource and the Subscription
kubectl apply -f -<<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: monitoring-operators
  namespace: openshift-marketplace
spec:
  displayName: Monitoring Test Operator
  icon:
    base64data: ""
    mediatype: ""
  image: quay.io/tsisodia10/monitoring-stack-operator-catalog:latest
  publisher: Twinkll Sisodia
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 1m0s
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/monitoring-stack-operator.openshift-operators: ""
  name: monitoring-stack-operator
  namespace: openshift-operators
spec:
  channel: development
  installPlanApproval: Automatic
  name: monitoring-stack-operator
  source: monitoring-operators
  sourceNamespace: openshift-marketplace
  startingCSV: monitoring-stack-operator.v0.0.1
EOF
output
catalogsource.operators.coreos.com/monitoring-operators created
subscription.operators.coreos.com/monitoring-stack-operator created
Wait for the operator pods to become ready; it takes a few moments for them to come up in the namespace.
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=monitoring-stack-operator -n openshift-operators --timeout=120s
output
pod/monitoring-stack-operator-54f8fb6f4f-s9jc9 condition met
Make SURE the URL gets interpreted correctly, with no stray forward slash ($(/http...)); when I copied and pasted, a forward slash was added. The URL is the value of the blue route (kubectl get route blue). Our application accepts remote writes at /alert. We are configuring Prometheus to send metrics to our blue app.
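To check what the substitution will expand to before applying, you can build the URL in a variable first. This is a self-contained sketch: BLUE_HOST is a hypothetical example value; on a real cluster you would populate it from the route as shown in the comment.

```shell
# Hypothetical host; on a cluster populate it with:
#   BLUE_HOST=$(kubectl get route blue -o jsonpath='{.status.ingress[0].host}')
BLUE_HOST="blue-default.apps.example.com"
REMOTE_WRITE_URL="http://${BLUE_HOST}/alert"
# Eyeball the result: exactly one slash between host and path
echo "${REMOTE_WRITE_URL}"
```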
kubectl apply -f -<<EOF
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: starburst
spec:
  logLevel: debug
  prometheusConfig:
    remoteWrite:
    - url: http://$(kubectl get routes blue -ojsonpath='{.status.ingress[0].host}')/alert
EOF
output
monitoringstack.monitoring.rhobs/starburst created
Tell Prometheus to scrape metrics from the blue app.
kubectl apply -f -<<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: blue-monitor
  labels:
    app: blue
spec:
  selector:
    matchLabels:
      app: blue
  endpoints:
  - port: http
EOF
output
servicemonitor.monitoring.coreos.com/blue-monitor created
The blue app rules contain a recording rule called blue_requests_per_minute, plus three alerts: LowLoadBlue, MediumLoadBlue, and HighLoadBlue.
kubectl apply -f -<<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: blue
    role: alert-rules
  name: blue-rules
spec:
  groups:
  - name: recording_rules
    interval: 2s
    rules:
    - record: blue_requests_per_minute
      expr: increase(http_requests_total{container="blue"}[1m])
  - name: LoadRules
    rules:
    - alert: HighLoadBlue
      expr: blue_requests_per_minute >= 10
      labels:
        severity: page # or critical
      annotations:
        summary: "high load average"
        description: "high load average"
    - alert: MediumLoadBlue
      expr: blue_requests_per_minute >= 5
      labels:
        severity: warn
      annotations:
        summary: "medium load average"
        description: "medium load average"
    - alert: LowLoadBlue
      expr: blue_requests_per_minute >= 1
      labels:
        severity: acknowledged
      annotations:
        summary: "low load average"
        description: "low load average"
EOF
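As a rough sketch of what the recording rule computes: increase(...[1m]) approximates how much the counter grew over the last minute. With hypothetical counter readings one minute apart:

```shell
# Hypothetical http_requests_total readings one minute apart
start=4
end=14
# For a counter that never reset, increase() over the window is roughly the delta
echo $((end - start))
```

Here the delta is 10, which would be enough to fire HighLoadBlue (>= 10).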
To trigger the alerts in Prometheus, we must issue requests to the blue application:
- one or more times to trigger LowLoadBlue
- five or more times to trigger MediumLoadBlue
- ten or more times to trigger HighLoadBlue
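The thresholds above can be sketched as a small helper (hypothetical, for illustration only) that maps a per-minute request count to the highest alert it fires; note that the real alerts use >= comparisons, so a high count fires all three.

```shell
# Hypothetical helper: highest alert fired for a given blue_requests_per_minute value
alert_for() {
  if   [ "$1" -ge 10 ]; then echo HighLoadBlue
  elif [ "$1" -ge 5 ];  then echo MediumLoadBlue
  elif [ "$1" -ge 1 ];  then echo LowLoadBlue
  else                       echo none
  fi
}

alert_for 15   # HighLoadBlue
alert_for 7    # MediumLoadBlue
alert_for 0    # none
```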
Let's create a pod that we will use to call the blue service:
kubectl run curler --image=nginx:alpine --port=80 --expose
output
service/curler created
pod/curler created
Curl the blue app 15 times:
for z in $(seq 15); do kubectl exec -it pod/curler -- curl blue:9000/; done
output
OKOKOKOKOKOKOKOKOKOKOKOKOKOKOK
Now, check the alerts from the Prometheus UI. It can take a few moments, depending on how quickly you issued the requests. Make sure the targets from the ServiceMonitor have populated in the console; wait if they are not up, then check the alerts.
kubectl port-forward prometheus-starburst-0 9090
open http://localhost:9090/alerts
Follow this guide to create a PagerDuty service, then add its integration key as a secret.
PagerDuty secret
kubectl create secret generic pagerduty-key --from-literal=secretKey=INTEGRATION_KEY
kubectl apply -f -<<EOF
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: blue-pagerduty
  labels:
    alertmanagerConfig: starburst
spec:
  route:
    groupBy: [alertname, cluster, service, job]
    groupWait: 30s
    groupInterval: 2m
    repeatInterval: 2m
    receiver: 'pagerduty-instance'
    routes:
    - match:
        severity: 'warn'
      receiver: pagerduty-instance
    - match:
        severity: 'acknowledged'
      receiver: pagerduty-instance
    - match:
        severity: 'page'
      receiver: pagerduty-instance
  receivers:
  - name: 'pagerduty-instance'
    pagerdutyConfigs:
    - serviceKey:
        key: secretKey
        name: pagerduty-key
      url: https://events.pagerduty.com/generic/2010-04-15/create_event.json
EOF
output
alertmanagerconfig.monitoring.coreos.com/blue-pagerduty created
Wait a few moments for the integration to take effect.
sleep 120
Trigger a page
for z in $(seq 45); do kubectl exec -it pod/curler -- curl blue:9000/; done
output
OKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOKOK
If your setup is correct, you should get a page after running the curl script a time or two.
There is no need to create one; the monitoring-stack-operator has already created an instance.
Check the Grafana instance that is already deployed:
kubectl get grafana -n monitoring-stack-operator monitoring-stack-operator-grafana -oyaml
output
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  creationTimestamp: "2022-04-02T11:25:11Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: monitoring-stack-operator
  name: monitoring-stack-operator-grafana
  namespace: monitoring-stack-operator
  resourceVersion: "256896"
  uid: 24989bc8-4c44-4bac-b545-9f3e388bfcfd
spec:
  config:
    auth:
      disable_login_form: true
      disable_signout_menu: true
    auth.anonymous:
      enabled: true
    log:
      level: info
      mode: console
    users:
      viewers_can_edit: true
  dashboardLabelSelector:
  - matchLabels:
      app.kubernetes.io/part-of: monitoring-stack-operator
  deployment:
    replicas: 1
    strategy:
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 0
      type: RollingUpdate
  ingress:
    enabled: true
    path: /
    pathType: Prefix
status:
  message: success
  phase: reconciling
  previousServiceName: grafana-service
Notice that .spec.dashboardLabelSelector matches on app.kubernetes.io/part-of: monitoring-stack-operator. We must put this label on our dashboard for it to get picked up.
The monitoring stack operator has already deployed a GrafanaDataSource.
kubectl get grafanadatasource -n monitoring-stack-operator ms-default-starburst -oyaml
output
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  annotations:
    monitoring-stack-operator/owner-name: starburst
    monitoring-stack-operator/owner-namespace: default
  creationTimestamp: "2022-04-02T11:25:35Z"
  generation: 1
  name: ms-default-starburst
  namespace: monitoring-stack-operator
  resourceVersion: "249496"
  uid: 81996313-6819-4adb-adc0-e02c3e30a2a0
spec:
  datasources:
  - access: proxy
    jsonData:
      tracesToLogs: {}
    name: ms-default-starburst
    secureJsonData: {}
    type: prometheus
    url: starburst-prometheus.default:9090
    version: 1
  name: ms-default-starburst
status:
  message: ""
  phase: ""
This is a custom dashboard built to showcase metrics from the blue app, configured to read from the monitoring stack. Make sure it is labeled as app.kubernetes.io/part-of: monitoring-stack-operator.
kubectl apply -f -<<EOF
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: blue-dashboard
  namespace: monitoring-stack-operator
  labels:
    app.kubernetes.io/part-of: monitoring-stack-operator
spec:
  datasources:
  - datasourceName: prometheus-grafanadatasource
    inputName: middleware.yaml
  json: >
    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "type": "dashboard"
          }
        ]
      },
      "editable": true,
      "gnetId": null,
      "graphTooltip": 1,
      "id": 2,
      "links": [],
      "panels": [
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fieldConfig": {
            "defaults": {},
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 0
          },
          "hiddenSeries": false,
          "id": 6,
          "legend": {
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "show": true,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": {
            "alertThreshold": true
          },
          "percentage": false,
          "pluginVersion": "7.5.15",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "exemplar": true,
              "expr": "increase(blue_requests_per_minute{container=\"blue\",path=\"/alert\"}[1m])",
              "interval": "",
              "legendFormat": "",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Blue Requests /alert",
          "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fieldConfig": {
            "defaults": {},
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 8
          },
          "hiddenSeries": false,
          "id": 2,
          "legend": {
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "show": true,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": {
            "alertThreshold": true
          },
          "percentage": false,
          "pluginVersion": "7.5.15",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "exemplar": true,
              "expr": "blue_requests_per_minute",
              "interval": "",
              "legendFormat": "",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Blue Requests All Routes",
          "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {},
          "bars": true,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fieldConfig": {
            "defaults": {},
            "overrides": []
          },
          "fill": 2,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 16
          },
          "hiddenSeries": false,
          "id": 4,
          "legend": {
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "show": false,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": {
            "alertThreshold": true
          },
          "percentage": false,
          "pluginVersion": "7.5.15",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "exemplar": true,
              "expr": "sum(http_response_time_seconds_sum)",
              "format": "time_series",
              "interval": "",
              "legendFormat": "",
              "refId": "HTTP Response Time Seconds"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "HTTP Response Time",
          "tooltip": {
            "shared": false,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "histogram",
            "name": null,
            "show": true,
            "values": [
              "total"
            ]
          },
          "yaxes": [
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        }
      ],
      "refresh": "5s",
      "schemaVersion": 27,
      "style": "dark",
      "tags": [],
      "templating": {
        "list": []
      },
      "time": {
        "from": "now-6h",
        "to": "now"
      },
      "timepicker": {
        "refresh_intervals": [],
        "time_options": []
      },
      "timezone": "browser",
      "title": "Blue Dashboard",
      "uid": "dcdf4abf7e3f62fee760bc341fa70aa1074e414a",
      "version": 2
    }
EOF
output
grafanadashboard.integreatly.org/blue-dashboard created
Now view the dashboard.
kubectl port-forward svc/grafana-service -n monitoring-stack-operator 3000
Visit localhost:3000 in your browser and look for the Blue Dashboard.
We configured our MonitoringStack to include a Prometheus remote-write configuration:
kubectl get monitoringstack starburst -ojsonpath='{.spec.prometheusConfig.remoteWrite}' | jq
output
[
{
"url": "http://blue-default.apps.kong-cwylie.fsi-env2.rhecoeng.com/alert"
}
]
This is an endpoint on our blue app to which Prometheus is streaming metrics. Let's look at the logs from that pod:
kubectl logs deploy/blue --since=10s
output
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 4.000000 1648901011675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/alert", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 14.000000 1648901011675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/metrics", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 2.000000 1648901011675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 4.000000 1648901013675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/alert", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 14.000000 1648901013675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/metrics", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 2.000000 1648901013675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 4.000000 1648901015675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/alert", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 14.000000 1648901015675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/metrics", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 2.000000 1648901015675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 4.000000 1648901017675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/alert", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 14.000000 1648901017675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/metrics", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 2.000000 1648901017675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 4.000000 1648901019675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/alert", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 14.000000 1648901019675
blue_requests_per_minute{container="blue", endpoint="http", instance="10.131.1.100:9000", job="blue", metrics="custom", namespace="default", path="/metrics", pod="blue-699b5f7f7d-9hb9q", prometheus="default/starburst", prometheus_replica="prometheus-starburst-0", service="blue"}
Sample: 2.000000 1648901019675
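The stream interleaves every path; to watch a single series you can filter the log with grep. The sketch below is self-contained (it greps two sample lines via printf); on the cluster you would pipe kubectl logs deploy/blue through the same grep.

```shell
# Filter remote-write output for the /alert series only
printf '%s\n' \
  'blue_requests_per_minute{container="blue", path="/alert", service="blue"}' \
  'blue_requests_per_minute{container="blue", path="/", service="blue"}' \
  | grep 'path="/alert"'
```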
kubectl delete svc,deploy,sa blue
kubectl delete prometheusrules --all
kubectl delete alertmanagerconfig --all
kubectl delete servicemonitors --all
kubectl delete monitoringstack starburst
kubectl delete subscription monitoring-stack-operator -n openshift-operators
kubectl delete csv -n openshift-operators monitoring-stack-operator.v0.0.1
kubectl delete grafanadashboard blue-dashboard -n monitoring-stack-operator
kubectl delete sub monitoring-stack-operator-grafana-operator -n monitoring-stack-operator
kubectl delete csv -n monitoring-stack-operator grafana-operator.v4.1.0
kubectl delete pods,pvc,svc,secrets,cm,routes,deploy --all --force --grace-period=0
kubectl delete pods,pvc,sa,svc,secrets,cm,routes,sts,deploy --all --force --grace-period=0 -n monitoring-stack-operator