We are going to deploy an app called Blue, which exposes a healthcheck endpoint, a metrics endpoint, and a RemoteWrite endpoint that consumes Prometheus metrics. We will scrape metrics from the application, create alerts and a GrafanaDashboard, and consume Prometheus data through the RemoteWrite endpoint.
- Install MSO
- Deploy Sample App
- Deploy MonitoringStack Instance
- Deploy ServiceMonitor
- Deploy PrometheusRule
- Prometheus RemoteWrite Endpoints
- Deploy GrafanaDashboard
- Clean Up
Create the CatalogSource and Subscription to install the Monitoring Stack Operator (MSO).
kubectl apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: monitoring-operators
  namespace: openshift-marketplace
spec:
  displayName: Monitoring Test Operator
  icon:
    base64data: ""
    mediatype: ""
  image: quay.io/tsisodia10/monitoring-stack-operator-catalog:latest
  publisher: Twinkll Sisodia
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 1m0s
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/monitoring-stack-operator.openshift-operators: ""
  name: monitoring-stack-operator
  namespace: openshift-operators
spec:
  channel: development
  installPlanApproval: Automatic
  name: monitoring-stack-operator
  source: monitoring-operators
  sourceNamespace: openshift-marketplace
  startingCSV: monitoring-stack-operator.v0.0.2
EOF
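Before continuing, verify that OLM has installed the operator. A quick check, assuming the standard OLM shorthand for ClusterServiceVersions:
# The CSV should reach the Succeeded phase
kubectl get csv -n openshift-operators | grep monitoring-stack-operator
# The operator pod should be Running
kubectl get pods -n openshift-operators | grep monitoring-stack-operator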
Deploy the Blue app, which produces metrics and consumes them through a RemoteWrite endpoint.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: blue
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blue
    version: v1
  name: blue
  namespace: blue
spec:
  ports:
  - port: 9000
    name: http
  selector:
    app: blue
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: blue
    version: v1
  name: blue
  namespace: blue
spec:
  selector:
    matchLabels:
      app: blue
      version: v1
  replicas: 1
  template:
    metadata:
      labels:
        app: blue
        version: v1
    spec:
      serviceAccountName: blue
      containers:
      - image: docker.io/cmwylie19/metrics-demo
        name: blue
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 9000
          name: http
        imagePullPolicy: Always
      restartPolicy: Always
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: blue
  namespace: blue
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  labels:
    app: blue
    version: v1
  name: blue
  namespace: blue
spec:
  port:
    targetPort: http
  to:
    kind: Service
    name: blue
EOF
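Make sure the pod is running and the app answers over the Route before wiring up monitoring; this assumes the Route host is reachable from your machine:
kubectl get pods -n blue
curl http://$(kubectl get route blue -n blue -ojsonpath='{.spec.host}')/metrics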
Create an instance of the MSO with RemoteWrite and ExternalLabels configured.
kubectl apply -f - <<EOF
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: blue
  namespace: blue
spec:
  logLevel: debug
  prometheusConfig:
    externalLabels:
      tenant_id: "blue-tenant"
      user: "blue-user"
      clusterID: "blue-cluster"
    remoteWrite:
    - url: http://$(kubectl get route blue -n blue -ojsonpath='{.status.ingress[0].host}')/alert # My app to read the metrics
  retention: 120h
EOF
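The operator reconciles the MonitoringStack into a Prometheus instance in the blue namespace; the prometheus-blue-0 pod used in the next steps should appear shortly:
kubectl get monitoringstack blue -n blue
kubectl get pods -n blue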
This ServiceMonitor tells the Prometheus Operator to scrape Services labeled app=blue.
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: blue
  namespace: blue
  labels:
    app: blue
spec:
  selector:
    matchLabels:
      app: blue
  endpoints:
  - port: http
EOF
Now, let's verify that Prometheus is scraping the target selected by the ServiceMonitor.
kubectl logs prometheus-blue-0 -n blue --since=1m | grep namespaces/blue | grep 200
In creating this PrometheusRule, we define three alerts and one recording rule.
Alerts:
- HighRequestPerMinute
- MediumRequestPerMinute
- LowRequestPerMinute
Recording rules:
- blue_requests_per_minute
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: blue
    role: alert-rules
  name: blue
  namespace: blue
spec:
  groups:
  - name: blue_recording_rules
    interval: 2s
    rules:
    - record: blue_requests_per_minute
      expr: increase(http_requests_total{container="blue"}[1m])
  - name: blue_alert_rules
    rules:
    - alert: HighRequestPerMinute
      expr: blue_requests_per_minute >= 20
      labels:
        severity: page # or critical
      annotations:
        summary: "high number of requests"
        description: "Check system to make sure blue pods are okay"
    - alert: MediumRequestPerMinute
      expr: blue_requests_per_minute >= 15
      labels:
        severity: warn
      annotations:
        summary: "medium number of requests"
        description: "medium load on blue app"
    - alert: LowRequestPerMinute
      expr: blue_requests_per_minute >= 1
      labels:
        severity: acknowledged
      annotations:
        summary: "low number of requests"
        description: "Blue app is receiving traffic"
EOF
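The rule object should be accepted right away; whether Prometheus actually loaded it is checked against its HTTP API below:
kubectl get prometheusrule blue -n blue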
Now, let's trigger alerts in our local Prometheus instance. First, we port-forward the Prometheus pod to port 9090:
kubectl port-forward prometheus-blue-0 -n blue 9090
Open localhost:9090 in your browser.
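While the port-forward is running, you can also confirm through the Prometheus HTTP API that both rule groups were loaded (assuming jq is installed):
curl -s http://localhost:9090/api/v1/rules | jq '.data.groups[].name'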
Now, trigger an alert by sending requests to the blue service:
for z in $(seq 33); do curl $(kubectl get route -n blue blue -ojsonpath='{.spec.host}'); done
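After a scrape interval or two, the recording rule should have data and at least LowRequestPerMinute should be firing; both are visible through the same API:
curl -s 'http://localhost:9090/api/v1/query?query=blue_requests_per_minute' | jq '.data.result[].value'
curl -s http://localhost:9090/api/v1/alerts | jq '.data.alerts[].labels.alertname'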
When we created the MSO instance, we configured a Prometheus RemoteWrite endpoint and ExternalLabels. RemoteWrite tells Prometheus where to send the metrics; ExternalLabels determine how Prometheus labels the outgoing metrics.
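To see how these settings are rendered into the running configuration, you can inspect the generated config; this sketch assumes the Prometheus Operator's usual convention of storing it gzipped in a secret named prometheus-<stack-name>:
kubectl get secret prometheus-blue -n blue -ojsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip | grep -A4 -E 'remote_write|external_labels'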
Let's look at the logs from the Blue app to see the remote write in action.
kubectl logs deploy/blue -n blue --since=1m
output
scrape_series_added{clusterID="blue-cluster", container="blue", endpoint="http", instance="10.128.2.23:9000", job="blue", namespace="blue", pod="blue-9d89f8dd7-7fzzd", prometheus="blue/blue", prometheus_replica="prometheus-blue-0", service="blue", tenant_id="blue-tenant", user="blue-user"}
Sample: 0.000000 1650976199961
blue_requests_per_minute{clusterID="blue-cluster", container="blue", endpoint="http", instance="10.128.2.23:9000", job="blue", metrics="custom", namespace="blue", path="/", pod="blue-9d89f8dd7-7fzzd", prometheus="blue/blue", prometheus_replica="prometheus-blue-0", service="blue", tenant_id="blue-tenant", user="blue-user"}
Sample: 0.000000 1650976200492
blue_requests_per_minute{clusterID="blue-cluster", container="blue", endpoint="http", instance="10.128.2.23:9000", job="blue", metrics="custom", namespace="blue", path="/alert", pod="blue-9d89f8dd7-7fzzd", prometheus="blue/blue", prometheus_replica="prometheus-blue-0", service="blue", tenant_id="blue-tenant", user="blue-user"}
Sample: 14.000000 1650976200492
blue_requests_per_minute{clusterID="blue-cluster", container="blue", endpoint="http", instance="10.128.2.23:9000", job="blue", metrics="custom", namespace="blue", path="/metrics", pod="blue-9d89f8dd7-7fzzd", prometheus="blue/blue", prometheus_replica="prometheus-blue-0", service="blue", tenant_id="blue-tenant", user="blue-user"}
Sample: 2.000000 1650976200492
Handling Metrics
Now, let's pay close attention to the external labels that we added to the MSO.
kubectl get monitoringstack -n blue blue -ojsonpath='{.spec.prometheusConfig.externalLabels}' | jq
output
{
  "clusterID": "blue-cluster",
  "tenant_id": "blue-tenant",
  "user": "blue-user"
}
Finally, let's look at the RemoteWrite metrics and grep for the ExternalLabels:
kubectl logs deploy/blue -n blue --since=1m | egrep 'clusterID|user|tenant_id'
Now, we will deploy a GrafanaDashboard to visualize the Blue metrics.
kubectl apply -f - <<EOF
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: blue
  namespace: blue
  labels:
    app.kubernetes.io/part-of: monitoring-stack-operator
spec:
  datasources:
  - datasourceName: prometheus-grafanadatasource
    inputName: middleware.yaml
  json: >
    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "type": "dashboard"
          }
        ]
      },
      "editable": true,
      "gnetId": null,
      "graphTooltip": 1,
      "id": 2,
      "links": [],
      "panels": [
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fieldConfig": {
            "defaults": {},
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 0
          },
          "hiddenSeries": false,
          "id": 6,
          "legend": {
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "show": true,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": {
            "alertThreshold": true
          },
          "percentage": false,
          "pluginVersion": "7.5.15",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "exemplar": true,
              "expr": "increase(blue_requests_per_minute{container=\"blue\",path=\"/alert\"}[1m])",
              "interval": "",
              "legendFormat": "",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Blue Requests /alert",
          "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fieldConfig": {
            "defaults": {},
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 8
          },
          "hiddenSeries": false,
          "id": 2,
          "legend": {
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "show": true,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": {
            "alertThreshold": true
          },
          "percentage": false,
          "pluginVersion": "7.5.15",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "exemplar": true,
              "expr": "blue_requests_per_minute",
              "interval": "",
              "legendFormat": "",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "Blue Requests All Routes",
          "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {},
          "bars": true,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fieldConfig": {
            "defaults": {},
            "overrides": []
          },
          "fill": 2,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 16
          },
          "hiddenSeries": false,
          "id": 4,
          "legend": {
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "show": false,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "options": {
            "alertThreshold": true
          },
          "percentage": false,
          "pluginVersion": "7.5.15",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "exemplar": true,
              "expr": "sum(http_response_time_seconds_sum)",
              "format": "time_series",
              "interval": "",
              "legendFormat": "",
              "refId": "HTTP Response Time Seconds"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "HTTP Response Time",
          "tooltip": {
            "shared": false,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "histogram",
            "name": null,
            "show": true,
            "values": [
              "total"
            ]
          },
          "yaxes": [
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        }
      ],
      "refresh": "5s",
      "schemaVersion": 27,
      "style": "dark",
      "tags": [],
      "templating": {
        "list": []
      },
      "time": {
        "from": "now-6h",
        "to": "now"
      },
      "timepicker": {
        "refresh_intervals": [],
        "time_options": []
      },
      "timezone": "browser",
      "title": "Blue Dashboard",
      "uid": "dcdf4abf7e3f62fee760bc341fa70aa1074e414a",
      "version": 2
    }
EOF
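Before opening Grafana, confirm the dashboard resource was accepted:
kubectl get grafanadashboard blue -n blue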
Now we will port-forward the Grafana instance to 3000 and look at the dashboard.
kubectl port-forward svc/grafana-service -n monitoring-stack-operator 3000
Open localhost:3000 in your browser.
# Delete GrafanaDashboard
kubectl delete grafanadashboard blue -n blue
# Delete the ServiceMonitor and PrometheusRule
kubectl delete servicemonitor,prometheusrules blue -n blue
# Delete the MSO
kubectl delete monitoringstack blue -n blue
# Uninstall Blue
kubectl delete deploy,svc,route,sa blue -n blue
# Delete the blue namespace
kubectl delete ns blue
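The steps above leave the operator itself installed. To remove it too, delete the Subscription and CatalogSource created at the start, plus the CSV (its exact version may differ if the channel has advanced past v0.0.2):
# Delete the Subscription and CatalogSource
kubectl delete subscription monitoring-stack-operator -n openshift-operators
kubectl delete catalogsource monitoring-operators -n openshift-marketplace
# Delete the CSV
kubectl delete csv monitoring-stack-operator.v0.0.2 -n openshift-operators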