@cmwylie19
Last active April 26, 2022 17:04
Monitoring Stack Operator Demo

We are going to deploy an app called Blue, which exposes a health check endpoint, a metrics endpoint, and a RemoteWrite endpoint that consumes Prometheus metrics. We will scrape metrics from the application, create alerts and a GrafanaDashboard, and consume Prometheus data through the RemoteWrite endpoint.

Install MSO

Create the CatalogSource and Subscription

kubectl apply -f -<<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  annotations:
  name: monitoring-operators
  namespace: openshift-marketplace
spec:
  displayName: Monitoring Test Operator
  icon:
    base64data: ""
    mediatype: ""
  image: quay.io/tsisodia10/monitoring-stack-operator-catalog:latest
  publisher: Twinkll Sisodia
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 1m0s
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/monitoring-stack-operator.openshift-operators: ""
  name: monitoring-stack-operator
  namespace: openshift-operators
spec:
  channel: development
  installPlanApproval: Automatic
  name: monitoring-stack-operator
  source: monitoring-operators
  sourceNamespace: openshift-marketplace
  startingCSV: monitoring-stack-operator.v0.0.2
EOF
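
OLM installs the operator asynchronously, so the MonitoringStack resource type may not exist the moment the Subscription is applied. A minimal sketch of a polling helper follows. The `retry` function is hypothetical (it is not part of kubectl or the operator), and the CRD name `monitoringstacks.monitoring.rhobs` in the usage comment is an assumption inferred from the `apiVersion` used later in this demo.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example usage (assumed CRD name; adjust if your cluster reports another):
# retry 30 10 kubectl get crd monitoringstacks.monitoring.rhobs
```

Polling the CRD rather than sleeping for a fixed time keeps the later `kubectl apply` of the MonitoringStack from failing with an unknown-kind error on slow clusters.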

Deploy Sample App

Deploy the Blue app, which produces metrics and consumes them through an endpoint.

kubectl apply -f -<<EOF
apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: null
  name: blue
spec: {}
status: {}
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blue
    version: v1
  name: blue
  namespace: blue
spec:
  ports:
    - port: 9000
      name: http
  selector:
    app: blue
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: blue
    version: v1
  name: blue
  namespace: blue
spec:
  selector:
    matchLabels:
      app: blue
      version: v1
  replicas: 1
  template:
    metadata:
      labels:
        app: blue
        version: v1
    spec:
      serviceAccountName: blue
      containers:
        - image: docker.io/cmwylie19/metrics-demo
          name: blue
          resources:
            requests:
              memory: "64Mi"
              cpu: "250m"
            limits:
              memory: "128Mi"
              cpu: "500m"
          ports:
            - containerPort: 9000
              name: http
          imagePullPolicy: Always
      restartPolicy: Always
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: blue
  namespace: blue
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  creationTimestamp: null
  labels:
    app: blue
    version: v1
  name: blue
  namespace: blue
spec:
  port:
    targetPort: http
  to:
    kind: ""
    name: blue
    weight: null
EOF

Deploy MSO

Create an instance of the MSO with RemoteWrite and ExternalLabels configured.

kubectl apply -f -<<EOF
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: blue
  namespace: blue
spec:
  logLevel: debug
  prometheusConfig:
    externalLabels:
      tenant_id: "blue-tenant"
      user: "blue-user"
      clusterID: "blue-cluster"
    remoteWrite:
    - url: http://$(kubectl get route blue -n blue -ojsonpath='{.status.ingress[0].host}')/alert # My app to read the metrics
  retention: 120h
EOF
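
The `$(kubectl get route ...)` expression in the manifest above is expanded by the shell before `kubectl apply` reads the YAML, because the heredoc delimiter is unquoted. If the Route has no host yet, the URL silently becomes `http:///alert`. A minimal sketch of a guard follows, using a hypothetical `remote_write_url` helper that is not part of the demo.

```shell
# remote_write_url HOST
# Hypothetical helper: prints the remote-write URL for the Blue app,
# or fails if HOST is empty (for example, the Route is not admitted yet).
remote_write_url() {
  host=$1
  if [ -z "$host" ]; then
    echo "route host is empty; is the blue Route admitted?" >&2
    return 1
  fi
  printf 'http://%s/alert\n' "$host"
}

# Example usage against a cluster:
# host=$(kubectl get route blue -n blue -ojsonpath='{.status.ingress[0].host}')
# url=$(remote_write_url "$host") || exit 1
# echo "remoteWrite url: $url"
```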

Deploy ServiceMonitor

This ServiceMonitor tells the Prometheus Operator to scrape services with the label app=blue.

kubectl apply -f -<<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: blue
  namespace: blue
  labels:
    app: blue
spec:
  selector:
    matchLabels:
      app: blue
  endpoints:
  - port: http
EOF

Now, let's verify that Prometheus is scraping the targets selected by the ServiceMonitor.

kubectl logs prometheus-blue-0 -n blue --since=1m | grep namespaces/blue | grep 200

Deploy PrometheusRule

This PrometheusRule defines three alerts and one recording rule.

Alerts:

  • HighRequestPerMinute
  • MediumRequestPerMinute
  • LowRequestPerMinute

Rules:

  • blue_requests_per_minute

kubectl apply -f -<<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule 
metadata:
  labels:
    prometheus: blue
    role: alert-rules
  name: blue
  namespace: blue
spec:
  groups:
  - name: blue_recording_rules
    interval: 2s
    rules: 
    - record: blue_requests_per_minute
      expr: increase(http_requests_total{container="blue"}[1m])
  - name: blue_alert_rules
    rules: 
    - alert: HighRequestPerMinute
      expr: blue_requests_per_minute >= 20 
      labels:
        severity: page # or critical 
      annotations:
        summary: "high number of requests"
        description: "Check system to make sure blue pods are okay"
    - alert: MediumRequestPerMinute
      expr: blue_requests_per_minute >= 15 
      labels:
        severity: warn 
      annotations:
        summary: "medium number of requests"
        description: "medium load on blue app"
    - alert: LowRequestPerMinute
      expr: blue_requests_per_minute >= 1 
      labels:
        severity: acknowledged
      annotations:
        summary: "low number of requests"
        description: "Blue app is receiving traffic"    
EOF
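
Because all three alert expressions use `>=` on the same recorded series, the thresholds overlap: a value that fires HighRequestPerMinute also fires the other two. The sketch below mirrors that comparison in shell for integer values only (the real recorded values are floats). `firing_alerts` is a hypothetical helper for illustration, not something Prometheus provides.

```shell
# firing_alerts VALUE
# Hypothetical helper: prints the alerts from the PrometheusRule above whose
# threshold an integer blue_requests_per_minute VALUE would meet.
firing_alerts() {
  value=$1
  out=""
  if [ "$value" -ge 20 ]; then out="$out HighRequestPerMinute"; fi
  if [ "$value" -ge 15 ]; then out="$out MediumRequestPerMinute"; fi
  if [ "$value" -ge 1 ]; then out="$out LowRequestPerMinute"; fi
  # Drop the leading space left by the concatenation.
  echo "${out# }"
}

# Example usage:
# firing_alerts 14
```

If only the most severe alert should fire, one option is to bound each expression on both sides, for example `blue_requests_per_minute >= 15 < 20` for the medium alert.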

Now, let's trigger alerts in our Prometheus instance. First, port-forward the Prometheus pod to local port 9090.

kubectl port-forward prometheus-blue-0 -n blue 9090

Open localhost:9090 in your browser

Now, trigger an alert by sending requests to the blue service

for z in $(seq 33); do curl $(kubectl get route -n blue blue -ojsonpath='{.spec.host}'); done

Prometheus RemoteWrite Endpoint

When we configured the instance of the MSO, we configured a Prometheus RemoteWrite endpoint and ExternalLabels. The RemoteWrite endpoint is where Prometheus sends the metrics, and the ExternalLabels are the labels Prometheus attaches to the outgoing metrics.

Let's look at the logs from the blue app to see the remote write in action.

kubectl logs deploy/blue -n blue --since=1m

output

scrape_series_added{clusterID="blue-cluster", container="blue", endpoint="http", instance="10.128.2.23:9000", job="blue", namespace="blue", pod="blue-9d89f8dd7-7fzzd", prometheus="blue/blue", prometheus_replica="prometheus-blue-0", service="blue", tenant_id="blue-tenant", user="blue-user"}
        Sample:  0.000000 1650976199961
blue_requests_per_minute{clusterID="blue-cluster", container="blue", endpoint="http", instance="10.128.2.23:9000", job="blue", metrics="custom", namespace="blue", path="/", pod="blue-9d89f8dd7-7fzzd", prometheus="blue/blue", prometheus_replica="prometheus-blue-0", service="blue", tenant_id="blue-tenant", user="blue-user"}
        Sample:  0.000000 1650976200492
blue_requests_per_minute{clusterID="blue-cluster", container="blue", endpoint="http", instance="10.128.2.23:9000", job="blue", metrics="custom", namespace="blue", path="/alert", pod="blue-9d89f8dd7-7fzzd", prometheus="blue/blue", prometheus_replica="prometheus-blue-0", service="blue", tenant_id="blue-tenant", user="blue-user"}
        Sample:  14.000000 1650976200492
blue_requests_per_minute{clusterID="blue-cluster", container="blue", endpoint="http", instance="10.128.2.23:9000", job="blue", metrics="custom", namespace="blue", path="/metrics", pod="blue-9d89f8dd7-7fzzd", prometheus="blue/blue", prometheus_replica="prometheus-blue-0", service="blue", tenant_id="blue-tenant", user="blue-user"}
        Sample:  2.000000 1650976200492
Handling Metrics

Now, let's pay close attention to the external labels that we added to the MSO.

kubectl get monitoringstack -n blue blue -ojsonpath='{.spec.prometheusConfig.externalLabels}' | jq 

output

{
  "clusterID": "blue-cluster",
  "tenant_id": "blue-tenant",
  "user": "blue-user"
}

Finally, let's look at the RemoteWrite metrics and grep for the ExternalLabels:

kubectl logs deploy/blue -n blue --since=1m | grep -E 'clusterID|user|tenant_id'
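
To go one step beyond grepping, the label values can be pulled out of each series line. This is a minimal sketch, assuming the log format shown in the sample output earlier (one series per line with `key="value"` pairs inside braces). `label_value` is a hypothetical helper, not part of the Blue app.

```shell
# label_value NAME
# Hypothetical helper: reads series lines on stdin and prints the value of
# label NAME for every line that carries it. Lines without the label
# (such as the "Sample:" lines) produce no output.
label_value() {
  sed -n "s/.*[{ ]$1=\"\([^\"]*\)\".*/\1/p"
}

# Example usage against a cluster:
# kubectl logs deploy/blue -n blue --since=1m | label_value tenant_id | sort | uniq -c
```

The `[{ ]` before the label name keeps a request for `user` from matching a longer label name that merely ends in `user`.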

Deploy GrafanaDashboard

Now, we will deploy a Grafana Dashboard to visualize the blue metrics.

kubectl apply -f -<<'EOF'
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: blue
  namespace: blue
  labels:
    app.kubernetes.io/part-of: monitoring-stack-operator
spec:
  datasources:
  - datasourceName: prometheus-grafanadatasource
    inputName: middleware.yaml
  json: >
    {
    "annotations": {
        "list": [
        {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "type": "dashboard"
        }
        ]
    },
    "editable": true,
    "gnetId": null,
    "graphTooltip": 1,
    "id": 2,
    "links": [],
    "panels": [
        {
        "aliasColors": {},
        "bars": false,
        "dashLength": 10,
        "dashes": false,
        "datasource": null,
        "fieldConfig": {
            "defaults": {},
            "overrides": []
        },
        "fill": 1,
        "fillGradient": 0,
        "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 0
        },
        "hiddenSeries": false,
        "id": 6,
        "legend": {
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "show": true,
            "total": false,
            "values": false
        },
        "lines": true,
        "linewidth": 1,
        "nullPointMode": "null",
        "options": {
            "alertThreshold": true
        },
        "percentage": false,
        "pluginVersion": "7.5.15",
        "pointradius": 2,
        "points": false,
        "renderer": "flot",
        "seriesOverrides": [],
        "spaceLength": 10,
        "stack": false,
        "steppedLine": false,
        "targets": [
            {
            "exemplar": true,
            "expr": "increase(blue_requests_per_minute{container=\"blue\",path=\"/alert\"}[1m])",
            "interval": "",
            "legendFormat": "",
            "refId": "A"
            }
        ],
        "thresholds": [],
        "timeFrom": null,
        "timeRegions": [],
        "timeShift": null,
        "title": "Blue Requests /alert ",
        "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
        },
        "type": "graph",
        "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
        },
        "yaxes": [
            {
            "$$hashKey": "object:295",
            "44782hashKey": "object:74",
            "format": "short",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": null,
            "show": true
            },
            {
            "$$hashKey": "object:296",
            "44782hashKey": "object:75",
            "format": "short",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": null,
            "show": true
            }
        ],
        "yaxis": {
            "align": false,
            "alignLevel": null
        }
        },
        {
        "aliasColors": {},
        "bars": false,
        "dashLength": 10,
        "dashes": false,
        "datasource": null,
        "fieldConfig": {
            "defaults": {},
            "overrides": []
        },
        "fill": 1,
        "fillGradient": 0,
        "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 8
        },
        "hiddenSeries": false,
        "id": 2,
        "legend": {
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "show": true,
            "total": false,
            "values": false
        },
        "lines": true,
        "linewidth": 1,
        "nullPointMode": "null",
        "options": {
            "alertThreshold": true
        },
        "percentage": false,
        "pluginVersion": "7.5.15",
        "pointradius": 2,
        "points": false,
        "renderer": "flot",
        "seriesOverrides": [],
        "spaceLength": 10,
        "stack": false,
        "steppedLine": false,
        "targets": [
            {
            "exemplar": true,
            "expr": "blue_requests_per_minute",
            "interval": "",
            "legendFormat": "",
            "refId": "A"
            }
        ],
        "thresholds": [],
        "timeFrom": null,
        "timeRegions": [],
        "timeShift": null,
        "title": "Blue Requests All Routes",
        "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
        },
        "type": "graph",
        "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
        },
        "yaxes": [
            {
            "$$hashKey": "object:219",
            "44782hashKey": "object:55",
            "format": "short",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": null,
            "show": true
            },
            {
            "$$hashKey": "object:220",
            "44782hashKey": "object:56",
            "format": "short",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": null,
            "show": true
            }
        ],
        "yaxis": {
            "align": false,
            "alignLevel": null
        }
        },
        {
        "aliasColors": {},
        "bars": true,
        "dashLength": 10,
        "dashes": false,
        "datasource": null,
        "fieldConfig": {
            "defaults": {},
            "overrides": []
        },
        "fill": 2,
        "fillGradient": 0,
        "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 16
        },
        "hiddenSeries": false,
        "id": 4,
        "legend": {
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "show": false,
            "total": false,
            "values": false
        },
        "lines": true,
        "linewidth": 1,
        "nullPointMode": "null",
        "options": {
            "alertThreshold": true
        },
        "percentage": false,
        "pluginVersion": "7.5.15",
        "pointradius": 2,
        "points": false,
        "renderer": "flot",
        "seriesOverrides": [],
        "spaceLength": 10,
        "stack": false,
        "steppedLine": false,
        "targets": [
            {
            "exemplar": true,
            "expr": "sum(http_response_time_seconds_sum)",
            "format": "time_series",
            "interval": "",
            "legendFormat": "",
            "refId": "HTTP Response Time Seconds"
            }
        ],
        "thresholds": [],
        "timeFrom": null,
        "timeRegions": [],
        "timeShift": null,
        "title": "HTTP Response Time",
        "tooltip": {
            "shared": false,
            "sort": 0,
            "value_type": "individual"
        },
        "type": "graph",
        "xaxis": {
            "buckets": null,
            "mode": "histogram",
            "name": null,
            "show": true,
            "values": [
            "total"
            ]
        },
        "yaxes": [
            {
            "$$hashKey": "object:63",
            "44782hashKey": "object:61",
            "format": "short",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": null,
            "show": true
            },
            {
            "$$hashKey": "object:64",
            "44782hashKey": "object:62",
            "format": "short",
            "label": null,
            "logBase": 1,
            "max": null,
            "min": null,
            "show": true
            }
        ],
        "yaxis": {
            "align": false,
            "alignLevel": null
        }
        }
    ],
    "refresh": "5s",
    "schemaVersion": 27,
    "style": "dark",
    "tags": [],
    "templating": {
        "list": []
    },
    "time": {
        "from": "now-6h",
        "to": "now"
    },
    "timepicker": {
        "refresh_intervals": [],
        "time_options": []
    },
    "timezone": "browser",
    "title": "Blue Dashboard",
    "uid": "dcdf4abf7e3f62fee760bc341fa70aa1074e414a",
    "version": 2
    }
EOF
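
The dashboard JSON carries `$$hashKey` keys next to `44782hashKey` keys. One plausible explanation (an assumption, not something the source states) is that an earlier copy of the JSON passed through an unquoted heredoc, where the shell replaces `$$` with its process ID. The sketch below shows the difference between an unquoted and a quoted heredoc delimiter; both function names are hypothetical.

```shell
# Unquoted delimiter: the shell expands $$ (its process ID) inside the body.
unquoted_heredoc() {
  cat <<EOF
"$$hashKey": "object:295"
EOF
}

# Quoted delimiter: the body is passed through literally.
quoted_heredoc() {
  cat <<'EOF'
"$$hashKey": "object:295"
EOF
}

# Example usage:
# unquoted_heredoc   # the key starts with the current shell's PID
# quoted_heredoc     # the key keeps its literal $$ prefix
```

A quoted delimiter also leaves expressions such as `$(kubectl get route ...)` untouched, so it only suits manifests that need no shell substitution.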

Now we will port-forward the Grafana service to local port 3000 and look at the dashboard.

kubectl port-forward svc/grafana-service -n monitoring-stack-operator 3000

Open localhost:3000 in your browser

Clean Up

# Delete GrafanaDashboard
kubectl delete grafanadashboard blue -n blue

# Delete the ServiceMonitor and PrometheusRule
kubectl delete servicemonitor,prometheusrules blue -n blue

# Delete the MSO
kubectl delete monitoringstack blue -n blue

# Uninstall Blue
kubectl delete deploy,svc,route,sa blue -n blue

# Delete the blue namespace
kubectl delete ns blue