@adwait-godbole
Last active February 13, 2025 10:08

LFX 2025 Term 1 - Headlamp Plugin for KEDA

Project Synopsis

The Headlamp KEDA Plugin is an innovative solution designed to simplify the management and monitoring of KEDA resources in Kubernetes. KEDA, the Kubernetes-based Event Driven Autoscaler, excels at scaling workloads based on real-time events and supports multiple scaling triggers. However, administrators and developers often face challenges such as limited visibility into ScaledObjects and ScaledJobs, difficulty monitoring real-time scaling metrics, and a fragmented view of KEDA’s performance alongside other Kubernetes resources.

This project aims to bridge these gaps by integrating a comprehensive plugin into Headlamp. Key features include:

  • Resource Management: An intuitive interface for listing, viewing, editing, and managing ScaledObjects and ScaledJobs. This includes a detailed view with an inline YAML editor and creation wizard.
  • Real-Time Monitoring: Integration of real-time metrics and trigger states using live-updating graphs and WebSocket-driven updates, enabling users to quickly identify scaling events.
  • Event Visualization: A timeline and event log panel for correlating scaling actions with system events, coupled with historical activity graphs to facilitate thorough troubleshooting.
  • Relationship Mapping: Headlamp's Map View depicting the relationships between KEDA resources, target workloads, and event sources for enhanced navigation and system insight.

By unifying these capabilities into a single, user-friendly dashboard, the Headlamp KEDA Plugin will empower Kubernetes teams to efficiently manage event-driven scaling, reduce operational complexity, and improve overall system reliability.

Timeline and Implementation Plans

Week 1-2:

  1. Getting familiar with the various scaling triggers and the YAML specifications of ScaledObjects and ScaledJobs, referring to the official KEDA documentation at keda.sh.

  2. Initializing a KEDA setup by deploying the keda-admission-webhooks, keda-metrics-apiserver, and keda-operator controllers for future integration with Headlamp, and getting multiple demo KEDA resources up and running that cover various trigger and configuration scenarios.

    An example of a RabbitMQ-based KEDA ScaledObject that I will be using for testing purposes:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: rabbitmq-consumer-scaledobject
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1                    # Optional. Default: apps/v1
        kind: Deployment                       # Optional. Default: Deployment
        name: rabbitmq-consumer                # Mandatory. Must be in the same namespace as the ScaledObject
      pollingInterval: 5                       # Check queue length every 5 seconds
      cooldownPeriod: 60                       # Wait 60 seconds before scaling down to avoid rapid scaling
      minReplicaCount: 0                       # Scale to zero when queue is empty
      maxReplicaCount: 5                       # Don't exceed 5 replicas (based on our processing capacity)
      fallback:                                # Fallback ensures minimum service if scaling fails
        failureThreshold: 3                    # Try 3 times before falling back
        replicas: 1                            # Run one replica in fallback mode
      advanced:                                # Advanced scaling behavior configuration
        restoreToOriginalReplicaCount: true    # Restore the original replica count when the ScaledObject is deleted
        horizontalPodAutoscalerConfig:         # Fine-tune HPA behavior
          name: keda-hpa-rabbitmq-consumer     # Custom HPA name for better identification
          behavior:                            # Configure scaling behavior
            scaleDown:
              stabilizationWindowSeconds: 600  # Wait 10 minutes before scaling down
              policies:
              - type: Percent
                value: 100
                periodSeconds: 15
      triggers:
      - type: rabbitmq
        metadata:
          # Required connection settings
          host: amqp://guest:[email protected]:5672  # RabbitMQ connection string
          queueName: orders                               # Queue to monitor
          mode: QueueLength                               # Scale based on number of messages
          value: "50"                                     # Target 50 messages per pod
    
          # Optional RabbitMQ-specific settings
          protocol: amqp                                  # Protocol for connection
          vhost: "/"                                      # Virtual host in RabbitMQ

    A similar example of a ScaledJob KEDA resource will be deployed for testing purposes.
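    The trigger above uses `mode: QueueLength` with `value: "50"`, which targets 50 messages per replica. KEDA hands this target to the HPA, which computes the desired replica count as the queue length divided by the target, rounded up and clamped to `minReplicaCount`/`maxReplicaCount`. A minimal sketch of that arithmetic (function and parameter names here are illustrative, not KEDA's internals):

```typescript
// Sketch of how QueueLength mode maps a queue backlog to replicas:
// desired = ceil(queueLength / targetValue), clamped to [min, max].
function desiredReplicas(
  queueLength: number,
  targetValue: number, // `value` from the trigger metadata, e.g. 50
  minReplicas: number, // minReplicaCount
  maxReplicas: number, // maxReplicaCount
): number {
  if (queueLength === 0) return minReplicas; // scale to zero when idle
  const desired = Math.ceil(queueLength / targetValue);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// With the spec above, a backlog of 267 messages and a target of 50
// yields ceil(267 / 50) = 6, clamped to maxReplicaCount = 5.
```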

  3. Integrating the backend and frontend to display list views of these KEDA resources with filtering and sorting capabilities, taking inspiration from the output below.

    $ kubectl get scaledobjects
    
    NAME                           SCALETARGETKIND      SCALETARGETNAME     MIN   MAX   READY   ACTIVE   FALLBACK   PAUSED   TRIGGERS   AGE
    rabbitmq-consumer-scaledobject apps/v1.Deployment   rabbitmq-consumer   0     5     True    True     False     Unknown   rabbitmq   15m
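    As a rough sketch of how the list view could filter and sort rows like the ones above (the `ScaledObjectRow` shape and `filterAndSort` helper are hypothetical names for illustration, not Headlamp APIs):

```typescript
// Hypothetical shape of one row in the ScaledObject list view;
// fields mirror the kubectl columns shown above.
interface ScaledObjectRow {
  name: string;
  namespace: string;
  ready: boolean;
  active: boolean;
  minReplicas: number;
  maxReplicas: number;
}

// Filter rows to one namespace and sort alphabetically by name,
// as the plugin's list view would before rendering.
function filterAndSort(rows: ScaledObjectRow[], namespace: string): ScaledObjectRow[] {
  return rows
    .filter((r) => r.namespace === namespace)
    .sort((a, b) => a.name.localeCompare(b.name));
}
```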

Week 3-4:

  1. Implementing detailed views of individual KEDA resources, showing their configurations and relationships, taking inspiration from the output below:

    $ kubectl get scaledobject rabbitmq-consumer-scaledobject -o yaml
    
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"keda.sh/v1alpha1","kind":"ScaledObject","metadata":{"annotations":{},"name":"rabbitmq-consumer-scaledobject","namespace":"default"},"spec":{"advanced":{"horizontalPodAutoscalerConfig":{"behavior":{"scaleDown":{"policies":[{"periodSeconds":15,"type":"Percent","value":100}],"stabilizationWindowSeconds":600}},"name":"keda-hpa-rabbitmq-consumer"},"restoreToOriginalReplicaCount":true},"cooldownPeriod":60,"fallback":{"failureThreshold":3,"replicas":1},"maxReplicaCount":5,"minReplicaCount":0,"pollingInterval":5,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"rabbitmq-consumer"},"triggers":[{"metadata":{"host":"amqp://guest:[email protected]:5672","mode":"QueueLength","queueName":"orders","value":"50"},"type":"rabbitmq"}]}}
      creationTimestamp: "2025-02-12T16:17:29Z"
      finalizers:
      - finalizer.keda.sh
      generation: 2
      labels:
        scaledobject.keda.sh/name: rabbitmq-consumer-scaledobject
      name: rabbitmq-consumer-scaledobject
      namespace: default
      resourceVersion: "3604"
      uid: f2f1b81f-ce3f-4a48-9a8e-81bb82c47263
    spec:
      # [Previous spec section remains the same as above]
    status:
      conditions:
      - message: ScaledObject is defined correctly and is ready for scaling
        reason: ScaledObjectReady
        status: "True"
        type: Ready
      - message: Queue length (267) is above target value (50), scaling is active
        reason: ScalerActive
        status: "True"
        type: Active
      - status: "False"
        type: Fallback
      - status: "False"
        type: Paused
      externalMetricNames:
      - s0-rabbitmq-orders-length
      hpaName: keda-hpa-rabbitmq-consumer
      originalReplicaCount: 1
      scaleTargetGVKR:
        group: apps
        kind: Deployment
        resource: deployments
        version: v1
      scaleTargetKind: apps/v1.Deployment
      triggersTypes: rabbitmq

    As we can see, the status conditions currently show that the scaler is active. When the scaling trigger/condition is not met, the Active condition changes to a not-active state, and when KEDA enters the cooldown period the condition reflects that as well. We can use this knowledge in later weeks to enhance the real-time updates functionality in the Headlamp UI.
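    Reading those conditions in the plugin reduces to scanning `status.conditions` for the entry of interest. A minimal sketch, with an illustrative `Condition` type modeling only the fields shown in the YAML above:

```typescript
// Minimal model of a ScaledObject status condition, as seen in
// the `-o yaml` output (Ready, Active, Fallback, Paused).
interface Condition {
  type: string;
  status: "True" | "False" | "Unknown";
  reason?: string;
  message?: string;
}

// True when the scaler is actively driving scaling, i.e. the
// condition of type "Active" has status "True".
function isActive(conditions: Condition[]): boolean {
  const active = conditions.find((c) => c.type === "Active");
  return active?.status === "True";
}
```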

  2. Integrating YAML editor for direct modification of KEDA resources in the detailed view taking inspiration from the previous point's YAML output.

  3. Implementing basic CRUD operations for KEDA resources and completing the integration with Headlamp's navigation and layout.
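    Since KEDA's resources are CRDs under the `keda.sh/v1alpha1` group, the CRUD operations and YAML-editor saves boil down to standard Kubernetes REST calls against the `scaledobjects` (or `scaledjobs`) endpoints. A sketch of building those paths (the helper name is illustrative; the actual requests would go through Headlamp's API proxy):

```typescript
// Build the Kubernetes REST path for ScaledObjects: the collection
// path for list/create, or the named path for get/update/delete.
function scaledObjectPath(namespace: string, name?: string): string {
  const base = `/apis/keda.sh/v1alpha1/namespaces/${namespace}/scaledobjects`;
  return name ? `${base}/${name}` : base;
}
```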

Week 5-6:

  1. Setting up real-time metrics collection for visualizing active scaling triggers, using the external metrics API:

    $ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/s0-rabbitmq-orders-length" | jq .
    
    {
      "kind": "ExternalMetricValueList",
      "apiVersion": "external.metrics.k8s.io/v1beta1",
      "metadata": {},
      "items": [
        {
          "metricName": "s0-rabbitmq-orders-length",
          "metricLabels": {
            "queue": "orders",
            "scaledObject": "rabbitmq-consumer-scaledobject"
          },
          "timestamp": "2025-02-12T16:20:00Z",
          "value": "267"
        }
      ]
    }
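    On the frontend, the `ExternalMetricValueList` payload above can be mapped into chart-ready points. A sketch, with a type that models only the fields the plugin would consume:

```typescript
// Partial model of the external metrics API response; Kubernetes
// quantities arrive as strings and timestamps as RFC 3339 strings.
interface ExternalMetricValueList {
  items: {
    metricName: string;
    timestamp: string;
    value: string;
  }[];
}

// Convert the API payload into numeric data points for plotting.
function toDataPoints(list: ExternalMetricValueList) {
  return list.items.map((m) => ({
    metric: m.metricName,
    time: Date.parse(m.timestamp), // epoch milliseconds
    value: Number(m.value),
  }));
}
```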
    
  2. Setting up WebSocket connections for:

    • Live "ACTIVE" state updates of KEDA resources
    • Live External API metrics updates
    • Live Replica count updates

    Below are some of the watch API endpoints that can be used for this setup:

    # Live "ACTIVE" state updates of KEDA resources
    $ kubectl get scaledobject -n default -w
    
    NAME                           SCALETARGETNAME     MIN   MAX   READY   ACTIVE   AGE
    rabbitmq-consumer-scaledobject rabbitmq-consumer   0     5     True    True     15m
    rabbitmq-consumer-scaledobject rabbitmq-consumer   0     5     True    True     16m
    rabbitmq-consumer-scaledobject rabbitmq-consumer   0     5     True    False    17m
    
    # Live External API metrics updates (as shown in the previous point)
    $ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/s0-rabbitmq-orders-length" | jq .
    
    # Live Replica Count updates 
    $ kubectl get hpa -n default -w
    
    NAME                        REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    keda-hpa-rabbitmq-consumer Deployment/rabbitmq-consumer  267/50    0         5         5          15m
    keda-hpa-rabbitmq-consumer Deployment/rabbitmq-consumer  156/50    0         5         3          16m
    keda-hpa-rabbitmq-consumer Deployment/rabbitmq-consumer  45/50     0         5         1          17m
    
    # Detailed HPA status
    $ kubectl describe hpa keda-hpa-rabbitmq-consumer
    
    Name:                                                  keda-hpa-rabbitmq-consumer
    Namespace:                                             default
    Labels:                                               scaledobject.keda.sh/name=rabbitmq-consumer-scaledobject
    Annotations:                                          <none>
    CreationTimestamp:                                    2025-02-12T16:17:29Z
    Reference:                                            Deployment/rabbitmq-consumer
    Metrics:                                              ( current / target )
      "s0-rabbitmq-orders-length" (external metric):      267 / 50
    Min replicas:                                         0
    Max replicas:                                         5
    Deployment pods:                                      5 current / 5 desired
    Conditions:
      Type            Status  Reason               Message
      ----            ------  ------               -------
      AbleToScale     True    ReadyForNewScale    recommended size matches current size
      ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from external metric
      ScalingLimited  True    TooManyReplicas     the desired replica count is more than the maximum replica count
    Events:
      Type    Reason             Age   From                       Message
      ----    ------             ----  ----                       -------
      Normal  SuccessfulRescale  1m    horizontal-pod-autoscaler  New size: 5; reason: external metric s0-rabbitmq-orders-length above target
      Normal  SuccessfulRescale  2m    horizontal-pod-autoscaler  New size: 3; reason: external metric s0-rabbitmq-orders-length above target
      Normal  SuccessfulRescale  3m    horizontal-pod-autoscaler  New size: 1; reason: external metric s0-rabbitmq-orders-length below target
    

Week 7-8:

  1. Visualizing the real-time external API metrics updates and the replica count / scaling event updates set up in the previous weeks on the UI, using the Recharts library.

    Below is a sample demo of the replica count updates over time as different triggers take place.

    Screencast.from.2025-02-13.09-42-10.mp4
  2. Event correlation and display: showing why scaling occurred by linking crossed metric thresholds to replica count updates, using the APIs below to understand the Kubernetes Events emitted during the scale-out and scale-in phases:

    # Checking ScaledObject events
    $ kubectl get events -n default --field-selector involvedObject.kind=ScaledObject,involvedObject.name=rabbitmq-consumer-scaledobject
    
    LAST SEEN   TYPE     REASON         OBJECT                                        MESSAGE
    1m          Normal   ScalerActive   scaledobject/rabbitmq-consumer-scaledobject   Scaler is active
    3m          Normal   ScaleExecuted  scaledobject/rabbitmq-consumer-scaledobject   Successfully scaled deployment to 5 replicas - Queue length: 267 messages
    8m          Normal   ScaleExecuted  scaledobject/rabbitmq-consumer-scaledobject   Successfully scaled deployment to 2 replicas - Queue length: 85 messages
    
    # Checking HPA events
    $ kubectl get events -n default --field-selector involvedObject.kind=HorizontalPodAutoscaler
    
    LAST SEEN   TYPE     REASON              OBJECT                                       MESSAGE
    1m          Normal   SuccessfulRescale   horizontalpodautoscaler/keda-hpa-rabbitmq-consumer   New size: 5; reason: queue length above target
    6m          Normal   SuccessfulRescale   horizontalpodautoscaler/keda-hpa-rabbitmq-consumer   New size: 2; reason: All metrics below target
    
    # Checking Deployment events
    $ kubectl get events -n default --field-selector involvedObject.kind=Deployment,involvedObject.name=rabbitmq-consumer
    
    LAST SEEN   TYPE     REASON              OBJECT                       MESSAGE
    1m          Normal   ScalingReplicaSet   deployment/rabbitmq-consumer Scaled up replica set rabbitmq-consumer-6d5f7cf9d8 to 5
    6m          Normal   ScalingReplicaSet   deployment/rabbitmq-consumer Scaled down replica set rabbitmq-consumer-6d5f7cf9d8 to 2   
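    One simple way to correlate a rescale event with the metric reading that caused it is to pick the nearest metric sample at or before the event's timestamp. A sketch of that matching logic (the event and sample shapes are illustrative):

```typescript
// A scaling action (e.g. an HPA SuccessfulRescale event) and a
// metric reading, both reduced to epoch-millisecond timestamps.
interface ScalingEvent { time: number; newSize: number; reason: string }
interface MetricSample { time: number; value: number }

// Return the most recent metric sample taken at or before the event,
// i.e. the reading that the rescale decision was based on.
function correlate(event: ScalingEvent, samples: MetricSample[]): MetricSample | undefined {
  return samples
    .filter((s) => s.time <= event.time)
    .sort((a, b) => b.time - a.time)[0];
}
```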
    

Week 9-10:

  1. Relationship Visualization: Showing how ScaledObject connects to:

    • Target workload (Deployment/StatefulSet)
    • HPA it created
    • Metrics it's watching
    • Trigger sources

    using Headlamp's Map View

  2. Cross-linking Navigation: Easy navigation between related resources:

    • Clicking on target workload to see its details
    • Clicking on HPA to see its current status
    • Clicking on metrics to see their details
    • Clicking on trigger sources to see their details
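    The relationship graph for the Map View can be derived directly from a ScaledObject: it scales its target workload, manages the HPA it created, and is fired by its triggers. A sketch of building those edges (the shapes are illustrative, not Headlamp's Map View API):

```typescript
// One directed edge in the relationship graph.
interface GraphEdge { from: string; to: string; label: string }

// Derive the edges a ScaledObject contributes to the Map View:
// ScaledObject -> target workload, ScaledObject -> HPA, and one
// incoming edge per trigger source.
function scaledObjectEdges(so: {
  name: string;
  targetName: string;
  hpaName: string;
  triggerTypes: string[];
}): GraphEdge[] {
  return [
    { from: so.name, to: so.targetName, label: "scales" },
    { from: so.name, to: so.hpaName, label: "manages" },
    ...so.triggerTypes.map((t) => ({ from: t, to: so.name, label: "triggers" })),
  ];
}
```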

Week 11-12:

  1. Writing multiple end-to-end test scenarios and unit tests to achieve maximum test coverage.

  2. Documenting the KEDA plugin for Headlamp.

  3. Writing examples demonstrating usages for different trigger sources and metrics types, and explaining how to interpret and visualize them via Headlamp's UI, including the Map View visualization.

  4. Implementing creation wizard for new ScaledObjects and ScaledJobs.
