@adwait-godbole
Last active February 13, 2025 10:08

LFX 2025 Term 1 - Headlamp Plugin for KEDA

Project Synopsis

The Headlamp KEDA Plugin is an innovative solution designed to simplify the management and monitoring of KEDA resources in Kubernetes. KEDA, the Kubernetes-based Event Driven Autoscaler, excels at scaling workloads based on real-time events and supports multiple scaling triggers. However, administrators and developers often face challenges such as limited visibility into ScaledObjects and ScaledJobs, difficulty monitoring real-time scaling metrics, and a fragmented view of KEDA’s performance alongside other Kubernetes resources.

This project aims to bridge these gaps by integrating a comprehensive plugin into Headlamp. Key features include:

  • Resource Management: An intuitive interface for listing, viewing, editing, and managing ScaledObjects and ScaledJobs. This includes a detailed view with an inline YAML editor and creation wizard.
  • Real-Time Monitoring: Integration of real-time metrics and trigger states using live-updating graphs and WebSocket-driven updates, enabling users to quickly identify scaling events.
  • Event Visualization: A timeline and event log panel for correlating scaling actions with system events, coupled with historical activity graphs to facilitate thorough troubleshooting.
  • Relationship Mapping: Headlamp's Map View depicting the relationships between KEDA resources, target workloads, and event sources for enhanced navigation and system insight.

By unifying these capabilities into a single, user-friendly dashboard, the Headlamp KEDA Plugin will empower Kubernetes teams to efficiently manage event-driven scaling, reduce operational complexity, and improve overall system reliability.

Timeline and Implementation Plans

Week 1-2:

  1. Getting familiar with the various scaling triggers and the YAML specifications of ScaledObjects and ScaledJobs, referring to the official KEDA documentation at keda.sh.

  2. Initializing a KEDA setup by deploying the keda-admission-webhooks, keda-metrics-apiserver, and keda-operator controllers for future integration with Headlamp, and getting multiple demo KEDA resources up and running that cover various trigger and configuration scenarios.

    An example of a RabbitMQ-based KEDA ScaledObject that I will be using for testing purposes:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: rabbitmq-consumer-scaledobject
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1                    # Optional. Default: apps/v1
        kind: Deployment                       # Optional. Default: Deployment
        name: rabbitmq-consumer                # Mandatory. Must be in the same namespace as the ScaledObject
      pollingInterval: 5                       # Check queue length every 5 seconds
      cooldownPeriod: 60                       # Wait 60 seconds before scaling down to avoid rapid scaling
      minReplicaCount: 0                       # Scale to zero when queue is empty
      maxReplicaCount: 5                       # Don't exceed 5 replicas (based on our processing capacity)
      fallback:                                # Fallback ensures minimum service if scaling fails
        failureThreshold: 3                    # Try 3 times before falling back
        replicas: 1                            # Run one replica in fallback mode
      advanced:                                # Advanced scaling behavior configuration
        restoreToOriginalReplicaCount: true    # Restore the original replica count when the ScaledObject is deleted
        horizontalPodAutoscalerConfig:         # Fine-tune HPA behavior
          name: keda-hpa-rabbitmq-consumer     # Custom HPA name for better identification
          behavior:                            # Configure scaling behavior
            scaleDown:
              stabilizationWindowSeconds: 600  # Wait 10 minutes before scaling down
              policies:
              - type: Percent
                value: 100
                periodSeconds: 15
      triggers:
      - type: rabbitmq
        metadata:
          # Required connection settings
          host: amqp://guest:[email protected]:5672  # RabbitMQ connection string
          queueName: orders                               # Queue to monitor
          mode: QueueLength                               # Scale based on number of messages
          value: "50"                                     # Target 50 messages per pod
    
          # Optional RabbitMQ-specific settings
          protocol: amqp                                  # Protocol for connection
          vhost: "/"                                      # Virtual host in RabbitMQ

    A similar example of a ScaledJob KEDA resource will be deployed for testing purposes.
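    The trigger above uses `mode: QueueLength` with `value: "50"`, which targets 50 messages per replica. KEDA hands this target to the HPA, which computes the desired replica count as the queue length divided by the target, rounded up and clamped to `minReplicaCount`/`maxReplicaCount`. A minimal sketch of that arithmetic (function and parameter names here are illustrative, not KEDA's internals):

```typescript
// Sketch of how QueueLength mode maps a queue backlog to replicas:
// desired = ceil(queueLength / targetValue), clamped to [min, max].
function desiredReplicas(
  queueLength: number,
  targetValue: number, // `value` from the trigger metadata, e.g. 50
  minReplicas: number, // minReplicaCount
  maxReplicas: number, // maxReplicaCount
): number {
  if (queueLength === 0) return minReplicas; // scale to zero when idle
  const desired = Math.ceil(queueLength / targetValue);
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// With the spec above, a backlog of 267 messages and a target of 50
// yields ceil(267 / 50) = 6, clamped to maxReplicaCount = 5.
```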

  3. Integrating the backend and frontend to display list views of these KEDA resources with filtering and sorting capabilities, taking inspiration from the output below.

    $ kubectl get scaledobjects
    
    NAME                           SCALETARGETKIND      SCALETARGETNAME     MIN   MAX   READY   ACTIVE   FALLBACK   PAUSED   TRIGGERS   AGE
    rabbitmq-consumer-scaledobject apps/v1.Deployment   rabbitmq-consumer   0     5     True    True     False     Unknown   rabbitmq   15m
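    As a rough sketch of how the list view could filter and sort rows like the ones above (the `ScaledObjectRow` shape and `filterAndSort` helper are hypothetical names for illustration, not Headlamp APIs):

```typescript
// Hypothetical shape of one row in the ScaledObject list view;
// fields mirror the kubectl columns shown above.
interface ScaledObjectRow {
  name: string;
  namespace: string;
  ready: boolean;
  active: boolean;
  minReplicas: number;
  maxReplicas: number;
}

// Filter rows to one namespace and sort alphabetically by name,
// as the plugin's list view would before rendering.
function filterAndSort(rows: ScaledObjectRow[], namespace: string): ScaledObjectRow[] {
  return rows
    .filter((r) => r.namespace === namespace)
    .sort((a, b) => a.name.localeCompare(b.name));
}
```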

Week 3-4:

  1. Implementing detailed views of individual KEDA resources, showing their configurations and relationships, taking inspiration from the output below:

    $ kubectl get scaledobject rabbitmq-consumer-scaledobject -o yaml
    
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"keda.sh/v1alpha1","kind":"ScaledObject","metadata":{"annotations":{},"name":"rabbitmq-consumer-scaledobject","namespace":"default"},"spec":{"advanced":{"horizontalPodAutoscalerConfig":{"behavior":{"scaleDown":{"policies":[{"periodSeconds":15,"type":"Percent","value":100}],"stabilizationWindowSeconds":600}},"name":"keda-hpa-rabbitmq-consumer"},"restoreToOriginalReplicaCount":true},"cooldownPeriod":60,"fallback":{"failureThreshold":3,"replicas":1},"maxReplicaCount":5,"minReplicaCount":0,"pollingInterval":5,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"rabbitmq-consumer"},"triggers":[{"metadata":{"host":"amqp://guest:[email protected]:5672","mode":"QueueLength","queueName":"orders","value":"50"},"type":"rabbitmq"}]}}
      creationTimestamp: "2025-02-12T16:17:29Z"
      finalizers:
      - finalizer.keda.sh
      generation: 2
      labels:
        scaledobject.keda.sh/name: rabbitmq-consumer-scaledobject
      name: rabbitmq-consumer-scaledobject
      namespace: default
      resourceVersion: "3604"
      uid: f2f1b81f-ce3f-4a48-9a8e-81bb82c47263
    spec:
      # [Previous spec section remains the same as above]
    status:
      conditions:
      - message: ScaledObject is defined correctly and is ready for scaling
        reason: ScaledObjectReady
        status: "True"
        type: Ready
      - message: Queue length (267) is above target value (50), scaling is active
        reason: ScalerActive
        status: "True"
        type: Active
      - status: "False"
        type: Fallback
      - status: "False"
        type: Paused
      externalMetricNames:
      - s0-rabbitmq-orders-length
      hpaName: keda-hpa-rabbitmq-consumer
      originalReplicaCount: 1
      scaleTargetGVKR:
        group: apps
        kind: Deployment
        resource: deployments
        version: v1
      scaleTargetKind: apps/v1.Deployment
      triggersTypes: rabbitmq

    As we can see, the status conditions currently show that the scaler is active. When the scaling trigger/condition is not met, the Active condition changes to a not-active state, and when KEDA enters the cooldown period the condition reflects that as well. We can use this knowledge in later weeks to enhance the real-time updates functionality in the Headlamp UI.
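    Reading those conditions in the plugin reduces to scanning `status.conditions` for the entry of interest. A minimal sketch, with an illustrative `Condition` type modeling only the fields shown in the YAML above:

```typescript
// Minimal model of a ScaledObject status condition, as seen in
// the `-o yaml` output (Ready, Active, Fallback, Paused).
interface Condition {
  type: string;
  status: "True" | "False" | "Unknown";
  reason?: string;
  message?: string;
}

// True when the scaler is actively driving scaling, i.e. the
// condition of type "Active" has status "True".
function isActive(conditions: Condition[]): boolean {
  const active = conditions.find((c) => c.type === "Active");
  return active?.status === "True";
}
```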

  2. Integrating YAML editor for direct modification of KEDA resources in the detailed view taking inspiration from the previous point's YAML output.

  3. Implementing basic CRUD operations for KEDA resources and completing the integration with Headlamp's navigation and layout.
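    Since KEDA's resources are CRDs under the `keda.sh/v1alpha1` group, the CRUD operations and YAML-editor saves boil down to standard Kubernetes REST calls against the `scaledobjects` (or `scaledjobs`) endpoints. A sketch of building those paths (the helper name is illustrative; the actual requests would go through Headlamp's API proxy):

```typescript
// Build the Kubernetes REST path for ScaledObjects: the collection
// path for list/create, or the named path for get/update/delete.
function scaledObjectPath(namespace: string, name?: string): string {
  const base = `/apis/keda.sh/v1alpha1/namespaces/${namespace}/scaledobjects`;
  return name ? `${base}/${name}` : base;
}
```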

Week 5-6:

  1. Setting up real-time metrics collection for visualizing active scaling triggers, using the external metrics API:

    $ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/s0-rabbitmq-orders-length" | jq .
    
    {
      "kind": "ExternalMetricValueList",
      "apiVersion": "external.metrics.k8s.io/v1beta1",
      "metadata": {},
      "items": [
        {
          "metricName": "s0-rabbitmq-orders-length",
          "metricLabels": {
            "queue": "orders",
            "scaledObject": "rabbitmq-consumer-scaledobject"
          },
          "timestamp": "2025-02-12T16:20:00Z",
          "value": "267"
        }
      ]
    }
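    On the frontend, the `ExternalMetricValueList` payload above can be mapped into chart-ready points. A sketch, with a type that models only the fields the plugin would consume:

```typescript
// Partial model of the external metrics API response; Kubernetes
// quantities arrive as strings and timestamps as RFC 3339 strings.
interface ExternalMetricValueList {
  items: {
    metricName: string;
    timestamp: string;
    value: string;
  }[];
}

// Convert the API payload into numeric data points for plotting.
function toDataPoints(list: ExternalMetricValueList) {
  return list.items.map((m) => ({
    metric: m.metricName,
    time: Date.parse(m.timestamp), // epoch milliseconds
    value: Number(m.value),
  }));
}
```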
    
  2. Setting up WebSocket connections for:

    • Live "ACTIVE" state updates of KEDA resources
    • Live External API metrics updates
    • Live Replica count updates

    Below are some of the watch API endpoints that can be used for this setup:

    # Live "ACTIVE" state updates of KEDA resources
    $ kubectl get scaledobject -n default -w
    
    NAME                           SCALETARGETNAME     MIN   MAX   READY   ACTIVE   AGE
    rabbitmq-consumer-scaledobject rabbitmq-consumer   0     5     True    True     15m
    rabbitmq-consumer-scaledobject rabbitmq-consumer   0     5     True    True     16m
    rabbitmq-consumer-scaledobject rabbitmq-consumer   0     5     True    False    17m
    
    # Live External API metrics updates (as shown in the previous point)
    $ kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/s0-rabbitmq-orders-length" | jq .
    
    # Live Replica Count updates 
    $ kubectl get hpa -n default -w
    
    NAME                        REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
    keda-hpa-rabbitmq-consumer Deployment/rabbitmq-consumer  267/50    0         5         5          15m
    keda-hpa-rabbitmq-consumer Deployment/rabbitmq-consumer  156/50    0         5         3          16m
    keda-hpa-rabbitmq-consumer Deployment/rabbitmq-consumer  45/50     0         5         1          17m
    
    # Detailed HPA status
    $ kubectl describe hpa keda-hpa-rabbitmq-consumer
    
    Name:                                                  keda-hpa-rabbitmq-consumer
    Namespace:                                             default
    Labels:                                               scaledobject.keda.sh/name=rabbitmq-consumer-scaledobject
    Annotations:                                          <none>
    CreationTimestamp:                                    2025-02-12T16:17:29Z
    Reference:                                            Deployment/rabbitmq-consumer
    Metrics:                                              ( current / target )
      "s0-rabbitmq-orders-length" (external metric):      267 / 50
    Min replicas:                                         0
    Max replicas:                                         5
    Deployment pods:                                      5 current / 5 desired
    Conditions:
      Type            Status  Reason               Message
      ----            ------  ------               -------
      AbleToScale     True    ReadyForNewScale    recommended size matches current size
      ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from external metric
      ScalingLimited  True    TooManyReplicas     the desired replica count is more than the maximum replica count
    Events:
      Type    Reason             Age   From                       Message
      ----    ------             ----  ----                       -------
      Normal  SuccessfulRescale  1m    horizontal-pod-autoscaler  New size: 5; reason: external metric s0-rabbitmq-orders-length above target
      Normal  SuccessfulRescale  2m    horizontal-pod-autoscaler  New size: 3; reason: external metric s0-rabbitmq-orders-length above target
      Normal  SuccessfulRescale  3m    horizontal-pod-autoscaler  New size: 1; reason: external metric s0-rabbitmq-orders-length below target
    

Week 7-8:

  1. Visualizing the real-time external API metrics updates and the replica count / scaling event updates set up in the previous weeks on the UI, using the Recharts library.

    Below is a sample demo of the replica count updates over time as different triggers take place.

    Screencast.from.2025-02-13.09-42-10.mp4
  2. Event correlation and display: showing why scaling occurred by linking crossed metric thresholds to replica count updates, using the APIs below to understand the Kubernetes Events emitted during the scale-out and scale-in phases:

    # Checking ScaledObject events
    $ kubectl get events -n default --field-selector involvedObject.kind=ScaledObject,involvedObject.name=rabbitmq-consumer-scaledobject
    
    LAST SEEN   TYPE     REASON         OBJECT                                        MESSAGE
    1m          Normal   ScalerActive   scaledobject/rabbitmq-consumer-scaledobject   Scaler is active
    3m          Normal   ScaleExecuted  scaledobject/rabbitmq-consumer-scaledobject   Successfully scaled deployment to 5 replicas - Queue length: 267 messages
    8m          Normal   ScaleExecuted  scaledobject/rabbitmq-consumer-scaledobject   Successfully scaled deployment to 2 replicas - Queue length: 85 messages
    
    # Checking HPA events
    $ kubectl get events -n default --field-selector involvedObject.kind=HorizontalPodAutoscaler
    
    LAST SEEN   TYPE     REASON              OBJECT                                       MESSAGE
    1m          Normal   SuccessfulRescale   horizontalpodautoscaler/keda-hpa-rabbitmq-consumer   New size: 5; reason: queue length above target
    6m          Normal   SuccessfulRescale   horizontalpodautoscaler/keda-hpa-rabbitmq-consumer   New size: 2; reason: All metrics below target
    
    # Checking Deployment events
    $ kubectl get events -n default --field-selector involvedObject.kind=Deployment,involvedObject.name=rabbitmq-consumer
    
    LAST SEEN   TYPE     REASON              OBJECT                       MESSAGE
    1m          Normal   ScalingReplicaSet   deployment/rabbitmq-consumer Scaled up replica set rabbitmq-consumer-6d5f7cf9d8 to 5
    6m          Normal   ScalingReplicaSet   deployment/rabbitmq-consumer Scaled down replica set rabbitmq-consumer-6d5f7cf9d8 to 2   
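    One simple way to correlate a rescale event with the metric reading that caused it is to pick the nearest metric sample at or before the event's timestamp. A sketch of that matching logic (the event and sample shapes are illustrative):

```typescript
// A scaling action (e.g. an HPA SuccessfulRescale event) and a
// metric reading, both reduced to epoch-millisecond timestamps.
interface ScalingEvent { time: number; newSize: number; reason: string }
interface MetricSample { time: number; value: number }

// Return the most recent metric sample taken at or before the event,
// i.e. the reading that the rescale decision was based on.
function correlate(event: ScalingEvent, samples: MetricSample[]): MetricSample | undefined {
  return samples
    .filter((s) => s.time <= event.time)
    .sort((a, b) => b.time - a.time)[0];
}
```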
    

Week 9-10:

  1. Relationship Visualization: Showing how ScaledObject connects to:

    • Target workload (Deployment/StatefulSet)
    • HPA it created
    • Metrics it's watching
    • Trigger sources

    using Headlamp's Map View

  2. Cross-linking Navigation: Easy navigation between related resources:

    • Clicking on target workload to see its details
    • Clicking on HPA to see its current status
    • Clicking on metrics to see their details
    • Clicking on trigger sources to see their details
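    The relationship graph for the Map View can be derived directly from a ScaledObject: it scales its target workload, manages the HPA it created, and is fired by its triggers. A sketch of building those edges (the shapes are illustrative, not Headlamp's Map View API):

```typescript
// One directed edge in the relationship graph.
interface GraphEdge { from: string; to: string; label: string }

// Derive the edges a ScaledObject contributes to the Map View:
// ScaledObject -> target workload, ScaledObject -> HPA, and one
// incoming edge per trigger source.
function scaledObjectEdges(so: {
  name: string;
  targetName: string;
  hpaName: string;
  triggerTypes: string[];
}): GraphEdge[] {
  return [
    { from: so.name, to: so.targetName, label: "scales" },
    { from: so.name, to: so.hpaName, label: "manages" },
    ...so.triggerTypes.map((t) => ({ from: t, to: so.name, label: "triggers" })),
  ];
}
```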

Week 11-12:

  1. Writing multiple end-to-end test scenarios and unit tests to achieve maximum test coverage.

  2. Documenting the KEDA plugin for Headlamp.

  3. Writing examples demonstrating usages for different trigger sources and metrics types, and explaining how to interpret and visualize them via Headlamp's UI, including the Map View visualization.

  4. Implementing creation wizard for new ScaledObjects and ScaledJobs.
