# Hybrid Observability Architecture Documentation

This document describes the unified monitoring and logging strategy for a multi-environment hybrid infrastructure (INT, PREPROD, PRD) encompassing on-premises Service Fabric, SQL Servers, AS400, and Kubernetes.
## 1. High-Level Architecture Diagram

The diagram below illustrates the flow of telemetry data from the various sources, through Grafana Alloy collectors, to the centralized LGTM (Loki, Grafana, Tempo, Mimir) stack.
```mermaid
graph TD
    subgraph "On-Premises Infrastructure (INT/PREPROD/PRD)"
        subgraph "Windows Node (Service Fabric)"
            SF_App[SF Services]
            SF_Alloy[Alloy Service]
            OS_Metrics[CPU/RAM/Disk/Process]
        end
        subgraph "SQL Server Node"
            SQL_DB[(SQL Server)]
            SQL_Alloy[Alloy Service]
        end
        subgraph "Legacy Layer"
            AS400[AS400 / iSeries]
        end
    end

    subgraph "Kubernetes Cluster"
        K8s_Alloy[Alloy DaemonSet/Deployment]
        JDBC_Exp[Custom JDBC Exporter]
        BB_Exp[Blackbox Exporter]
        subgraph "LGTM Stack"
            Mimir[(Mimir - Metrics)]
            Loki[(Loki - Logs)]
            Grafana[Grafana - Visualization]
        end
    end

    %% Flows
    OS_Metrics -->|Scrape| SF_Alloy
    SF_App -->|Logs| SF_Alloy
    SQL_DB -->|Scrape| SQL_Alloy
    SF_Alloy -->|Push Remote Write| Mimir
    SF_Alloy -->|Push Loki Write| Loki
    SQL_Alloy -->|Push| Mimir
    AS400 -->|JDBC Query| JDBC_Exp
    JDBC_Exp -->|Metrics| K8s_Alloy
    BB_Exp -->|ICMP/HTTP Probes| K8s_Alloy
    K8s_Alloy -->|Push| Mimir
    K8s_Alloy -->|Push| Loki
    Grafana -->|Query| Mimir
    Grafana -->|Query| Loki
```
## 2. Product & Tool Glossary

For readers unfamiliar with the stack, here is a brief description of each core component.
### The Observability Pipeline

- **Grafana Alloy**: A vendor-neutral telemetry collector (successor to Grafana Agent). It is highly "pluggable" and handles the scraping, transformation, and forwarding of metrics, logs, and traces.
- **Prometheus**: The industry-standard monitoring toolkit and metrics format. In this architecture, the Prometheus protocol is used for scraping data.
- **Prometheus Exporters**: Small "sidecar" programs that translate non-Prometheus data (like Windows OS stats or SQL query results) into a format Alloy can scrape.
- **Blackbox Exporter**: Probes endpoints over HTTP, HTTPS, DNS, TCP, and ICMP to verify they are "alive" from the outside.
- **JDBC Exporter**: A custom bridge that uses Java Database Connectivity to run queries against legacy databases (AS400) and turn the results into metrics.
### The LGTM Backend Stack

- **Grafana**: The visualization layer used to build dashboards and alerts.
- **Loki**: A horizontally scalable, highly available, multi-tenant log aggregation system, inspired by Prometheus.
- **Mimir**: The backend for metrics. It provides long-term storage and massive scalability for Prometheus data.
- **Tempo** (optional, included in the stack): A high-volume, distributed tracing backend.
## 3. How Alloy Works with LGTM (Internal Pipeline)

Alloy acts as the "glue" between the data sources and the LGTM stack. It uses a declarative, component-based configuration language (an HCL-inspired syntax) to define the pipeline.

### The Alloy Pipeline Diagram
```mermaid
graph LR
    subgraph "Alloy Internal Pipeline"
        Discovery[<b>1. Discovery</b><br/>Find Targets<br/>K8s API / Static Config]
        Collection[<b>2. Collection</b><br/>Scrape Metrics<br/>Tail Logs]
        Processing[<b>3. Processing</b><br/>Relabel / Filter<br/>Add 'env' labels]
        Forwarding[<b>4. Forwarding</b><br/>Batching / Retry]
    end
    subgraph "LGTM Stack (Storage)"
        Mimir[Mimir<br/>Metrics]
        Loki[Loki<br/>Logs]
    end
    Discovery --> Collection
    Collection --> Processing
    Processing --> Forwarding
    Forwarding -->|prometheus.remote_write| Mimir
    Forwarding -->|loki.write| Loki
```
### Key Stages Explained

1. **Discovery**: Alloy automatically identifies what needs to be monitored. In the K8s cluster, it watches the K8s API to find pods; on Windows, it uses static configurations to find targets such as the SQL Server exporter port.
2. **Collection**: Alloy "pulls" data from metric endpoints (like the Windows Exporter or Traefik) and "tails" files for logs.
3. **Processing (Relabeling)**: This is where Alloy adds the critical `env="prd"` or `workload="sql"` labels. It can also drop sensitive data or filter out noise before anything is sent over the network.
4. **Forwarding**: Alloy batches data to optimize network usage and retries failed sends with exponential backoff. If the K8s cluster (Mimir/Loki) is temporarily unreachable, Alloy buffers the data locally until the endpoint recovers.
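A minimal sketch of this four-stage pipeline in Alloy's configuration syntax, as it might run on a Windows node. The collector list, the `env` value, and the Mimir URL are illustrative assumptions, not values taken from this document:

```alloy
// 1 & 2. Discovery and Collection: expose local Windows OS metrics and scrape them.
prometheus.exporter.windows "os" {
  enabled_collectors = ["cpu", "memory", "logical_disk", "process"]
}

prometheus.scrape "windows_os" {
  targets    = prometheus.exporter.windows.os.targets
  forward_to = [prometheus.relabel.add_env.receiver]
}

// 3. Processing: stamp every series with the environment label before it leaves the node.
prometheus.relabel "add_env" {
  forward_to = [prometheus.remote_write.mimir.receiver]

  rule {
    target_label = "env"
    replacement  = "prd" // set to "int" / "preprod" / "prd" per environment
  }
}

// 4. Forwarding: batch, compress, and push to Mimir with automatic retries.
prometheus.remote_write "mimir" {
  endpoint {
    url = "https://mimir.example.internal/api/v1/push" // hypothetical endpoint
  }
}
```

Each component exports values (`targets`, `receiver`) that the next component consumes through `forward_to`; this is how the declarative graph in the diagram above is wired together.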
## 4. Detailed Component Roles

### A. Grafana Alloy Deployment Modes

- **Service Mode (Windows)**: Installed as a native Windows Service on the SF and SQL nodes. It focuses on OS-level health and local log files.
- **Deployment/DaemonSet Mode (K8s)**: Runs inside the cluster. It handles the high-level probes (Blackbox, sketched below) and the AS400 bridge.
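To illustrate the probing setup, here is a sketch that drives the Blackbox Exporter from the in-cluster Alloy instance. The module definitions, probe targets, and the Mimir endpoint are hypothetical:

```alloy
// Blackbox Exporter modules defined inline as a YAML string.
prometheus.exporter.blackbox "probes" {
  config = "{ modules: { http_2xx: { prober: http, timeout: 5s }, icmp: { prober: icmp } } }"

  target {
    name    = "sf_frontend" // hypothetical HTTP health check
    address = "https://sf.example.internal/health"
    module  = "http_2xx"
  }

  target {
    name    = "as400_ping" // hypothetical ICMP reachability probe
    address = "as400.example.internal"
    module  = "icmp"
  }
}

// Scrape the probe results and push them to Mimir.
prometheus.scrape "blackbox" {
  targets    = prometheus.exporter.blackbox.probes.targets
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "https://mimir.example.internal/api/v1/push" // hypothetical endpoint
  }
}
```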
### B. On-Premises Workloads

- **Service Fabric (SF) Cluster**:
  - **Logs**: Alloy tails application log files and Windows Event Logs.
  - **OS Metrics**: Uses `prometheus.exporter.windows` to capture system-level health.
- **SQL Servers**:
  - **Role**: Collects database-specific health via the MSSQL exporter component built into Alloy.
- **AS400 (Custom Exporter)**:
  - **The Bridge**: A container in K8s runs the JDBC Exporter. It connects to the AS400 over the network to extract system utilization data.
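The corresponding collection blocks might look like the sketch below; the file paths, connection string, and exporter service address are assumptions for illustration:

```alloy
// SF node: tail application log files and the Windows Application event log.
local.file_match "sf_logs" {
  path_targets = [{"__path__" = "C:/SvcFab/Log/**/*.log"}] // hypothetical path
}

loki.source.file "sf_logs" {
  targets    = local.file_match.sf_logs.targets
  forward_to = [loki.write.default.receiver]
}

loki.source.windowsevent "application" {
  eventlog_name = "Application"
  forward_to    = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "https://loki.example.internal/loki/api/v1/push" // hypothetical endpoint
  }
}

// SQL node: the MSSQL exporter component built into Alloy.
prometheus.exporter.mssql "db" {
  connection_string = "sqlserver://monitor:secret@localhost:1433" // hypothetical DSN
}

prometheus.scrape "mssql" {
  targets    = prometheus.exporter.mssql.db.targets
  forward_to = [prometheus.remote_write.mimir.receiver]
}

// K8s: scrape the custom JDBC exporter that bridges to the AS400.
prometheus.scrape "as400" {
  targets    = [{"__address__" = "jdbc-exporter.monitoring.svc:9090"}] // hypothetical service
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "https://mimir.example.internal/api/v1/push" // hypothetical endpoint
  }
}
```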
## 5. Environment Isolation Strategy

To manage the three environments efficiently, the design uses **label-based multi-tenancy**:
| Environment | Source Label    | Backend Destination          |
|-------------|-----------------|------------------------------|
| INT         | `env="int"`     | Mimir/Loki (Tenant: INT)     |
| PREPROD     | `env="preprod"` | Mimir/Loki (Tenant: PREPROD) |
| PRD         | `env="prd"`     | Mimir/Loki (Tenant: PRD)     |
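In practice, the tenant is selected by what each environment's Alloy instance sends with its writes. Assuming the backends use the standard `X-Scope-OrgID` tenant header (and hypothetical URLs), a PRD instance might be configured as follows:

```alloy
// PRD Alloy instance: every write lands in the PRD tenant.
prometheus.remote_write "mimir" {
  endpoint {
    url     = "https://mimir.example.internal/api/v1/push"
    headers = { "X-Scope-OrgID" = "PRD" } // "INT" / "PREPROD" on the other environments
  }
}

loki.write "default" {
  endpoint {
    url       = "https://loki.example.internal/loki/api/v1/push"
    tenant_id = "PRD" // loki.write exposes the tenant directly
  }
}
```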
## 6. Summary of Benefits

- **Unified Tooling**: The same configuration language covers Windows, K8s, and the AS400 bridge.
- **Security**: A "push" architecture means only outbound connections from the collectors, minimizing firewall changes on-premises.
- **Resilience**: Alloy's internal forwarding logic (batching, buffering, retries) protects against data loss during minor network blips.
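For reference, that retry behavior is tunable per remote-write endpoint via its `queue_config` block. The values below are illustrative assumptions, not recommendations from this document:

```alloy
prometheus.remote_write "mimir" {
  endpoint {
    url = "https://mimir.example.internal/api/v1/push" // hypothetical endpoint

    queue_config {
      capacity    = 10000   // samples buffered per shard while the endpoint is unreachable
      max_shards  = 10      // upper bound on parallel senders
      min_backoff = "500ms" // starting point of the exponential retry backoff
      max_backoff = "5s"    // ceiling of the retry backoff
    }
  }
}
```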