hkboujrida · January 30, 2026 13:17
diff --git a/gistfile1.txt b/gistfile1.txt
 Hybrid Observability Architecture Documentation

 This document describes the unified monitoring and logging strategy for a multi-environment hybrid infrastructure (INT, PREPROD, PRD) encompassing On-Premises Service Fabric, SQL Servers, AS400, and Kubernetes.

 1. High-Level Architecture Diagram

 The diagram below illustrates the flow of telemetry data from various sources through Grafana Alloy collectors to the centralized LGTM (Loki, Grafana, Tempo, Mimir) stack.

 graph TD
    subgraph "On-Premises Infrastructure (INT/PREPROD/PRD)"
        subgraph "Windows Node (Service Fabric)"
            SF_App[SF Services]
            SF_Alloy[Alloy Service]
            OS_Metrics[CPU/RAM/Disk/Process]
        end
        
        subgraph "SQL Server Node"
            SQL_DB[(SQL Server)]
            SQL_Alloy[Alloy Service]
        end
        
        subgraph "Legacy Layer"
            AS400[AS400 / iSeries]
        end
    end

    subgraph "Kubernetes Cluster"
        K8s_Alloy[Alloy DaemonSet/Deployment]
        JDBC_Exp[Custom JDBC Exporter]
        BB_Exp[Blackbox Exporter]
        
        subgraph "LGTM Stack"
            Mimir[(Mimir - Metrics)]
            Loki[(Loki - Logs)]
            Grafana[Grafana - Visualization]
        end
    end

    %% Flows
    OS_Metrics -->|Scrape| SF_Alloy
    SF_App -->|Logs| SF_Alloy
    SQL_DB -->|Scrape| SQL_Alloy
    
    SF_Alloy -->|Push Remote Write| Mimir
    SF_Alloy -->|Push Loki Write| Loki
    SQL_Alloy -->|Push| Mimir
    
    AS400 -->|JDBC Query| JDBC_Exp
    JDBC_Exp -->|Metrics| K8s_Alloy
    BB_Exp -->|ICMP/HTTP Probes| K8s_Alloy
    K8s_Alloy -->|Push| Mimir
    K8s_Alloy -->|Push| Loki
    
    Grafana -->|Query| Mimir
    Grafana -->|Query| Loki


 2. Product & Tool Glossary

 For readers unfamiliar with the stack, here is a brief description of each core component:

 The Observability Pipeline

 Grafana Alloy: A vendor-neutral telemetry collector (successor to Grafana Agent). It is highly "pluggable" and handles the scraping, transforming, and forwarding of metrics, logs, and traces.

 Prometheus: An open-source monitoring system whose text exposition format is the de facto standard for metrics. In this architecture, Alloy scrapes targets over HTTP using that format.

 Prometheus Exporters: Small helper programs that expose non-Prometheus data (like Windows OS stats or SQL query results) in the Prometheus format so that Alloy can scrape it.

 Blackbox Exporter: Probes endpoints over HTTP, HTTPS, DNS, TCP, and ICMP to see if they are "alive" from the outside.

 JDBC Exporter: A custom bridge that uses Java Database Connectivity to run queries against legacy databases (AS400) and turn the results into metrics.
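To make "turn the results into metrics" concrete, here is a minimal sketch of what any exporter ultimately produces: plain text in the Prometheus exposition format. The metric name, label names, and row values are hypothetical and chosen only for illustration; this is not the code of the custom JDBC Exporter.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one gauge metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        # Labels are optional: emit `name{k="v"} value` or just `name value`.
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical rows standing in for the result of a query against a legacy system.
rows = [({"pool": "interactive"}, 42.5), ({"pool": "batch"}, 17.0)]
print(render_gauge("legacy_pool_cpu_percent", "CPU used per pool (illustrative).", rows))
```

An exporter serves text like this on an HTTP endpoint, and the collector scrapes it on a fixed interval.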

 The LGTM Backend Stack

 Grafana: The visualization layer used to build dashboards and alerts.

 Loki: A horizontally scalable, highly available, multi-tenant log aggregation system (inspired by Prometheus).

 Mimir: The backend for metrics. It provides long-term storage and massive scalability for Prometheus data.

 Tempo: (Optional) A high-volume, distributed tracing backend. It is part of the LGTM stack, but the diagrams in this document show no trace flows.

 3. How Alloy Works with LGTM (Internal Pipeline)

 Alloy acts as the "glue" between your data sources and the LGTM stack. It uses a component-based configuration written in the Alloy configuration syntax (an HCL-inspired language, formerly called River) to define a declarative pipeline.

 The Alloy Pipeline Diagram

 graph LR
    subgraph "Alloy Internal Pipeline"
        Discovery[<b>1. Discovery</b><br/>Find Targets<br/>K8s API / Static Config]
        Collection[<b>2. Collection</b><br/>Scrape Metrics<br/>Tail Logs]
        Processing[<b>3. Processing</b><br/>Relabel / Filter<br/>Add 'env' labels]
        Forwarding[<b>4. Forwarding</b><br/>Batching / Retry]
    end

    subgraph "LGTM Stack (Storage)"
        Mimir[Mimir<br/>Metrics]
        Loki[Loki<br/>Logs]
    end

    Discovery --> Collection
    Collection --> Processing
    Processing --> Forwarding
    Forwarding -->|prometheus.remote_write| Mimir
    Forwarding -->|loki.write| Loki


 Key Stages Explained:

 Discovery: Alloy automatically identifies what needs to be monitored. In your K8s cluster, it watches the K8s API to find pods. On Windows, it uses static configurations to find SQL Server ports.

 Collection: Alloy "pulls" data from endpoints (like the Windows Exporter or Traefik) or "tails" files for logs.

 Processing (Relabeling): This is where Alloy adds the critical env="prd" or workload="sql" labels. It can also drop sensitive data or filter out noise before sending it over the network.

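 Forwarding: Alloy batches data to optimize network usage and retries failed sends with exponential backoff. If the K8s cluster (Mimir/Loki) is temporarily unreachable, Alloy buffers data locally until the backend is reachable again: prometheus.remote_write replays from its write-ahead log, while loki.write holds pending batches in memory by default.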
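The Forwarding stage described above combines two ideas: exponentially growing retry delays and a bounded local buffer. The sketch below illustrates both in isolation. It is not Alloy's implementation, and the base delay, cap, and buffer size are assumed values chosen for readability.

```python
import random
from collections import deque


def backoff_delays(base: float, cap: float, attempts: int, jitter: float = 0.0) -> list[float]:
    """Return the wait before each retry: base * 2**n, capped, plus optional jitter."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        delays.append(delay + random.uniform(0, jitter * delay))
    return delays


class BoundedBuffer:
    """Holds unsent batches while the backend is unreachable; drops the oldest when full."""

    def __init__(self, max_batches: int) -> None:
        self._queue: deque[list[str]] = deque(maxlen=max_batches)
        self.dropped = 0

    def add(self, batch: list[str]) -> None:
        if len(self._queue) == self._queue.maxlen:
            self.dropped += 1  # deque(maxlen=...) evicts the oldest entry on append
        self._queue.append(batch)

    def drain(self) -> list[list[str]]:
        """Return all buffered batches in arrival order and empty the buffer."""
        batches = list(self._queue)
        self._queue.clear()
        return batches


print(backoff_delays(base=0.5, cap=8.0, attempts=6))
```

The bounded buffer makes the trade-off explicit: a short outage is absorbed completely, while a long one eventually forces the oldest data out.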

 4. Detailed Component Roles

 A. Grafana Alloy Deployment Modes

 Service Mode (Windows): Installed as a native Windows Service on SF and SQL nodes. It focuses on OS-level health and local log files.

 Deployment/DaemonSet Mode (K8s): Runs inside your cluster. It handles high-level probes (Blackbox) and the AS400 bridge.

 B. On-Premises Workloads

 Service Fabric (SF) Cluster:

 Logs: Alloy tails application log files and Windows Event Logs.

 OS Metrics: Uses prometheus.exporter.windows to capture system-level health.

 SQL Servers:

 Role: Collects database-specific health via the MSSQL exporter component built into Alloy (prometheus.exporter.mssql).

 AS400 (Custom Exporter):

 The Bridge: A container in K8s runs the JDBC Exporter. It connects to the AS400 via the network to extract system utilization data.

 5. Environment Isolation Strategy

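 To manage the three environments efficiently, the design uses label-based multi-tenancy: every metric series and log stream carries an env label set at the source, and each environment's data is written to its own Mimir/Loki tenant: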

 | Environment | Source Label  | Backend Destination          |
 | ----------- | ------------- | ---------------------------- |
 | INT         | env="int"     | Mimir/Loki (Tenant: INT)     |
 | PREPROD     | env="preprod" | Mimir/Loki (Tenant: PREPROD) |
 | PRD         | env="prd"     | Mimir/Loki (Tenant: PRD)     |
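The mapping in the table can be sketched as a small lookup from the env label to a tenant. Mimir and Loki select the tenant from the X-Scope-OrgID HTTP header on each write request. The tenant IDs below are taken from the table, while the function name and the strict handling of unknown environments are assumptions made for this example.

```python
# Tenant IDs as listed in the table above; the dictionary itself is illustrative.
TENANT_BY_ENV = {"int": "INT", "preprod": "PREPROD", "prd": "PRD"}


def tenant_headers(labels: dict[str, str]) -> dict[str, str]:
    """Return the HTTP header that selects the Mimir/Loki tenant for a labelled stream."""
    env = labels.get("env")
    if env not in TENANT_BY_ENV:
        # Fail loudly rather than write data into the wrong tenant.
        raise ValueError(f"missing or unknown env label: {env!r}")
    return {"X-Scope-OrgID": TENANT_BY_ENV[env]}


print(tenant_headers({"env": "prd", "workload": "sql"}))
```

Rejecting unlabelled data at this point keeps a misconfigured collector from silently mixing environments.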

 6. Summary of Benefits

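 Unified Tooling: The same Alloy configuration language covers Windows nodes, K8s workloads, and the AS400 bridge.

 Security: Uses a "Push" architecture: on-premises Alloy instances initiate outbound connections to Mimir and Loki, which minimizes firewall changes on-premises.

 Resilience: Alloy's batching, retry, and local buffering let it ride out brief network interruptions without losing data, as long as the outage stays within its buffer and retry limits.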