GOAT Leads — PostgreSQL 18 Cluster Architecture

---
title: GOAT Leads — PostgreSQL 18 Cluster Architecture
---
flowchart TB
    subgraph APP["🌐 Application Layer"]
        RAILS["Rails App Servers<br/>(Capistrano deploys)"]
    end

    subgraph LB["⚖️ Load Balancer"]
        HAPROXY["HAProxy<br/>173.255.199.9"]
    end

    RAILS -->|"All DB connections"| HAPROXY
    HAPROXY -->|"Writes → port 5432"| DB2
    HAPROXY -->|"Reads → port 5433"| DB1
    HAPROXY -->|"Reads → port 5433"| DB3

    subgraph CLUSTER["🗄️ Patroni HA Cluster (auto-failover < 30 sec)"]
        DB2["<b>goat-db2</b><br/>✅ LEADER<br/>45.56.66.97<br/>Accepts writes"]
        DB1["<b>goat-db1</b><br/>📖 Replica<br/>45.56.66.90<br/>Read-only"]
        DB3["<b>goat-db3</b><br/>📖 Replica<br/>45.33.1.136<br/>Read-only + Backup duties"]

        DB2 -- "Synchronous<br/>streaming replication" --> DB1
        DB2 -- "Synchronous<br/>streaming replication" --> DB3
    end

    subgraph BACKUPS["🛡️ 3-2-1 Backup Strategy (~$30/month)"]
        direction TB

        subgraph LAYER1["Layer 1 — Linode Disk Snapshots"]
            SNAP["Full-disk backups<br/>of Leader node + HAProxy<br/><i>Recovers destroyed servers</i>"]
        end

        subgraph LAYER2["Layer 2 — pgBackRest (Point-in-Time Recovery)"]
            direction LR
            FULL["Full backup<br/>Sundays 1 AM UTC"]
            DIFF["Differential backup<br/>Mon–Sat 1 AM UTC"]
            WAL["Continuous WAL<br/>archiving 24/7"]
        end

        subgraph LAYER3["Layer 3 — Off-Site S3 Storage"]
            S3["☁️ Linode Object Storage<br/>Bucket: goatleads-pgbackrest<br/>AES-256 encrypted<br/>Zstandard compressed<br/>30-day retention"]
        end

        subgraph LAYER4["Layer 4 — Weekly Logical Dump"]
            DUMP["pg_dump (custom format)<br/>Sundays 3 AM UTC<br/>~20 GB compressed<br/>Synced to S3, 30-day retention<br/><i>Version-independent restore</i>"]
        end
    end

    DB3 -.->|"Runs backups"| FULL
    DB3 -.->|"Runs backups"| DIFF
    DB2 -->|"WAL archive-push"| WAL
    FULL & DIFF & WAL -->|"Encrypted upload"| S3
    DB3 -.->|"Weekly dump"| DUMP
    DUMP -->|"Synced"| S3
    DB2 -.->|"Snapshot"| SNAP

    subgraph MONITOR["📊 Monitoring & Alerts"]
        MON["Slack alerts every 6 hrs<br/>• Backup freshness check<br/>• Disk usage (alert at 80%)<br/>• WAL archiving health<br/>• Replica lag"]
    end

    DB3 -.-> MON

    style DB2 fill:#28a745,stroke:#155724,color:#fff
    style DB1 fill:#007bff,stroke:#0056b3,color:#fff
    style DB3 fill:#007bff,stroke:#0056b3,color:#fff
    style HAPROXY fill:#fd7e14,stroke:#c76a11,color:#fff
    style S3 fill:#6f42c1,stroke:#4a2d87,color:#fff
    style CLUSTER fill:#e8f5e9,stroke:#28a745
    style BACKUPS fill:#fff3e0,stroke:#fd7e14
    style MONITOR fill:#e3f2fd,stroke:#007bff

How It Works

Concern	How We Handle It
High Availability	3 database servers managed by Patroni. If the leader goes down, a replica automatically promotes to leader in under 30 seconds — no human intervention needed.
Read Performance	HAProxy routes read traffic to the 2 replicas, keeping the leader free for writes.
Point-in-Time Recovery	pgBackRest continuously archives every database change to encrypted off-site cloud storage. We can restore to any second in the last 30 days.
Full Backups	Weekly full + daily differential backups, plus Linode disk snapshots of the leader server.
Weekly Safety Net	A traditional `pg_dump` runs weekly — this is version-independent, meaning we could restore to a completely different PostgreSQL version if needed.
Monitoring	Automated Slack alerts every 6 hours check backup freshness, disk usage, WAL archiving health, and replica lag.
Cost	~$30/month total for all backup storage and snapshots.

Disaster Recovery Scenarios

Scenario	Recovery Path
A server crashes	Automatic failover in <30 sec (zero data loss)
Disk corruption	Restore from pgBackRest + WAL replay
Someone accidentally deletes data	Point-in-time recovery to the second before it happened
Ransomware / compromise	Off-site encrypted S3 backups (attacker can't reach them)
Entire data center goes down	Restore from off-site S3 to new servers

midwire/pg18-cluster-architecture.md

Select an option

No results found

Select an option

No results found

GOAT Leads — PostgreSQL 18 Cluster Architecture

How It Works

Disaster Recovery Scenarios