Skip to content

Instantly share code, notes, and snippets.

@midwire
Created March 27, 2026 00:09
Show Gist options
  • Select an option

  • Save midwire/3248cf4d69a8da1dca55d3b146ec74e7 to your computer and use it in GitHub Desktop.

Select an option

Save midwire/3248cf4d69a8da1dca55d3b146ec74e7 to your computer and use it in GitHub Desktop.
GOAT Leads — PG18 Cluster Architecture & Backup Strategy

GOAT Leads — PostgreSQL 18 Cluster Architecture

---
title: GOAT Leads — PostgreSQL 18 Cluster Architecture
---
flowchart TB
    subgraph APP["🌐 Application Layer"]
        RAILS["Rails App Servers<br/>(Capistrano deploys)"]
    end

    subgraph LB["⚖️ Load Balancer"]
        HAPROXY["HAProxy<br/>173.255.199.9"]
    end

    RAILS -->|"All DB connections"| HAPROXY
    HAPROXY -->|"Writes → port 5432"| DB2
    HAPROXY -->|"Reads → port 5433"| DB1
    HAPROXY -->|"Reads → port 5433"| DB3

    subgraph CLUSTER["🗄️ Patroni HA Cluster (auto-failover < 30 sec)"]
        DB2["<b>goat-db2</b><br/>✅ LEADER<br/>45.56.66.97<br/>Accepts writes"]
        DB1["<b>goat-db1</b><br/>📖 Replica<br/>45.56.66.90<br/>Read-only"]
        DB3["<b>goat-db3</b><br/>📖 Replica<br/>45.33.1.136<br/>Read-only + Backup duties"]

        DB2 -- "Synchronous<br/>streaming replication" --> DB1
        DB2 -- "Synchronous<br/>streaming replication" --> DB3
    end

    subgraph BACKUPS["🛡️ 3-2-1 Backup Strategy (~$30/month)"]
        direction TB

        subgraph LAYER1["Layer 1 — Linode Disk Snapshots"]
            SNAP["Full-disk backups<br/>of Leader node + HAProxy<br/><i>Recovers destroyed servers</i>"]
        end

        subgraph LAYER2["Layer 2 — pgBackRest (Point-in-Time Recovery)"]
            direction LR
            FULL["Full backup<br/>Sundays 1 AM UTC"]
            DIFF["Differential backup<br/>Mon–Sat 1 AM UTC"]
            WAL["Continuous WAL<br/>archiving 24/7"]
        end

        subgraph LAYER3["Layer 3 — Off-Site S3 Storage"]
            S3["☁️ Linode Object Storage<br/>Bucket: goatleads-pgbackrest<br/>AES-256 encrypted<br/>Zstandard compressed<br/>30-day retention"]
        end

        subgraph LAYER4["Layer 4 — Weekly Logical Dump"]
            DUMP["pg_dump (custom format)<br/>Sundays 3 AM UTC<br/>~20 GB compressed<br/>Synced to S3, 30-day retention<br/><i>Version-independent restore</i>"]
        end
    end

    DB3 -.->|"Runs backups"| FULL
    DB3 -.->|"Runs backups"| DIFF
    DB2 -->|"WAL archive-push"| WAL
    FULL & DIFF & WAL -->|"Encrypted upload"| S3
    DB3 -.->|"Weekly dump"| DUMP
    DUMP -->|"Synced"| S3
    DB2 -.->|"Snapshot"| SNAP

    subgraph MONITOR["📊 Monitoring & Alerts"]
        MON["Slack alerts every 6 hrs<br/>• Backup freshness check<br/>• Disk usage (alert at 80%)<br/>• WAL archiving health<br/>• Replica lag"]
    end

    DB3 -.-> MON

    style DB2 fill:#28a745,stroke:#155724,color:#fff
    style DB1 fill:#007bff,stroke:#0056b3,color:#fff
    style DB3 fill:#007bff,stroke:#0056b3,color:#fff
    style HAPROXY fill:#fd7e14,stroke:#c76a11,color:#fff
    style S3 fill:#6f42c1,stroke:#4a2d87,color:#fff
    style CLUSTER fill:#e8f5e9,stroke:#28a745
    style BACKUPS fill:#fff3e0,stroke:#fd7e14
    style MONITOR fill:#e3f2fd,stroke:#007bff
Loading

How It Works

Concern How We Handle It
High Availability 3 database servers managed by Patroni. If the leader goes down, a replica automatically promotes to leader in under 30 seconds — no human intervention needed.
Read Performance HAProxy routes read traffic to the 2 replicas, keeping the leader free for writes.
Point-in-Time Recovery pgBackRest continuously archives every database change to encrypted off-site cloud storage. We can restore to any second in the last 30 days.
Full Backups Weekly full + daily differential backups, plus Linode disk snapshots of the leader server.
Weekly Safety Net A traditional pg_dump runs weekly — this is version-independent, meaning we could restore to a completely different PostgreSQL version if needed.
Monitoring Automated Slack alerts every 6 hours check backup freshness, disk usage, WAL archiving health, and replica lag.
Cost ~$30/month total for all backup storage and snapshots.

Disaster Recovery Scenarios

Scenario Recovery Path
A server crashes Automatic failover in <30 sec (zero data loss)
Disk corruption Restore from pgBackRest + WAL replay
Someone accidentally deletes data Point-in-time recovery to the second before it happened
Ransomware / compromise Off-site encrypted S3 backups (attacker can't reach them)
Entire data center goes down Restore from off-site S3 to new servers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment