Created
May 22, 2026 10:58
-
-
Save arajkumar/1c2d105d80215b7328af344e71971a9d to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| <!doctype html> | |
| <html lang="en"> | |
| <head> | |
| <meta charset="utf-8" /> | |
| <title>Tigerlake vs live-sync — starting with 4 tables</title> | |
| <style> | |
| :root { | |
| --bg: #0f1320; | |
| --panel: #161c2e; | |
| --panel-2: #1d2540; | |
| --accent: #6cb9ff; | |
| --accent-2: #ff9f6c; | |
| --accent-3: #7be3a4; | |
| --accent-4: #f17ad6; | |
| --warn: #ffd166; | |
| --text: #e7ecf5; | |
| --muted: #9aa4bd; | |
| --line: #2b3656; | |
| } | |
| html, body { background: var(--bg); color: var(--text); font: 14px/1.5 -apple-system, "SF Pro Text", "Inter", sans-serif; margin: 0; } | |
| main { max-width: 1240px; margin: 28px auto; padding: 0 20px 64px; } | |
| h1 { font-size: 22px; margin: 0 0 4px; } | |
| h2 { font-size: 18px; margin: 28px 0 6px; color: var(--accent); } | |
| h3 { font-size: 15px; margin: 16px 0 6px; color: var(--accent-2); } | |
| h4 { font-size: 13px; margin: 12px 0 4px; color: var(--accent-3); letter-spacing: .2px; text-transform: uppercase; } | |
| p { margin: 6px 0; } | |
| code, .mono { font: 12.5px/1.4 ui-monospace, "JetBrains Mono", Menlo, Consolas, monospace; color: #d6e0ff; background: #0c1020; padding: 1px 5px; border-radius: 4px; border: 1px solid #1d2540; } | |
| .lede { color: var(--muted); margin: 0 0 8px; } | |
| .meta { color: var(--muted); font-size: 12.5px; } | |
| ol, ul { margin: 6px 0 0 18px; padding: 0; } | |
| li { margin: 4px 0; } | |
| svg { width: 100%; height: auto; background: linear-gradient(180deg, #0c1020, #10162a); border-radius: 12px; border: 1px solid var(--line); } | |
| .card { background: var(--panel); border: 1px solid var(--line); border-radius: 10px; padding: 14px 16px; margin: 10px 0; } | |
| .twocol { display: grid; grid-template-columns: 1fr 1fr; gap: 16px; } | |
| .tag { display: inline-block; padding: 1px 7px; border-radius: 999px; font-size: 11.5px; border: 1px solid var(--line); background: #0c1020; color: #b3c2e0; } | |
| .tag.ok { color: var(--accent-3); border-color: #1c5a3a; background: #08231a; } | |
| .tag.warn { color: var(--warn); border-color: #5a4a1c; background: #1d1a08; } | |
| .tag.bad { color: #ff7e7e; border-color: #5a1c1c; background: #230808; } | |
| table { border-collapse: collapse; width: 100%; margin-top: 8px; font-size: 13px; } | |
| th, td { border: 1px solid var(--line); padding: 7px 10px; text-align: left; vertical-align: top; } | |
| th { background: #0c1020; color: var(--muted); font-weight: 500; white-space: nowrap; } | |
| .legend { display: flex; flex-wrap: wrap; gap: 14px; margin: 6px 0 8px; } | |
| .legend span { display: inline-flex; align-items: center; gap: 6px; color: var(--muted); font-size: 12.5px; } | |
| .swatch { width: 12px; height: 10px; border-radius: 2px; display: inline-block; } | |
| .verdict { background: #131a30; border: 1px solid #2b3656; border-left: 3px solid var(--accent-3); padding: 12px 14px; border-radius: 8px; } | |
| </style> | |
| </head> | |
| <body> | |
| <main> | |
| <h1>Starting with 4 tables: Tigerlake vs live-sync</h1> | |
| <p class="lede">Same starting state (T1..T4, none synced yet); fundamentally different shape on the wall clock because of single-snapshotter vs worker-pool design.</p> | |
| <h2>Setup assumption</h2> | |
| <div class="twocol"> | |
| <div class="card"> | |
| <h3>Tigerlake</h3> | |
| <ul> | |
| <li>4 rows in <code>_timescaledb_lake_catalog.sync_tables</code>, none have a matching row in <code>_timescaledb_lake_tables.snapshot</code> → all 4 are <code>SnapshotStatus.Needed</code>.</li> | |
| <li>1 logical replication slot <code>timescaledb_lake_slot</code> shared by everything.</li> | |
| <li>1 snapshotter thread, 1 CDC thread.</li> | |
| </ul> | |
| </div> | |
| <div class="card"> | |
| <h3>live-sync</h3> | |
| <ul> | |
| <li>4 rows in <code>_ts_live_sync.subscription_rel</code> with <code>state ∈ {INIT, DATASYNC}</code> (anything but <code>READY</code>/<code>NONE</code>).</li> | |
| <li>1 leader slot for the subscription + 1 per-table replication slot per active tablesync worker.</li> | |
| <li>Worker pool of size <code>tableSyncWorkers</code> (let's say <b>4</b> to match table count; can be smaller).</li> | |
| </ul> | |
| </div> | |
| </div> | |
| <h2>Wall-clock timeline (all 4 fresh, never snapshotted)</h2> | |
| <div class="legend"> | |
| <span><span class="swatch" style="background:#6cb9ff"></span>Bulk copy / SELECT</span> | |
| <span><span class="swatch" style="background:#7be3a4"></span>Catchup / branch merge</span> | |
| <span><span class="swatch" style="background:#ff9f6c"></span>Steady-state CDC on main</span> | |
| <span><span class="swatch" style="background:#9aa4bd"></span>Waiting (Needed; CDC discarded)</span> | |
| </div> | |
| <svg viewBox="0 0 1200 500" role="img" aria-label="Wall-clock timeline for 4 fresh tables"> | |
| <defs> | |
| <style> | |
| .lane { stroke: #2b3656; stroke-dasharray: 4 4; } | |
| .title { font: 600 13px -apple-system, sans-serif; fill: #e7ecf5; } | |
| .sub { font: 11.5px -apple-system, sans-serif; fill: #cfd9f5; } | |
| .axis { stroke: #2b3656; stroke-width: 1; } | |
| .axisLabel { font: 11px ui-monospace, Menlo, monospace; fill: #9aa4bd; } | |
| .barWait { fill: #2b3656; stroke: #3a4566; } | |
| .barCopy { fill: #1d3a55; stroke: #6cb9ff; } | |
| .barCatch { fill: #15402a; stroke: #7be3a4; } | |
| .barSteady { fill: #3a2614; stroke: #ff9f6c; } | |
| .barLabel { font: 11.5px -apple-system, sans-serif; fill: #e7ecf5; } | |
| .tagS { font: 11px ui-monospace, Menlo, monospace; fill: #9aa4bd; } | |
| </style> | |
| </defs> | |
| <!-- Tigerlake half --> | |
| <text class="title" x="20" y="28">Tigerlake — single snapshotter (serial)</text> | |
| <text class="sub" x="20" y="46">CDC thread runs throughout; only one table is actively being copied at any moment.</text> | |
| <line class="axis" x1="80" y1="240" x2="1180" y2="240"/> | |
| <g class="axisLabel"> | |
| <text x="80" y="258">t0</text> | |
| <text x="280" y="258">copy(T1)</text> | |
| <text x="480" y="258">copy(T2)</text> | |
| <text x="680" y="258">copy(T3)</text> | |
| <text x="880" y="258">copy(T4)</text> | |
| <text x="1100" y="258">→ t</text> | |
| </g> | |
| <!-- T1 row --> | |
| <text class="sub" x="20" y="74">T1</text> | |
| <rect class="barCopy" x="80" y="60" width="200" height="22" rx="3"/> | |
| <text class="barLabel" x="92" y="76">snapshot</text> | |
| <rect class="barCatch" x="280" y="60" width="20" height="22" rx="3"/> | |
| <rect class="barSteady" x="300" y="60" width="880" height="22" rx="3"/> | |
| <text class="barLabel" x="312" y="76">CDC on main</text> | |
| <!-- T2 row --> | |
| <text class="sub" x="20" y="104">T2</text> | |
| <rect class="barWait" x="80" y="90" width="200" height="22" rx="3"/> | |
| <text class="barLabel" x="92" y="106">wait (CDC discarded)</text> | |
| <rect class="barCopy" x="280" y="90" width="200" height="22" rx="3"/> | |
| <text class="barLabel" x="292" y="106">snapshot</text> | |
| <rect class="barCatch" x="480" y="90" width="20" height="22" rx="3"/> | |
| <rect class="barSteady" x="500" y="90" width="680" height="22" rx="3"/> | |
| <text class="barLabel" x="512" y="106">CDC on main</text> | |
| <!-- T3 row --> | |
| <text class="sub" x="20" y="134">T3</text> | |
| <rect class="barWait" x="80" y="120" width="400" height="22" rx="3"/> | |
| <text class="barLabel" x="92" y="136">wait (CDC discarded)</text> | |
| <rect class="barCopy" x="480" y="120" width="200" height="22" rx="3"/> | |
| <text class="barLabel" x="492" y="136">snapshot</text> | |
| <rect class="barCatch" x="680" y="120" width="20" height="22" rx="3"/> | |
| <rect class="barSteady" x="700" y="120" width="480" height="22" rx="3"/> | |
| <text class="barLabel" x="712" y="136">CDC on main</text> | |
| <!-- T4 row --> | |
| <text class="sub" x="20" y="164">T4</text> | |
| <rect class="barWait" x="80" y="150" width="600" height="22" rx="3"/> | |
| <text class="barLabel" x="92" y="166">wait (CDC discarded)</text> | |
| <rect class="barCopy" x="680" y="150" width="200" height="22" rx="3"/> | |
| <text class="barLabel" x="692" y="166">snapshot</text> | |
| <rect class="barCatch" x="880" y="150" width="20" height="22" rx="3"/> | |
| <rect class="barSteady" x="900" y="150" width="280" height="22" rx="3"/> | |
| <text class="barLabel" x="912" y="166">CDC on main</text> | |
| <!-- Snapshotter status --> | |
| <text class="sub" x="20" y="200">snap thread</text> | |
| <rect class="barCopy" x="80" y="186" width="800" height="22" rx="3"/> | |
| <text class="barLabel" x="92" y="202">busy: T1 → T2 → T3 → T4 (serial, single thread)</text> | |
| <!-- CDC thread status --> | |
| <text class="sub" x="20" y="226">cdc thread</text> | |
| <rect class="barSteady" x="80" y="212" width="1100" height="22" rx="3"/> | |
| <text class="barLabel" x="92" y="228">decoding+routing throughout (events for Needed tables decoded then DISCARDED)</text> | |
| <!-- live-sync half --> | |
| <text class="title" x="20" y="296">live-sync — worker pool (parallel up to tableSyncWorkers)</text> | |
| <text class="sub" x="20" y="314">All 4 workers can start concurrently; bounded by source bandwidth and target apply throughput.</text> | |
| <line class="axis" x1="80" y1="490" x2="1180" y2="490"/> | |
| <g class="axisLabel"> | |
| <text x="80" y="508">t0</text> | |
| <text x="600" y="508">all 4 sync in parallel</text> | |
| <text x="1100" y="508">→ t</text> | |
| </g> | |
| <text class="sub" x="20" y="344">T1</text> | |
| <rect class="barCopy" x="80" y="330" width="220" height="22" rx="3"/> | |
| <text class="barLabel" x="92" y="346">COPY (slot_T1, USE_SNAPSHOT)</text> | |
| <rect class="barCatch" x="300" y="330" width="60" height="22" rx="3"/> | |
| <text class="barLabel" x="305" y="346">catchup</text> | |
| <rect class="barSteady" x="360" y="330" width="820" height="22" rx="3"/> | |
| <text class="barLabel" x="372" y="346">leader applies (READY)</text> | |
| <text class="sub" x="20" y="374">T2</text> | |
| <rect class="barCopy" x="80" y="360" width="180" height="22" rx="3"/> | |
| <text class="barLabel" x="92" y="376">COPY (slot_T2)</text> | |
| <rect class="barCatch" x="260" y="360" width="60" height="22" rx="3"/> | |
| <rect class="barSteady" x="320" y="360" width="860" height="22" rx="3"/> | |
| <text class="barLabel" x="332" y="376">leader applies (READY)</text> | |
| <text class="sub" x="20" y="404">T3</text> | |
| <rect class="barCopy" x="80" y="390" width="260" height="22" rx="3"/> | |
| <text class="barLabel" x="92" y="406">COPY (slot_T3)</text> | |
| <rect class="barCatch" x="340" y="390" width="60" height="22" rx="3"/> | |
| <rect class="barSteady" x="400" y="390" width="780" height="22" rx="3"/> | |
| <text class="barLabel" x="412" y="406">leader applies (READY)</text> | |
| <text class="sub" x="20" y="434">T4</text> | |
| <rect class="barCopy" x="80" y="420" width="160" height="22" rx="3"/> | |
| <text class="barLabel" x="92" y="436">COPY (slot_T4)</text> | |
| <rect class="barCatch" x="240" y="420" width="60" height="22" rx="3"/> | |
| <rect class="barSteady" x="300" y="420" width="880" height="22" rx="3"/> | |
| <text class="barLabel" x="312" y="436">leader applies (READY)</text> | |
| <text class="sub" x="20" y="464">leader</text> | |
| <rect class="barSteady" x="80" y="450" width="1100" height="22" rx="3"/> | |
| <text class="barLabel" x="92" y="466">streaming main slot; skips DML for non-READY rels, takes them over as each goes SYNCDONE</text> | |
| </svg> | |
| <h2>Step-by-step: Tigerlake</h2> | |
| <ol> | |
| <li><b>Startup:</b> <code>SyncTableSnapshotter.snapshotSyncTables()</code> opens REPEATABLE READ, captures startup horizon <code>S0</code>, joins <code>sync_tables</code> with <code>snapshot</code>. All 4 rows have <code>snapshot_state IS NULL</code> → <code>tableAlreadySnapshotted() = false</code>.</li> | |
| <li><b>Registry seeded</b> with 4 <code>TigerlakeIcebergTable</code>s, each <code>status = Needed</code>, <code>postgresSnapshot = null</code>, <code>icebergTable = null</code>. Plus <code>TigerlakeSyncTables(S0)</code> and <code>HeartbeatTable</code>.</li> | |
| <li><b>Threads spawn:</b> replication / cdc / snapshotter / heartbeat. CDC thread immediately starts pulling from the queue; snapshotter calls <code>nextSnapshot()</code>.</li> | |
| <li><b>nextSnapshot iteration order</b> is <code>ConcurrentHashMap</code>-iteration (hash-bucket, effectively undefined) — <code>findFirst</code> of those with <code>requestSnapshot()</code> non-empty. <b>Whichever bucket comes first wins.</b> Call it T<sub>π(1)</sub>.</li> | |
| <li><b>Snapshot of T<sub>π(1)</sub>:</b> new RR tx, <code>PostgresSnapshot.read</code> gives horizon S<sub>π(1)</sub>, <code>initiateSnapshot</code> flips to <code>InSnapshot</code>, <code>createTable</code> drops+recreates Iceberg table, <code>SELECT *</code>, append files.</li> | |
| <li><b>During that snapshot, CDC for the other three:</b> | |
| <ul> | |
| <li>T<sub>π(1)</sub> CDC: <code>shouldDiscardEvent</code> uses S<sub>π(1)</sub> → discard pre-horizon, write post-horizon to T<sub>π(1)</sub>.pending-branch.</li> | |
| <li>T<sub>π(2..4)</sub> CDC: <code>postgresSnapshot == null</code> → <span class="tag warn">discarded</span>. LSN still advances per batch.</li> | |
| <li>CDC thread is doing real work (decode + lookup + filter) but throwing the data away.</li> | |
| </ul> | |
| </li> | |
| <li><b>T<sub>π(1)</sub> completes:</b> atomic Iceberg transaction appends bulk to <code>main</code>, replays pending branch, drops branch, expires snapshots. T<sub>π(1)</sub>.status = <code>SnapshotComplete</code>.</li> | |
| <li><b>Snapshotter loops</b> → picks the next <code>Needed</code> table. Repeats steps 5–7.</li> | |
| <li><b>After all 4 complete:</b> all four flow to <code>main</code>. Wall-clock total ≈ Σ snapshot times. Iceberg storage holds 4× (bulk + pending branch lifetimes), but pending branches were tiny because most of the CDC was discarded during the wait.</li> | |
| </ol> | |
| <h2>Step-by-step: live-sync</h2> | |
| <ol> | |
| <li><b>Startup:</b> <code>Subscriber.Sync</code> brings up the applier leader on the subscription's main slot, and <code>startTableSync</code> spawns <code>tableSyncWorkers</code> goroutines (each calling <code>copyTable</code> in a loop reading from <code>syncCh</code>).</li> | |
| <li><b>Feeder</b> calls <code>FetchForTableSync</code> → returns all 4 rels where <code>state NOT IN ('r','N')</code>, sends them on <code>syncCh</code>.</li> | |
| <li><b>Workers grab tables off the channel</b> — up to <code>tableSyncWorkers</code> in flight at once. With 4 workers + 4 tables → all 4 start simultaneously.</li> | |
| <li><b>Each worker independently:</b> | |
| <ol> | |
| <li>State <code>DATASYNC</code>.</li> | |
| <li>Pre-create chunks on target (if hypertable).</li> | |
| <li>Open RR tx on source connection; <code>CREATE_REPLICATION_SLOT slot_T<sub>i</sub> LOGICAL pgoutput USE_SNAPSHOT</code> → gives <code>consistent_point</code> = <code>startLSN<sub>i</sub></code>, binds tx to that snapshot atomically.</li> | |
| <li>Begin target tx; <code>OriginCreate(origin_T<sub>i</sub>)</code>; <code>OriginAdvance(startLSN<sub>i</sub>)</code>.</li> | |
| <li><code>COPY T<sub>i</sub> TO STDOUT</code> → <code>COPY T<sub>i</sub> FROM STDIN</code>. State <code>FINISHEDCOPY</code>.</li> | |
| <li><code>SendFinishedCopy</code> to leader (channel). Leader responds with <code>Catchup(currentLeaderLSN)</code>.</li> | |
| <li>Start replication on <code>slot_T<sub>i</sub></code> from <code>startLSN<sub>i</sub></code> → apply T<sub>i</sub> DML only → stop at <code>Catchup.LSN</code>.</li> | |
| <li>State <code>SYNCDONE</code>; <code>DROP slot_T<sub>i</sub></code>; <code>DROP origin_T<sub>i</sub></code>; <code>SendSyncDone</code>.</li> | |
| <li>Leader marks T<sub>i</sub> <code>READY</code> and starts applying T<sub>i</sub> DML on the main slot, skipping LSNs ≤ <code>SYNCDONE.LSN</code>.</li> | |
| </ol> | |
| </li> | |
| <li><b>Source PG load during sync:</b> 4 simultaneous <code>COPY TO STDOUT</code> readers + 4 logical slots holding WAL + 1 main slot. <span class="tag warn">Peak source-side pressure.</span></li> | |
| <li><b>Target PG load:</b> 4 simultaneous <code>COPY FROM STDIN</code> writers + leader applying skipped/passed DML.</li> | |
| <li><b>Wall-clock total</b> ≈ <code>max(individual sync times)</code> if <code>tableSyncWorkers ≥ N_tables</code>. If <code>tableSyncWorkers < N_tables</code>, the slowest table sets the lower bound but others queue.</li> | |
| </ol> | |
| <h2>Where time and resources go</h2> | |
| <table> | |
| <tr><th>Dimension</th><th>Tigerlake (4 fresh tables)</th><th>live-sync (4 fresh tables, 4 workers)</th></tr> | |
| <tr><td>Bulk-copy concurrency</td><td><span class="tag bad">1</span> (serial)</td><td><span class="tag ok">4</span> (parallel up to pool size)</td></tr> | |
| <tr><td>Wall-clock to all-READY/Complete</td><td>Σ snapshot times</td><td>max(snapshot times)</td></tr> | |
| <tr><td>Source PG replication slots</td><td>1 (always)</td><td>1 + 4 (peak; one per active worker)</td></tr> | |
| <tr><td>Source PG concurrent COPY readers</td><td>1</td><td>4</td></tr> | |
| <tr><td>Source PG WAL retention pressure</td><td>Bounded by single slot's confirmed_flush_lsn</td><td>Held by 4 tablesync slots until each catches up; can be large for long-running copies</td></tr> | |
| <tr><td>Target write concurrency</td><td>1 Iceberg writer (snapshotter) + 1 CDC writer (serial through queue)</td><td>4 simultaneous <code>COPY FROM STDIN</code> + leader's apply</td></tr> | |
| <tr><td>CDC events for not-yet-snapshotted tables</td><td>Decoded then <span class="tag bad">discarded</span>; recovered later via bulk SELECT</td><td>Held in per-table slot WAL; drained during catchup. No discard.</td></tr> | |
| <tr><td>Wasted decode/route work</td><td>3×duration of T<sub>π(1)</sub> + 2×T<sub>π(2)</sub> + 1×T<sub>π(3)</sub> worth of discarded CDC</td><td>None — every decoded message is applied somewhere</td></tr> | |
| <tr><td>Recovery on restart mid-copy</td><td>Per-table catalog row in <code>_timescaledb_lake_tables.snapshot</code>; mid-snapshot crash → table back to Needed, bulk redone</td><td>Per-table state machine; <code>FINISHEDCOPY</code> resumes catchup from origin; <code>DATASYNC</code> recopies (slot+origin recreated)</td></tr> | |
| <tr><td>Failure isolation between tables</td><td><span class="tag bad">Low</span> — one snapshotter, one CDC thread; any uncaught exception in a long-lived thread → whole connector exits via <code>Quarkus.asyncExit(1)</code></td><td><span class="tag ok">High</span> — per-table worker; one worker's failure is isolated (recorded in <code>last_error</code>); retry on next 30s tick</td></tr> | |
| <tr><td>Iceberg/target snapshot count</td><td>Per-table commit count: bulk + final merge + ongoing CDC commits (SnapshotCleaner expires excess)</td><td>n/a (target is PG)</td></tr> | |
| </table> | |
| <h2>What about restart with 4 already-complete tables?</h2> | |
| <div class="twocol"> | |
| <div class="card"> | |
| <h3>Tigerlake</h3> | |
| <ul> | |
| <li><code>SyncTableSnapshotter</code> finds <code>snapshot_state = FULL</code> for all 4 → <code>tableAlreadySnapshotted() = true</code>.</li> | |
| <li><code>fromRecordedSnapshotState</code> loads each existing Iceberg table; status set to <code>SnapshotComplete</code>, <code>postgresSnapshot = null</code>.</li> | |
| <li><code>shouldDiscardEvent</code> branch <code>(SnapshotComplete && postgresSnapshot == null) → false</code> → CDC flows directly to <code>main</code> for all 4.</li> | |
| <li>Snapshotter blocks forever on <code>snapshotReady.await()</code> — nothing to do.</li> | |
| <li>Restart is cheap; resumes from the slot's <code>confirmed_flush_lsn</code>.</li> | |
| </ul> | |
| </div> | |
| <div class="card"> | |
| <h3>live-sync</h3> | |
| <ul> | |
| <li>Catalog has 4 rows in <code>READY</code>. <code>FetchForTableSync</code> excludes them (<code>state NOT IN ('r','N')</code>) → returns empty.</li> | |
| <li>Tablesync workers idle on <code>syncCh</code> waiting (no work).</li> | |
| <li>Applier leader resumes from main slot's <code>confirmed_flush_lsn</code>; applies DML for all 4 directly.</li> | |
| <li>Symmetric to Tigerlake's restart — both are cheap.</li> | |
| </ul> | |
| </div> | |
| </div> | |
| <h2>Mixed case — 2 complete, 2 fresh</h2> | |
| <p>Behavior is the same as above, applied per-table:</p> | |
| <ul> | |
| <li><b>Tigerlake:</b> the 2 complete tables flow to <code>main</code> immediately; the 2 fresh tables are snapshotted serially, each going through the <code>Needed → InSnapshot → SnapshotComplete</code> dance with their own pending branch. CDC for the 2 fresh tables is dropped while they wait.</li> | |
| <li><b>live-sync:</b> the 2 complete tables (<code>READY</code>) ride the leader's main slot directly; the 2 fresh tables are picked up by workers in parallel (up to pool size). Their pre-sync CDC is held in their respective per-table slot WAL, never dropped.</li> | |
| </ul> | |
| <h2>The big picture</h2> | |
| <div class="verdict"> | |
| <p><b>Tigerlake's 4-table behavior is dominated by serialisation</b>: one snapshotter thread, one CDC thread, no per-table parallelism. The single-slot model means it can't easily do parallel bulk reads without coordinating multiple consistent points itself; today it sidesteps that by doing one table at a time and discarding the CDC for tables still in <code>Needed</code>.</p> | |
| <p><b>live-sync's 4-table behavior is dominated by parallelism</b>: every table gets its own consistent point via <code>USE_SNAPSHOT</code>, its own slot, its own worker, its own origin. The wall-clock saving is large; the cost is paid in source-side WAL retention (peak: 1 + N slots) and source/target IO concurrency.</p> | |
| <p>If your operational priority is <b>fast onboarding of many tables at once</b>, live-sync's design wins decisively. If your priority is <b>predictable, single-knob source pressure with cheap object-storage buffering of in-flight CDC</b> (the lakehouse case), Tigerlake's design is appropriate — but it would still benefit from running the snapshotter in a small pool keyed by table, and from grabbing each table's horizon via <code>USE_SNAPSHOT</code> on a temp slot so the seam is a clean LSN comparison rather than the current txid filter.</p> | |
| </div> | |
| </main> | |
| </body> | |
| </html> |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| <!doctype html> | |
| <html lang="en"> | |
| <head> | |
| <meta charset="utf-8" /> | |
| <title>Tigerlake PR #325 vs live-sync — snapshot/CDC handoff compared</title> | |
| <style> | |
| :root { | |
| --bg: #0f1320; | |
| --panel: #161c2e; | |
| --panel-2: #1d2540; | |
| --accent: #6cb9ff; | |
| --accent-2: #ff9f6c; | |
| --accent-3: #7be3a4; | |
| --accent-4: #f17ad6; | |
| --warn: #ffd166; | |
| --text: #e7ecf5; | |
| --muted: #9aa4bd; | |
| --line: #2b3656; | |
| } | |
| html, body { background: var(--bg); color: var(--text); font: 14px/1.5 -apple-system, "SF Pro Text", "Inter", sans-serif; margin: 0; } | |
| main { max-width: 1240px; margin: 28px auto; padding: 0 20px 64px; } | |
| h1 { font-size: 22px; margin: 0 0 6px; } | |
| h2 { font-size: 17px; margin: 28px 0 8px; color: var(--accent); letter-spacing: .2px; } | |
| h3 { font-size: 14px; margin: 14px 0 6px; color: var(--accent-2); } | |
| p { margin: 6px 0; } | |
| code, .mono { font: 12.5px/1.4 ui-monospace, "JetBrains Mono", Menlo, Consolas, monospace; color: #d6e0ff; background: #0c1020; padding: 1px 5px; border-radius: 4px; border: 1px solid #1d2540; } | |
| .lede { color: var(--muted); margin: 0 0 8px; } | |
| .meta { color: var(--muted); font-size: 12.5px; } | |
| .grid { display: grid; grid-template-columns: 1fr 1fr; gap: 16px; } | |
| .card { background: var(--panel); border: 1px solid var(--line); border-radius: 10px; padding: 14px 16px; } | |
| .card h3:first-child { margin-top: 0; } | |
| .card.alt { background: var(--panel-2); } | |
| ul, ol { margin: 6px 0 0 18px; padding: 0; } | |
| li { margin: 4px 0; } | |
| svg { width: 100%; height: auto; background: linear-gradient(180deg, #0c1020, #10162a); border-radius: 12px; border: 1px solid var(--line); } | |
| table { border-collapse: collapse; width: 100%; margin-top: 8px; font-size: 13px; } | |
| th, td { border: 1px solid var(--line); padding: 7px 10px; text-align: left; vertical-align: top; } | |
| th { background: #0c1020; color: var(--muted); font-weight: 500; white-space: nowrap; } | |
| th.col1 { width: 220px; } | |
| td b { color: #d6e0ff; } | |
| .tag { display: inline-block; padding: 1px 7px; border-radius: 999px; font-size: 11.5px; border: 1px solid var(--line); background: #0c1020; color: #b3c2e0; } | |
| .tag.ok { color: var(--accent-3); border-color: #1c5a3a; background: #08231a; } | |
| .tag.warn { color: var(--warn); border-color: #5a4a1c; background: #1d1a08; } | |
| .tag.bad { color: #ff7e7e; border-color: #5a1c1c; background: #230808; } | |
| .verdict { background: #131a30; border: 1px solid #2b3656; border-left: 3px solid var(--accent-3); padding: 12px 14px; border-radius: 8px; } | |
| </style> | |
| </head> | |
| <body> | |
| <main> | |
| <h1>Tigerlake (PR #325) vs live-sync — how the snapshot ↔ CDC seam is sewn</h1> | |
| <p class="lede">Same problem (lossless initial copy + continuous CDC); different targets force different designs. Here's the side-by-side and where each one wins.</p> | |
| <p class="meta">Compares <code>timescaledb-tigerlake</code> PR #325 (Iceberg sink, Java/Quarkus) with <code>live-sync</code> (PG→PG/TimescaleDB, Go) — <code>pkg/subscription/{tablesync,applier}</code>.</p> | |
| <h2>Side-by-side architecture</h2> | |
| <svg viewBox="0 0 1240 540" role="img" aria-label="Tigerlake vs live-sync architecture comparison"> | |
| <defs> | |
| <marker id="a" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse"><path d="M0,0 L10,5 L0,10 z" fill="#9aa4bd"/></marker> | |
| <marker id="aB" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse"><path d="M0,0 L10,5 L0,10 z" fill="#6cb9ff"/></marker> | |
| <marker id="aG" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse"><path d="M0,0 L10,5 L0,10 z" fill="#7be3a4"/></marker> | |
| <marker id="aO" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse"><path d="M0,0 L10,5 L0,10 z" fill="#ff9f6c"/></marker> | |
| <style> | |
| .title { font: 600 13px -apple-system, sans-serif; fill: #e7ecf5; } | |
| .sub { font: 11.5px -apple-system, sans-serif; fill: #cfd9f5; } | |
| .sml { font: 11px ui-monospace, Menlo, monospace; fill: #9aa4bd; } | |
| .box { fill: #1d2540; stroke: #2b3656; } | |
| .boxA { fill: #102035; stroke: #6cb9ff; } | |
| .boxB { fill: #2a1c14; stroke: #ff9f6c; } | |
| .boxC { fill: #102a1c; stroke: #7be3a4; } | |
| .boxD { fill: #2a1429; stroke: #f17ad6; } | |
| .boxX { fill: #161c2e; stroke: #2b3656; stroke-dasharray: 4 3; } | |
| .e { stroke: #9aa4bd; stroke-width: 1.4; fill: none; } | |
| .eB { stroke: #6cb9ff; stroke-width: 1.5; fill: none; } | |
| .eG { stroke: #7be3a4; stroke-width: 1.5; fill: none; } | |
| .eO { stroke: #ff9f6c; stroke-width: 1.5; fill: none; } | |
| </style> | |
| </defs> | |
| <!-- ===== LEFT: Tigerlake ===== --> | |
| <g> | |
| <rect class="boxX" x="20" y="20" width="600" height="500" rx="10"/> | |
| <text class="title" x="34" y="42">Tigerlake (PR #325) — single slot, Iceberg branches</text> | |
| <!-- PG --> | |
| <rect class="boxB" x="40" y="60" width="160" height="440" rx="8"/> | |
| <text class="title" x="52" y="80">PostgreSQL</text> | |
| <rect class="box" x="56" y="96" width="128" height="58" rx="6"/> | |
| <text class="sub" x="64" y="116">1 logical slot</text> | |
| <text class="sml" x="64" y="134">timescaledb_lake_slot</text> | |
| <text class="sml" x="64" y="148">pub timescaledb_lake</text> | |
| <rect class="box" x="56" y="166" width="128" height="58" rx="6"/> | |
| <text class="sub" x="64" y="186">pg_current_snapshot</text> | |
| <text class="sml" x="64" y="202">xmin / xmax / xips</text> | |
| <text class="sml" x="64" y="216">REPEATABLE READ tx</text> | |
| <rect class="box" x="56" y="236" width="128" height="50" rx="6"/> | |
| <text class="sub" x="64" y="256">SELECT * (bulk)</text> | |
| <text class="sml" x="64" y="274">in snapshot tx</text> | |
| <rect class="box" x="56" y="298" width="128" height="50" rx="6"/> | |
| <text class="sub" x="64" y="318">sync_tables</text> | |
| <text class="sml" x="64" y="336">drives registration</text> | |
| <rect class="box" x="56" y="360" width="128" height="58" rx="6"/> | |
| <text class="sub" x="64" y="380">dbz_heartbeat</text> | |
| <text class="sml" x="64" y="398">keeps LSN moving</text> | |
| <text class="sml" x="64" y="412">when idle</text> | |
| <!-- Connector --> | |
| <rect class="boxA" x="220" y="60" width="200" height="440" rx="8"/> | |
| <text class="title" x="232" y="80">Quarkus connector</text> | |
| <rect class="box" x="234" y="96" width="172" height="80" rx="6"/> | |
| <text class="sub" x="244" y="116">ReplicationCDCReader</text> | |
| <text class="sml" x="244" y="134">pgoutput decode</text> | |
| <text class="sml" x="244" y="150">→ BlockingQueue</text> | |
| <text class="sml" x="244" y="166">single producer</text> | |
| <rect class="box" x="234" y="188" width="172" height="92" rx="6"/> | |
| <text class="sub" x="244" y="208">TigerlakeSink (1 thread)</text> | |
| <text class="sml" x="244" y="226">resolve table → registry</text> | |
| <text class="sml" x="244" y="242">writeCDC → branch</text> | |
| <text class="sml" x="244" y="258">commitCDC + advance LSN</text> | |
| <text class="sml" x="244" y="274">txidIsBefore filter</text> | |
| <rect class="boxC" x="234" y="292" width="172" height="78" rx="6"/> | |
| <text class="sub" x="244" y="312">Snapshotter (1 thread)</text> | |
| <text class="sml" x="244" y="330">bulk SELECT once per table</text> | |
| <text class="sml" x="244" y="346">writes via SnapshotWriter</text> | |
| <text class="sml" x="244" y="362">serial across tables</text> | |
| <rect class="box" x="234" y="382" width="172" height="74" rx="6"/> | |
| <text class="sub" x="244" y="402">TigerlakeRegistry</text> | |
| <text class="sml" x="244" y="420">name → TigerlakeTable</text> | |
| <text class="sml" x="244" y="436">snapshotPendingLock,</text> | |
| <text class="sml" x="244" y="452">icebergWriteLock</text> | |
| <!-- Iceberg --> | |
| <rect class="boxD" x="440" y="60" width="160" height="440" rx="8"/> | |
| <text class="title" x="452" y="80">Iceberg / S3</text> | |
| <rect class="box" x="456" y="96" width="128" height="100" rx="6"/> | |
| <text class="sub" x="466" y="116">Per table:</text> | |
| <text class="sml" x="466" y="134">branch main</text> | |
| <text class="sml" x="466" y="150"> ← bulk + steady CDC</text> | |
| <text class="sml" x="466" y="170">branch pending</text> | |
| <text class="sml" x="466" y="186"> ← CDC during snapshot</text> | |
| <rect class="box" x="456" y="208" width="128" height="78" rx="6"/> | |
| <text class="sub" x="466" y="228">Atomic merge</text> | |
| <text class="sml" x="466" y="246">newTransaction:</text> | |
| <text class="sml" x="466" y="262"> bulk → main,</text> | |
| <text class="sml" x="466" y="278"> replay pending → main</text> | |
| <rect class="box" x="456" y="298" width="128" height="58" rx="6"/> | |
| <text class="sub" x="466" y="318">commit ordering</text> | |
| <text class="sml" x="466" y="336">via icebergWriteLock</text> | |
| <rect class="box" x="456" y="368" width="128" height="100" rx="6"/> | |
| <text class="sub" x="466" y="388">SnapshotCleaner</text> | |
| <text class="sml" x="466" y="406">expire snapshots</text> | |
| <text class="sml" x="466" y="422">keep maxSnapshots</text> | |
| <!-- edges --> | |
| <path class="eO" d="M 184,124 L 234,124" marker-end="url(#aO)"/> | |
| <path class="eO" d="M 184,260 L 234,310" marker-end="url(#aO)"/> | |
| <path class="eB" d="M 406,224 L 456,160" marker-end="url(#aB)"/> | |
| <path class="eG" d="M 406,320 L 456,250" marker-end="url(#aG)"/> | |
| </g> | |
| <!-- ===== RIGHT: live-sync ===== --> | |
| <g transform="translate(0,0)"> | |
| <rect class="boxX" x="640" y="20" width="580" height="500" rx="10"/> | |
| <text class="title" x="654" y="42">live-sync — per-table slot + LSN watermark (PG-native)</text> | |
| <!-- PG source --> | |
| <rect class="boxB" x="660" y="60" width="160" height="440" rx="8"/> | |
| <text class="title" x="672" y="80">Source PG</text> | |
| <rect class="box" x="676" y="96" width="128" height="70" rx="6"/> | |
| <text class="sub" x="684" y="116">Leader slot</text> | |
| <text class="sml" x="684" y="134">main pgoutput stream</text> | |
| <text class="sml" x="684" y="150">drives applier leader</text> | |
| <rect class="box" x="676" y="178" width="128" height="86" rx="6"/> | |
| <text class="sub" x="684" y="198">Tablesync slot</text> | |
| <text class="sml" x="684" y="216">CREATE_REPLICATION_SLOT</text> | |
| <text class="sml" x="684" y="232">USE_SNAPSHOT</text> | |
| <text class="sml" x="684" y="248">one per table</text> | |
| <text class="sml" x="684" y="262">→ consistent point LSN</text> | |
| <rect class="box" x="676" y="276" width="128" height="58" rx="6"/> | |
| <text class="sub" x="684" y="296">COPY TO STDOUT</text> | |
| <text class="sml" x="684" y="314">same REPEATABLE READ</text> | |
| <text class="sml" x="684" y="328">tx as the slot</text> | |
| <rect class="box" x="676" y="346" width="128" height="78" rx="6"/> | |
| <text class="sub" x="684" y="366">WAL retention</text> | |
| <text class="sml" x="684" y="384">N slots hold WAL</text> | |
| <text class="sml" x="684" y="400">until each catches up</text> | |
| <!-- Connector --> | |
| <rect class="boxA" x="840" y="60" width="200" height="440" rx="8"/> | |
| <text class="title" x="852" y="80">Connector (Go)</text> | |
| <rect class="box" x="854" y="96" width="172" height="92" rx="6"/> | |
| <text class="sub" x="864" y="116">Applier leader (1)</text> | |
| <text class="sml" x="864" y="134">decodes pgoutput</text> | |
| <text class="sml" x="864" y="150">applies SQL on target</text> | |
| <text class="sml" x="864" y="166">skips DML for non-READY</text> | |
| <text class="sml" x="864" y="182">relations</text> | |
| <rect class="boxC" x="854" y="200" width="172" height="102" rx="6"/> | |
| <text class="sub" x="864" y="220">TableSync workers (N)</text> | |
| <text class="sml" x="864" y="238">COPY → catchup → SYNCDONE</text> | |
| <text class="sml" x="864" y="254">own slot, own connection</text> | |
| <text class="sml" x="864" y="270">parallel across tables</text> | |
| <text class="sml" x="864" y="288">re-uses applier for catchup</text> | |
| <rect class="box" x="854" y="314" width="172" height="98" rx="6"/> | |
| <text class="sub" x="864" y="334">SyncState channels</text> | |
| <text class="sml" x="864" y="352">FinishedCopy →</text> | |
| <text class="sml" x="864" y="368">Catchup(LSN) ←</text> | |
| <text class="sml" x="864" y="384">SyncDone →</text> | |
| <text class="sml" x="864" y="400">leader picks up rel</text> | |
| <rect class="box" x="854" y="422" width="172" height="64" rx="6"/> | |
| <text class="sub" x="864" y="442">Catalog state machine</text> | |
| <text class="sml" x="864" y="460">DATASYNC → FINISHEDCOPY</text> | |
| <text class="sml" x="864" y="476">→ SYNCDONE → READY</text> | |
| <!-- Target --> | |
| <rect class="boxD" x="1060" y="60" width="140" height="440" rx="8"/> | |
| <text class="title" x="1072" y="80">Target PG/TS</text> | |
| <rect class="box" x="1076" y="96" width="108" height="80" rx="6"/> | |
| <text class="sub" x="1084" y="116">User table</text> | |
| <text class="sml" x="1084" y="134">COPY FROM</text> | |
| <text class="sml" x="1084" y="150">STDIN</text> | |
| <text class="sml" x="1084" y="166">(mutable)</text> | |
| <rect class="box" x="1076" y="188" width="108" height="78" rx="6"/> | |
| <text class="sub" x="1084" y="208">_ts_live_sync</text> | |
| <text class="sml" x="1084" y="226">catalog: per-rel</text> | |
| <text class="sml" x="1084" y="242">state + LSN</text> | |
| <rect class="box" x="1076" y="278" width="108" height="78" rx="6"/> | |
| <text class="sub" x="1084" y="298">Replication</text> | |
| <text class="sml" x="1084" y="316">origin per rel</text> | |
| <text class="sml" x="1084" y="332">tracks LSN</text> | |
| <rect class="box" x="1076" y="368" width="108" height="78" rx="6"/> | |
| <text class="sub" x="1084" y="388">Trigger model</text> | |
| <text class="sml" x="1084" y="406">ALWAYS triggers</text> | |
| <text class="sml" x="1084" y="422">session_replication</text> | |
| <text class="sml" x="1084" y="438">_role=replica</text> | |
| <!-- edges --> | |
| <path class="eO" d="M 804,130 L 854,140" marker-end="url(#aO)"/> | |
| <path class="eO" d="M 804,220 L 854,250" marker-end="url(#aO)"/> | |
| <path class="eO" d="M 804,300 L 1076,130" marker-end="url(#aO)"/> | |
| <path class="eB" d="M 1026,150 L 1076,150" marker-end="url(#aB)"/> | |
| <path class="eG" d="M 1026,260 L 1076,225" marker-end="url(#aG)"/> | |
| </g> | |
| </svg> | |
| <h2>The same problem, two different primitives</h2> | |
| <div class="grid"> | |
| <div class="card"> | |
| <h3>Tigerlake: txid filter + Iceberg branches</h3> | |
| <ol> | |
| <li>Snapshotter captures <code>pg_current_snapshot()</code> in REPEATABLE READ → <code>{xmin, xmax, xips, epoch}</code>.</li> | |
| <li>Bulk <code>SELECT *</code> under same tx → snapshot files appended (no commit yet).</li> | |
| <li>CDC events arriving in parallel: <code>txidIsBefore</code> drops everything the bulk already saw; the rest goes to <code>snapshot-pending-branch</code>.</li> | |
| <li>One Iceberg transaction appends bulk to <code>main</code>, replays pending branch onto <code>main</code>, drops branch.</li> | |
| </ol> | |
| <p class="meta">Watermark unit: <b>transaction id</b>. Buffering medium: <b>Iceberg branch</b>.</p> | |
| </div> | |
| <div class="card"> | |
| <h3>live-sync: USE_SNAPSHOT + LSN watermark</h3> | |
| <ol> | |
| <li>Tablesync worker opens REPEATABLE READ tx, then <code>CREATE_REPLICATION_SLOT … USE_SNAPSHOT</code> — slot's <em>consistent point LSN</em> and tx snapshot are atomically aligned by PG.</li> | |
| <li>Same tx does <code>COPY TO STDOUT</code> → <code>COPY FROM STDIN</code> on the target. State <code>DATASYNC → FINISHEDCOPY</code>.</li> | |
| <li>Worker signals applier leader; leader returns its current LSN as <code>CatchupMsg</code>.</li> | |
| <li>Worker starts replication on its own slot from <code>consistentPointLSN</code> up to <code>catchupLSN</code>, applying only DML for this relation. State <code>FINISHEDCOPY → SYNCDONE</code>.</li> | |
| <li>Leader sees <code>SYNCDONE</code>, takes over that relation on the main stream from <code>SYNCDONE.lsn</code>. State <code>SYNCDONE → READY</code>.</li> | |
| </ol> | |
| <p class="meta">Watermark unit: <b>LSN</b>. Buffering medium: <b>WAL on the source (per-table slot)</b>.</p> | |
| </div> | |
| </div> | |
| <h2>Comparison matrix</h2> | |
| <table> | |
| <tr><th class="col1">Dimension</th><th>Tigerlake (PR #325)</th><th>live-sync</th></tr> | |
| <tr><td>Source slots</td><td>1 (single global)</td><td>1 leader + 1 per-table tablesync (dropped on READY)</td></tr> | |
| <tr><td>Consistency primitive</td><td>Custom <code>txidIsBefore(xmin,xmax,xips,epoch)</code></td><td>PG-native <code>USE_SNAPSHOT</code> at slot creation; LSN watermark thereafter</td></tr> | |
| <tr><td>Snapshot ↔ stream join</td><td>Iceberg <code>snapshot-pending-branch</code> buffers in-flight CDC, atomically merged to <code>main</code> on completion</td><td>Per-table slot replays WAL from consistent point to leader's current LSN; then leader takes over</td></tr> | |
| <tr><td>Where in-flight CDC sits</td><td>Iceberg pending branch (object storage; cheap)</td><td>WAL on source (slot retention; expensive if many large tables)</td></tr> | |
| <tr><td>Initial-copy parallelism</td><td>Serial — one snapshotter thread</td><td>Parallel — N tablesync workers, configurable concurrency</td></tr> | |
| <tr><td>CDC apply parallelism</td><td>Serial — one CDC thread drains the queue</td><td>Serial leader for steady state; parallel during catchup</td></tr> | |
| <tr><td>Pre-snapshot CDC</td><td><b>Discarded</b> by the table's filter (bulk SELECT covers them later)</td><td>Leader <b>skips DML</b> for relations whose state ≠ READY; rows arrive via tablesync slot</td></tr> | |
| <tr><td>Restartability</td><td>Catalog row in <code>_timescaledb_lake_tables.snapshot</code>; Iceberg branch survives. Mid-snapshot crash → re-snap from scratch</td><td>Catalog state machine + slot survives. <code>FINISHEDCOPY</code> resumes catchup from origin's LSN; <code>DATASYNC</code> recopies</td></tr> | |
| <tr><td>WAL pressure on source</td><td>Low — single slot, heartbeat keeps it advancing</td><td>Can spike — every tablesync slot pins WAL until it reaches READY</td></tr> | |
| <tr><td>Target write cost</td><td>File append + occasional Iceberg commit (batched)</td><td>Per-row SQL <code>INSERT/UPDATE/DELETE</code> through pgx (mutable target)</td></tr> | |
| <tr><td>Target mutability</td><td>Append/equality-delete on Iceberg; eventually-consistent via row-keyed deletes</td><td>Native MVCC; reads always see consistent target state</td></tr> | |
| <tr><td>TOAST</td><td>Requires <code>REPLICA IDENTITY FULL</code>; <code>'u'</code> values throw</td><td>Standard pgoutput handling; full row when needed</td></tr> | |
| <tr><td>DDL during sync</td><td>Not yet handled in PR #325 (logical-decoding messages dropped)</td><td>Catalog tracks columns; schema changes propagated</td></tr> | |
| <tr><td>Code size of bridge</td><td>Small — ~400 LOC reader + ~150 LOC snapshot task + ~250 LOC table state</td><td>Larger — ~1000 LOC tablesync + ~1600 LOC applier + state machine + catalog migrations</td></tr> | |
| <tr><td>Operational ergonomics</td><td>1 slot to monitor; Iceberg snapshot history is the audit trail</td><td>N slots to monitor; catalog table is the audit trail; needs slot/origin cleanup logic</td></tr> | |
| <tr><td>Failure blast radius</td><td>One thread dies → whole connector restarts (intentional, via <code>Quarkus.asyncExit</code>)</td><td>Failures isolated per worker; leader survives single-table failures</td></tr> | |
| </table> | |
| <h2>Which is better?</h2> | |
| <p>They're optimizing for different targets, so "better" depends on what you're sinking into.</p> | |
| <div class="grid"> | |
| <div class="card"> | |
| <h3>Tigerlake's design wins when…</h3> | |
| <ul> | |
| <li><b>Target is a lakehouse</b>. Iceberg branches are <em>the</em> natural way to stage in-flight writes; you get an atomic ref-swap for free.</li> | |
| <li><b>Source WAL retention is tight</b>. One slot, one watermark — easy to monitor and protect with a heartbeat.</li> | |
| <li><b>Iceberg commits are expensive</b>, so you want few large, batched commits — exactly what the single CDC consumer gives you.</li> | |
| <li><b>You don't need to query during snapshot</b> — the pending branch keeps <code>main</code> stable but invisible. Iceberg query engines see "the old world" until the merge lands.</li> | |
| </ul> | |
| </div> | |
| <div class="card"> | |
| <h3>live-sync's design wins when…</h3> | |
| <ul> | |
| <li><b>Target is mutable (PG/TimescaleDB)</b>. Direct SQL apply with origin tracking gives strong, queryable consistency throughout.</li> | |
| <li><b>Per-table parallelism matters</b> — many independent tables, want to copy them concurrently with isolated failure domains.</li> | |
| <li><b>You can spend source WAL</b> to avoid carrying a custom txid filter. <code>USE_SNAPSHOT</code> hands you a free, exact, LSN-aligned consistency point — no <code>xmin/xmax/xips</code> arithmetic and no epoch wraparound to worry about.</li> | |
| <li><b>Standard pgoutput semantics (TOAST, DDL, types)</b> are needed and you don't want to reimplement them.</li> | |
| </ul> | |
| </div> | |
| </div> | |
| <h2>Honest tradeoffs (the parts each design pays for)</h2> | |
| <div class="grid"> | |
| <div class="card"> | |
| <h3>What PR #325 trades away</h3> | |
| <ul> | |
| <li><b>Hand-rolled txid filter</b>. <code>PostgresSnapshot.txidIsBefore</code> reimplements PG's MVCC visibility — including epoch arithmetic. Subtle and not free to maintain. <code>USE_SNAPSHOT</code> would avoid this entirely, but Iceberg doesn't need LSN alignment as strongly as a PG target does.</li> | |
| <li><b>Single CDC consumer</b> is a throughput ceiling. Once the queue fills, all tables stall together.</li> | |
| <li><b>Serial snapshotter</b>. One big table blocks every other pending snapshot.</li> | |
| <li><b>Pre-snapshot CDC is silently discarded</b> for new tables. Correct because the bulk read recovers it, but the invariant — "discarded only when bulk SELECT will see it" — depends on the right txid filter being installed before any non-discardable event arrives. Subtle.</li> | |
| <li><b>Iceberg-table-create race</b> (<code>waitUntilTigerlakeTableIsAvailable</code> spin) — PR notes this as known.</li> | |
| <li><b>DDL not handled</b>; <code>'M'</code> logical-decoding messages dropped.</li> | |
| </ul> | |
| </div> | |
| <div class="card"> | |
| <h3>What live-sync trades away</h3> | |
| <ul> | |
| <li><b>WAL retention per table</b>. A stuck or slow tablesync worker can pin many GB of WAL on the source until the slot is dropped. Operational risk grows with table count.</li> | |
| <li><b>More moving parts</b>. Two replication slots per table during sync, an origin, a per-table catalog row, three channel types coordinating leader/worker — more states, more races to reason about.</li> | |
| <li><b>Target schema overhead</b>. Needs <code>_ts_live_sync</code> schema, ALWAYS triggers, migrations. Not zero-touch on the target.</li> | |
| <li><b>SQL apply is row-by-row through pgx</b> — slower per record than a parquet file append, even with batching.</li> | |
| <li><b>Permissions on source</b>. Needs to create/drop slots — not always allowed by managed providers.</li> | |
| </ul> | |
| </div> | |
| </div> | |
| <h2>My read</h2> | |
| <div class="verdict"> | |
| <p>Neither is universally "better" — they're targeting fundamentally different sinks. <b>For a lakehouse sink, Tigerlake's design is the right shape</b>: one slot, branch-as-buffer, batched commits. The Iceberg branch primitive is genuinely cleaner than holding WAL on the source, and the atomic merge is easier to reason about than a per-table catchup state machine.</p> | |
| <p>But the <em>internals</em> would be sturdier if Tigerlake adopted live-sync's <b><code>USE_SNAPSHOT</code> trick</b> instead of the custom <code>txidIsBefore</code> filter: open the replication slot with <code>USE_SNAPSHOT</code> (or use <code>pg_export_snapshot()</code> from inside the slot's bootstrap), do the bulk <code>SELECT</code> against that exported snapshot, and the seam becomes a pure LSN comparison — no MVCC arithmetic, no epoch handling, no race between "snapshot installed" and "first CDC event past the horizon." That's the one durable lesson from live-sync that PR #325 could lift wholesale without changing its branching strategy.</p> | |
| <p>The other live-sync idea worth borrowing — <b>per-table parallelism for the initial snapshot</b> — would help with large user catalogs. Today the single snapshotter thread is the bottleneck. Multiple snapshotter threads each writing to their own table's <code>snapshot-pending-branch</code> would parallelise cleanly because the Iceberg writes are already keyed by table.</p> | |
| </div> | |
| </main> | |
| </body> | |
| </html> |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment