@natew
Created June 3, 2026 04:59
orez.md

orez architecture: three runtimes for one sync engine

A technical deep-dive into the Zero/Postgres backend stack — orez-node, orez-web, and orez-cloudflare — and how the work is split between the standalone orez package and this repo (soot).

Related docs: docs/architecture.md (the canonical in-browser sync flow), docs/cloudflare-do-deploy.md (the CF DO deploy operational reference), docs/zero.md, docs/staging-cf-do.md. This doc covers why the stack exists and how it was built; those cover how to operate it.


0. TL;DR

Zero (Rocicorp's sync engine) wants two things from its environment that are normally heavyweight native dependencies:

  1. a real Postgres with logical replication (the upstream / source of truth), and
  2. a native better-sqlite3 addon for its local replica + CVR + CDC stores.

orez's entire thesis is: give zero-cache neither, but impersonate both so convincingly that an unmodified @rocicorp/zero runs on top. We never fork zero-cache. We fake the Postgres wire protocol, fake logical replication, and swap the storage engine underneath it. That one trick is then re-targeted at three radically different runtimes:

| Runtime | Where it runs | Postgres engine | SQLite (replica) engine | Status |
|---|---|---|---|---|
| orez-node | a Node/Bun child process (bunx orez) | PGlite (Postgres-in-WASM) | bedrock-sqlite (WASM) or native | local dev |
| orez-web | browser Web Workers, no server | PGlite in a worker | bedrock-sqlite (WASM, in-memory) | SootBean in-IDE dev |
| orez-cloudflare | a Cloudflare Durable Object | DO SQLite via a PG→SQLite translator (no PGlite) | the DO's own ctx.storage.sql | production deploys |

The three share a foundation (wire-protocol proxy, faked logical replication, change-tracking triggers, pgoutput encoder) and diverge sharply in how they host zero-cache and where bytes land.

Difficulty / risk at a glance (detailed scorecard in §7):

                 effort        difficulty     maint. risk    completeness
  orez-node      ~4 months     hard           medium-high    dev-grade, battle-tested
  orez-web       ~10 weeks     harder         high           works e2e, no durable replica
  orez-cloudflare ~2 wk burst  hardest        highest        the prod path, runtime-validated

1. Where the code lives — orez package vs. soot integration

This is the part that's easy to get wrong. The orez npm package (~/orez, 545 commits, 2026-02-08 → 2026-05-31, currently orez@0.3.5) ships the reusable primitives. soot (~/soot) ships the productized integration — and a large share of the genuinely hard, interesting engineering for web and cloudflare actually lives here, not in the package.

┌─────────────────────────────── orez package (~/orez) ──────────────────────────────┐
│ PRIMITIVES — reusable, runtime-agnostic                                              │
│                                                                                      │
│  pg-proxy.ts ................. Postgres wire-protocol proxy (the "fake Postgres")    │
│  replication/ ................ faked logical replication + pgoutput binary encoder   │
│  change-tracking.ts .......... AFTER-ROW triggers → _orez.changes change log         │
│  pglite-manager.ts ........... 3-instance PGlite lifecycle, memory tuning, recovery  │
│  sqlite-mode/ ................ swap @rocicorp/zero-sqlite3 → bedrock-sqlite (WASM)    │
│  pg-sqlite-compiler/ ......... pure-TS PostgreSQL→SQLite SQL translator              │
│  worker/zero-cache-embed.ts .. run zero-cache in-process (node)                      │
│  worker/browser-embed.ts ..... run zero-cache in-process (browser worker)            │
│  worker/zero-cache-embed-cf.ts run zero-cache in-process (Durable Object)            │
│  worker/cf-patches.ts ........ workerd-safe zero-cache overlay (no node_modules edit)│
│  worker/shims/ ............... node:* builtins, postgres-socket, ws-browser, …       │
│  pg-proxy-browser.ts ......... wire-protocol proxy over a MessagePort                │
│  pg-proxy-do-backend.ts ...... DoBackend: PG wire → DO SQL (the ~7.7k-line translator)│
│  cf-do/worker.ts ............. ZeroDO / ZeroSqlDO — the DO SQL backend               │
│  cf-do/watermark.ts .......... monotonic change-feed cursor over DO SQLite           │
│  do-sql-tracking.ts .......... transaction barrier (_zero_pending_changes)           │
│  s3-local.ts, recovery.ts, zero-litestream-patch.ts                                  │
└──────────────────────────────────────────────────────────────────────────────────┘
                          ▲ imported by ▲                  ▲ imported by ▲
┌──────────────────────── soot (~/soot) — INTEGRATION + PRODUCTIZATION ───────────────┐
│                                                                                      │
│  packages/orez-web/ (4,228 LOC) — the in-browser orchestrator                        │
│    src/index.ts ............. startOrezWeb(): spawns + wires the worker mesh          │
│    src/workers/*.worker.ts .. 4 worker entrypoints (zc / pg-proxy / pglite / proxy)  │
│    src/workers/sab-channel.ts SharedArrayBuffer ring transport (the fast path)       │
│    scripts/build-zero-cache.ts build-time esbuild SURGERY on zero's compiled output  │
│    src/shims/bedrock-sqlite-stub.ts · postgres-browser.ts · lib/browser-mode.ts      │
│                                                                                      │
│  src/deploy/cloudflareDoDeploy.ts (1,398 LOC) — the entire CF DO deploy path         │
│    ZeroCacheDO class · routing shim source · the build/bundler · migration runner    │
│    ensurePublication · resetReplicaIfTableSetChanged · the two split-brain fixes     │
│                                                                                      │
│  src/worker/projectRuntime.ts (34 KB) · orez-web-client.ts · project-server.worker.ts│
│  scripts/build-orez.ts ........ build → public/orez-web-*.worker.js                   │
│  scripts/dev/{validate,test,monitor}-cf-do-*.ts — the runtime validators            │
│  src/tiling/components/{ArchViz,OrezChanges,ZeroDebug}Pane.tsx — live in-IDE viz     │
│  src/production/{deployTier,orezLogUrl,HealthPane}.ts — prod monitoring              │
│  packages/orez-sprites/ ....... LEGACY fly-sprite deploy tier (replaced by CF DO)    │
└──────────────────────────────────────────────────────────────────────────────────┘

Rule of thumb: the package knows how to be a fake Postgres + run zero-cache off-Node; soot knows how to deploy and operate that for real projects — the worker mesh per project, the CF deploy pipeline, the self-healing migrations, the validators, and the visualizations.


2. The shared foundation (the trick, once)

Every runtime reuses these four primitives from the orez package. Understand them once and all three variants make sense.

2.1 The Postgres wire-protocol proxy

pg-proxy.ts (node) and pg-proxy-browser.ts (browser) are a partial, from-scratch Postgres server whose only job is to lie convincingly to one specific client: zero-cache. They hand-parse and rebuild raw PG protocol bytes and rewrite the queries zero-cache issues that PGlite can't satisfy:

  • version() → a clean PostgreSQL 17.4 string (emscripten's breaks pg_restore)
  • current_setting('wal_level') → 'logical'
  • strip READ ONLY / ISOLATION LEVEL / SET TRANSACTION (meaningless on a single session)
  • pg_replication_slots → a fake _orez table; pg_drop_replication_slot() / pg_terminate_backend() → harmless SQL
  • synthesize responses for ping queries (SELECT 1) and no-op SETs without touching the mutex
  • dedupe identical information_schema/catalog queries from multiple sync workers (30s TTL + in-flight coalescing)

A per-instance async mutex serializes everything else, because PGlite is single-session (pg-proxy.ts:152-201, 313-340, 887-960).
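The serialization idea can be illustrated with a promise-chain mutex. This is a minimal sketch, not the code in pg-proxy.ts: AsyncMutex, runExclusive, and SingleSessionDb are hypothetical names, and SingleSessionDb only imitates an engine that breaks when two calls overlap.

```typescript
// Minimal promise-chain mutex. Illustrative only; orez's real mutex may differ.
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve();

  // Runs `task` only after every previously queued task has settled.
  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow the outcome on the chain so one failure does not block the queue.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// Hypothetical stand-in for a single-session engine: overlapping calls throw.
class SingleSessionDb {
  private busy = false;
  readonly log: string[] = [];

  async exec(sql: string): Promise<void> {
    if (this.busy) throw new Error("concurrent access: " + sql);
    this.busy = true;
    await Promise.resolve(); // yield, as a real query would
    this.log.push(sql);
    this.busy = false;
  }
}
```

Callers would wrap every non-synthesized query, for example `mutex.runExclusive(() => db.exec(sql))`, so queued queries run strictly in arrival order.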

2.2 Faked logical replication

PGlite has no logical replication, so orez invents the entire pgoutput pipeline:

  1. change-tracking triggers (change-tracking.ts): AFTER INSERT/UPDATE/DELETE triggers capture every mutation as JSONB into the _orez.changes change log.
  2. a pgoutput binary encoder (replication/pgoutput-encoder.ts): emits BEGIN/RELATION/INSERT/UPDATE/DELETE/COMMIT/keepalive frames in the exact wire framing zero-cache's change-streamer expects.
  3. a replication handler (replication/handler.ts): fakes IDENTIFY_SYSTEM / CREATE_REPLICATION_SLOT / START_REPLICATION, synthesizes LSNs from wall-clock time (so a restarted proxy never emits an LSN behind zero-cache's persisted watermark), honors the client's resume LSN on reconnect, and streams the encoded changes.

This is the single highest-churn area of the whole project (handler.ts: 65 commits) — every reconnect/restart edge case was a separate hard-won fix.
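To make the encoder step concrete, here is a hedged sketch of two pgoutput frames, following the Begin and Commit layouts in PostgreSQL's logical replication message format. The function names are hypothetical, and nextLsn assumes the LSN is plain wall-clock milliseconds clamped to stay ahead of the last emitted value; the real formula in replication/handler.ts may differ.

```typescript
const PG_EPOCH_MS = Date.UTC(2000, 0, 1); // pgoutput timestamps count from 2000-01-01
const TWO_32 = 0x100000000;

// Writes an unsigned 64-bit integer as two big-endian 32-bit halves (exact below 2^53).
function writeUint64(view: DataView, offset: number, value: number): void {
  view.setUint32(offset, Math.floor(value / TWO_32));
  view.setUint32(offset + 4, value % TWO_32);
}

// ASSUMPTION: LSN = wall-clock ms, never at or behind the previously emitted LSN.
function nextLsn(nowMs: number, lastLsn: number): number {
  return Math.max(nowMs, lastLsn + 1);
}

// Formats an LSN the way Postgres prints it: high/low 32 bits in hex.
function formatLsn(lsn: number): string {
  const hi = Math.floor(lsn / TWO_32).toString(16).toUpperCase();
  const lo = (lsn % TWO_32).toString(16).toUpperCase();
  return hi + "/" + lo;
}

// Begin: 'B', Int64 final LSN, Int64 commit time (microseconds since 2000-01-01), Int32 xid.
function encodeBegin(finalLsn: number, commitTimeMs: number, xid: number): Uint8Array {
  const bytes = new Uint8Array(21);
  const view = new DataView(bytes.buffer);
  view.setUint8(0, 0x42); // 'B'
  writeUint64(view, 1, finalLsn);
  writeUint64(view, 9, (commitTimeMs - PG_EPOCH_MS) * 1000);
  view.setUint32(17, xid);
  return bytes;
}

// Commit: 'C', Int8 flags (unused), Int64 commit LSN, Int64 end LSN, Int64 commit time.
function encodeCommit(commitLsn: number, endLsn: number, commitTimeMs: number): Uint8Array {
  const bytes = new Uint8Array(26);
  const view = new DataView(bytes.buffer);
  view.setUint8(0, 0x43); // 'C'
  view.setUint8(1, 0);
  writeUint64(view, 2, commitLsn);
  writeUint64(view, 10, endLsn);
  writeUint64(view, 18, (commitTimeMs - PG_EPOCH_MS) * 1000);
  return bytes;
}
```

The sketch stops at the message payloads; wrapping them in the streaming replication envelope and emitting RELATION and row frames is left out.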

Doc correction: the README still claims "20ms/500ms adaptive polling." The handler is now event-driven (signalReplicationChange wakeup + a pg_notify LISTEN fallback, 5000ms idle keepalive, ~30s safety poll). Commit 9d2b182 dropped idle CPU from 75-100% to near-zero.

2.3 Running zero-cache in-process (the embeds)

Zero normally fork()s 5 worker processes (main / change-streamer / reaper / replicator / syncer). orez forces SINGLE_PROCESS=1 and feeds zero's runWorker() a fake parent: a node:events EventEmitter wrapped in a Proxy that adds zero's onMessageType/onceMessageType IPC surface, with process.exit intercepted into an emit. Zero's inter-process IPC collapses into in-process EventEmitter channels. The three embeds (zero-cache-embed.ts, worker/browser-embed.ts, worker/zero-cache-embed-cf.ts) are the same idea targeted at Node, a Web Worker, and a Durable Object.
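A rough sketch of the fake-parent idea follows. It is not the embed code: TinyEmitter stands in for node:events EventEmitter so the example has no imports, createFakeParent is a hypothetical name, and the [type, payload] message shape is an assumption about zero's IPC rather than something this doc specifies.

```typescript
type Listener = (...args: unknown[]) => void;
type Message = [string, unknown]; // ASSUMPTION: IPC messages are [type, payload] tuples

// Tiny stand-in for node:events EventEmitter (illustrative only).
class TinyEmitter {
  private listeners = new Map<string, Listener[]>();

  on(event: string, fn: Listener): void {
    const list = this.listeners.get(event) || [];
    list.push(fn);
    this.listeners.set(event, list);
  }

  emit(event: string, ...args: unknown[]): boolean {
    const list = this.listeners.get(event) || [];
    for (const fn of list.slice()) fn(...args);
    return list.length > 0;
  }
}

interface FakeParent {
  on(event: string, fn: Listener): void;
  emit(event: string, ...args: unknown[]): boolean;
  onMessageType(type: string, handler: (payload: unknown) => void): void;
  onceMessageType(type: string, handler: (payload: unknown) => void): void;
  send(message: Message): boolean;
}

function createFakeParent(): FakeParent {
  const emitter = new TinyEmitter();
  const extras: Record<string, unknown> = {
    onMessageType(type: string, handler: (payload: unknown) => void): void {
      emitter.on("message", (msg) => {
        const [t, payload] = msg as Message;
        if (t === type) handler(payload);
      });
    },
    // The listener stays registered after firing; acceptable for a sketch.
    onceMessageType(type: string, handler: (payload: unknown) => void): void {
      let fired = false;
      emitter.on("message", (msg) => {
        const [t, payload] = msg as Message;
        if (fired || t !== type) return;
        fired = true;
        handler(payload);
      });
    },
    // Across real processes this would cross a pipe; in-process it is a local emit.
    send(message: Message): boolean {
      return emitter.emit("message", message);
    },
  };

  // The Proxy layers the IPC surface over the plain emitter.
  return new Proxy(emitter, {
    get(target, prop) {
      if (typeof prop === "string" && Object.prototype.hasOwnProperty.call(extras, prop)) {
        return extras[prop];
      }
      const value = Reflect.get(target, prop, target);
      return typeof value === "function" ? value.bind(target) : value;
    },
  }) as unknown as FakeParent;
}
```

Intercepting process.exit into an emit is omitted here because it touches a runtime global.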

2.4 Swapping the SQLite engine

zero-cache hard-requires @rocicorp/zero-sqlite3 (a native C addon). On WASM targets, orez points it at bedrock-sqlite — SQLite's bedrock branch compiled to WASM with WAL2 + BEGIN CONCURRENT (which zero-cache requires) — and polyfills the better-sqlite3 surface. In node this is an on-disk rewrite of the package's index.js (sqlite-mode/apply-mode.ts); in the browser/DO it's a build-time alias. This is the riskiest dependency-coupling surface in the package.


3. orez-node — the foundation

bunx orez: run Zero locally with no native deps — no Postgres install, no SQLite compile, no Docker. This is the original target and the proving ground for everything above. All of it lives in the orez package.

Architecture

   bunx orez  (cli-entry.ts → cli.ts → index.ts:startZeroLite)
        │
   ┌────┴────────────┐     ┌──────────────────────────────┐    ┌────────────────────────┐
   │ PGlite ×3       │     │  pg-proxy.ts  (TCP :6434)      │    │ zero-cache CHILD PROCESS │
   │  postgres ◄─────┼─────┤  speaks Postgres WIRE PROTOCOL │◄───┤ real @rocicorp/zero     │
   │  zero_cvr ◄─────┼─────┤  pg-gateway for normal conns   │ PG │ (spawned node)          │
   │  zero_cdb ◄─────┼─────┤  raw socket for replication    │wire│  ZERO_UPSTREAM_DB=:6434 │
   │ (3 data dirs,   │     │  per-instance Mutex            │    │  ZERO_REPLICA_FILE=*.db │
   │  3 mutexes)     │     │  query rewrites + schema cache │    └───────────┬─────────────┘
   └────┬────────────┘     └──────────────┬─────────────────┘                │ reads/writes
        │ change triggers                 │ START_REPLICATION                 ▼ replica via
        ▼                                 ▼                       ┌──────────────────────────┐
   ┌─────────────────┐   ┌──────────────────────────────┐        │ bedrock-sqlite (WASM) or  │
   │ _orez.changes   │──►│ replication/handler.ts         │        │ native @rocicorp/zero-    │
   │ + pg_notify     │   │  fakes IDENTIFY/SLOT/START_REPL│        │ sqlite3 (runtime swap)    │
   └─────────────────┘   │  encodes pgoutput, streams it  │        └──────────────────────────┘
                         └──────────────────────────────┘

Why three PGlite instances: zero-cache wants three databases with independent transaction contexts. PGlite is single-session; sharing one corrupts CVR transactions with ConcurrentModificationException. So orez runs three separate PGlite data dirs, each with its own mutex; the main one gets dev-tuned memory + 12 lazy-loaded contrib extensions (pgvector, pg_trgm, …), cvr/cdb get minimal fixed memory (pglite-manager.ts:118-167, 265-278).

Two IPC layers: zero-cache is a real child process over a TCP socket + PG wire (with a --require preload that polls the parent PID and self-destructs on parent death — macOS has no PR_SET_PDEATHSIG, and orphaned sync-workers busy-loop at 100% CPU). PGlite optionally runs in worker threads so synchronous WASM execution doesn't block the proxy/replication event loop; wire-protocol ArrayBuffers are transferred, not copied (pglite-ipc.ts).

Interesting / hard pieces

  • The wire-protocol proxy (§2.1) — a from-scratch partial Postgres server.
  • Faked logical replication (§2.2) — the pgoutput pipeline.
  • bedrock-sqlite + the on-disk shim swap (§2.4), with backup/restore lifecycle to prevent WASM↔native mode contamination.
  • The pg→sqlite compiler (pg-sqlite-compiler/, ~2.1k LOC): a pure-TS PostgreSQL→SQLite translator (single-pass visitor over the libpg_query WASM AST), built so the translation can later be bundled into a Durable Object (pgsqlite is Rust/tokio and won't run in workerd). Oracle-tested against the real pgsqlite binary (910 fixtures) with a corpus ratchet. Promising but incomplete — dev-oracle-gated, not load-bearing for the node path yet.
  • Memory auto-sizing & WAL purge: re-execs node at ~50% RAM, periodic CHECKPOINT to bound pg_wal/ growth, consumed change-log rows purged after streaming.
  • The litestream patch (zero-litestream-patch.ts): Zero 1.5's dedicated change-streamer unconditionally calls restoreReplica() expecting a litestream backup orez doesn't have. orez patches the compiled commands.js on disk to short-circuit it — and throws loudly if the upstream function signature changes. The cleanest example of the "patch zero internals surgically, fail loudly on drift" pattern.
  • s3-local.ts: a deliberately minimal GET/PUT/DELETE/HEAD file server with CORS so Zero apps needing uploads don't pull in MinIO/Docker.
  • Crash recovery: CDC-corruption detection drives recoverZeroState() which wipes+rebuilds CVR/CDB + replica as one consistency domain, with a budget of 5 resets per 5 minutes.
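The reset budget in the crash-recovery bullet could be modeled as a sliding-window limiter. This is a sketch under assumptions: ResetBudget and tryConsume are hypothetical names, and whether orez uses a sliding or a fixed window is not stated here.

```typescript
// Sliding-window budget: at most `maxResets` resets within any `windowMs` span.
class ResetBudget {
  private timestamps: number[] = [];

  constructor(
    private readonly maxResets: number = 5,
    private readonly windowMs: number = 5 * 60 * 1000,
  ) {}

  // Records a reset and returns true when budget remains; returns false when exhausted.
  tryConsume(nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // Forget resets that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxResets) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}
```

A recovery loop would call tryConsume(Date.now()) before each wipe-and-rebuild and stop retrying once it returns false, so a persistent corruption cannot turn into an endless reset loop.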

Effort & difficulty

Built over ~4 months (Feb–May 2026), heavily debug-driven (142 fix: vs 57 feat: vs 23 perf:). Churn concentration (a proxy for difficulty):

| Subsystem | Evidence | Difficulty |
|---|---|---|
| Replication (handler/encoder/tracker) | handler.ts 65 commits; "replication" 54, "watermark" 17 | hardest: every reconnect/restart edge is a fix |
| WASM SQLite / SHM / WAL2 | "sqlite" 50, "wasm" 33; two dedicated plans documenting a multi-week dead-end | very hard, partially unresolved: cross-process WASM SHM was worked around (force 1 sync worker + native fallback), not solved |
| pg-proxy / wire protocol | pg-proxy.ts 61 commits; "lock\|mutex" 43 | hard: single-session semantics on shared instances is a minefield |
| restore / recovery | "restore" 27 (COPY→INSERT, dollar-quoting, oversized-row skip) | medium-hard |

Overall: high. Not a thin adapter — a reimplementation of the Postgres-facing contract zero-cache depends on, each piece bug-found through real app integration.

Maintenance risk — medium-high

The dominant risk is coupling to zero-cache internals: orez imports zero-cache's run-worker by a string-munged filesystem path, rewrites compiled zero files on disk (litestream patch, the sqlite3 shim), and matches CDC error strings by substring in recovery. Pinned at @rocicorp/zero@1.5.0 for a reason. On a Zero upgrade, expect to revisit: the run-worker path, the IPC contract, the litestream anchor (caught loudly), the SQLite pragma surface, and the replica/CVR/CDB schema framing.

Completeness

Explicitly dev-only (README: "Not suitable for production" — single-session mutex, trigger overhead, no HA). Within that scope, the proxy, triggers, pgoutput encoder, restore, and crash recovery are production-quality and battle-tested against real apps (chat's full e2e suite, 46/48 native). 46 *.test.ts (~17.8k LOC), 7 integration tests, a full perf/ harness with committed baselines. Dead code to ignore: src/worker/index.ts and src/browser.ts carry explicit "NOT A GOOD REFERENCE / early guess" banners — the live paths are the embeds.


4. orez-web — the whole stack in the browser

The Zero dev stack running entirely in browser Web Workers, with no server: PGlite in a worker, the wire-protocol proxy in a worker, and real zero-cache compiled to run in a worker, with its replica on WASM SQLite. This is what powers SootBean's in-IDE dev environment.

Attribution: the orchestration, the worker mesh, the transport, the build-time surgery, and the browser shims live in soot's packages/orez-web/ (4,228 LOC). It imports orez's primitives — worker/browser-embed (startZeroCacheEmbedBrowser), pg-proxy-browser, change-tracking, worker/shims/*. The hard browser-specific engineering is soot-side.

Architecture

Each open project spawns a mesh of Web Workers (3 in the default singleDb mode), orchestrated from the main thread by startOrezWeb():

                 MAIN THREAD (packages/orez-web/src/index.ts: startOrezWeb)
                 spawns + wires the mesh; holds the OrezWeb handle
  ┌──────────────────────────────┼─────────────────────────────────────┐
  │ Zero client (preview iframe)  │  project-server worker (on-zero)     │
  │   fetch /api/zero/push|pull ──┤  validates + executes mutations      │
  │   WebSocket /sync ────────────┤  signalReplication()                 │
  └───────────────┬───────────────┴───────────────┬──────────────────────┘
        MessagePort│ (ws-connect)        SAB JSON   │ fast-path
                   ▼                                ▼
       ┌──────────────────────┐          ┌──────────────────────────┐
       │  zero-cache worker    │ connect  │  pg-proxy worker          │
       │  real @rocicorp/zero  │─ port ──▶│  pg-gateway wire proto    │
       │  SINGLE_PROCESS=1     │  (pg     │  + replication handler    │
       │  bedrock-sqlite (WASM │  socket) │  + mutex, schema cache    │
       │  replica, in-memory)  │          └─────────┬─────────────────┘
       └──────────┬───────────┘    SAB binary       │ (execProtocolRaw)
                  │ postgres-socket  + SAB JSON      ▼
                  │ (MessagePort)    ┌───────────────────────────────────────┐
                  └─────────────────▶│  PGlite worker(s) — postgres in WASM    │
                                     │  singleDb: 1 worker plays pg/cvr/cdb    │
                                     │  multi:    3 workers (pg / cvr / cdb)   │
                                     │  persisted: IndexedDB (idb://…)         │
                                     └───────────────────────────────────────┘

Cross-worker transport (two tiers):

  • MessagePort is the baseline (remaps message IDs into per-channel 1M-offset ranges so many channels can target one PGlite worker).
  • SharedArrayBuffer + Atomics.waitAsync is the hot path (sab-channel.ts): a 1MB ring, ~0.1ms vs MessagePort's ~40ms. Two flavors — binary SAB for execProtocolRaw wire bytes, JSON SAB for query/exec. Oversized payloads throw SabOverflowError and the caller falls back to MessagePort. SAB requires cross-origin isolation (COOP/COEP); there's a disableSab escape for browser-automation envs.
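The overflow-and-fallback rule can be sketched over a single-slot shared buffer. This is deliberately simpler than the 1MB ring in sab-channel.ts: the header layout, sabWrite, sabRead, and sendWithFallback are all assumptions made for illustration, and only the SabOverflowError name comes from the text above.

```typescript
class SabOverflowError extends Error {
  constructor(size: number, capacity: number) {
    super(`payload of ${size} bytes exceeds SAB capacity of ${capacity} bytes`);
    this.name = "SabOverflowError";
    Object.setPrototypeOf(this, SabOverflowError.prototype);
  }
}

// ASSUMED layout: Int32 slot 0 = full flag, Int32 slot 1 = payload length, then payload bytes.
const HEADER_BYTES = 8;

function sabWrite(sab: SharedArrayBuffer, payload: Uint8Array): void {
  const capacity = sab.byteLength - HEADER_BYTES;
  if (payload.byteLength > capacity) throw new SabOverflowError(payload.byteLength, capacity);
  const header = new Int32Array(sab, 0, 2);
  new Uint8Array(sab, HEADER_BYTES).set(payload);
  Atomics.store(header, 1, payload.byteLength);
  Atomics.store(header, 0, 1); // publish the slot
  Atomics.notify(header, 0); // wake a reader parked in Atomics.wait / Atomics.waitAsync
}

function sabRead(sab: SharedArrayBuffer): Uint8Array | null {
  const header = new Int32Array(sab, 0, 2);
  if (Atomics.load(header, 0) !== 1) return null; // nothing published
  const length = Atomics.load(header, 1);
  const copy = new Uint8Array(sab, HEADER_BYTES, length).slice();
  Atomics.store(header, 0, 0); // release the slot
  return copy;
}

// Tries the shared buffer first and falls back to a MessagePort-style callback on overflow.
function sendWithFallback(
  sab: SharedArrayBuffer,
  payload: Uint8Array,
  postMessage: (payload: Uint8Array) => void,
): "sab" | "message-port" {
  try {
    sabWrite(sab, payload);
    return "sab";
  } catch (err) {
    if (err instanceof SabOverflowError) {
      postMessage(payload);
      return "message-port";
    }
    throw err;
  }
}
```

A real ring additionally tracks read and write cursors so several messages can be in flight; the fallback decision stays the same.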

Persistence: PGlite data is persisted in IndexedDB (idb://…-postgres); the PGlite WASM module is compiled once on the main thread and transferred into the workers so three instances don't triple-fetch. zero-cache's replica SQLite (bedrock-sqlite) is kept in-memory only (/tmp/zero-replica.db) and rebuilt on every page load. The durable truth is the IndexedDB-backed PGlite.

The genuinely novel / hard pieces

  1. Build-time surgery on zero's compiled output (packages/orez-web/scripts/build-zero-cache.ts) — the load-bearing, scariest piece. esbuild onLoad plugins rewrite zero's JS as it's bundled:

    • static-import the 5 worker modules to replace processes.js's dynamic import(moduleUrl) (no dynamic import in a worker bundle);
    • strip the if (!singleProcessMode()) exitAfter(...) auto-run guards;
    • replace ThreadWriteWorkerClient (which spawns a worker_thread for SQLite writes) with an inline class running ChangeProcessor directly;
    • inject an __orez_on_repl_commit hook;
    • a 6-edit view-syncer patch that defers query-pipeline sync until the replica catches up to the CVR stateVersion — because browser message ordering interleaves CVR churn and replica advancement more tightly than node's websocket path, tripping an upstream assertion.

    Every patch matches an exact source needle and throws "patch failed: needle not found" on drift. This is why Zero is pinned at 1.5.0.

  2. A Postgres wire-protocol proxy in the browser (pg-proxy-browser.ts): pg-gateway's web DuplexStream terminates real PG frontend/backend protocol over a MessagePort; the real postgres npm client talks to it through a net.Socket-compatible MessagePortSocket (postgres-socket.ts) with pause/resume buffering for COPY backpressure.

  3. The node-shim surface (worker/shims/): ~40 node:* builtins, postgres, @rocicorp/zero-sqlite3, fastify, ws — faked well enough that an unmodified multi-process Node server runs in one browser worker.

  4. The SAB ring + graceful fallback, and the singleDb collapse where one PGlite worker masquerades as pg/cvr/cdb (3 distinct JS proxy refs defeat reference-equality, so singleDb is threaded through to share one wire-protocol mutex).
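The needle-matching discipline from item 1 might look like the following. It is a hedged sketch, not build-zero-cache.ts: NeedlePatch and applyPatches are hypothetical names, the sample source string is invented, and the ambiguity check is an extra safeguard the doc does not mention.

```typescript
interface NeedlePatch {
  name: string;
  needle: string; // exact source text expected in the upstream file
  replacement: string;
}

// Applies each patch exactly once and fails loudly when upstream has drifted.
function applyPatches(source: string, patches: NeedlePatch[]): string {
  let out = source;
  for (const patch of patches) {
    const first = out.indexOf(patch.needle);
    if (first === -1) {
      throw new Error(`patch failed: needle not found (${patch.name})`);
    }
    // Extra safeguard (assumption): refuse needles that match more than once.
    if (out.indexOf(patch.needle, first + 1) !== -1) {
      throw new Error(`patch failed: needle is ambiguous (${patch.name})`);
    }
    out = out.slice(0, first) + patch.replacement + out.slice(first + patch.needle.length);
  }
  return out;
}

// Invented example input resembling the auto-run guard described above.
const examplePatches: NeedlePatch[] = [
  {
    name: "strip-auto-run-guard",
    needle: "if (!singleProcessMode()) exitAfter(run);",
    replacement: "/* auto-run guard stripped */",
  },
];
```

Throwing at build time is the point: a silent no-op patch would ship a bundle that still carries the unpatched upstream behavior.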

Effort & difficulty — harder than node

A focused ~10-week effort starting 2026-03-20. Churn concentrated in pg-proxy-browser.ts (82 commits) and replication/handler.ts (65 commits), that is, in faking logical replication and serializing wire-protocol access to single-threaded PGlite. Materially harder than the node path, which needs no bundler aliases at all: the browser additionally required the whole shim surface, build-time JS rewriting of zero internals, a WASM SQLite with concurrent-write semantics, the SAB transport, IDB persistence/lock management, and the browser-only view-syncer patch.

Maintenance risk — high

  • Brittle string-anchored patches against zero internals (build-zero-cache.ts); the view-syncer patch is the scariest (6 edits of upstream sync logic — an upstream refactor silently invalidates correctness, not just compilation).
  • node-stub.ts tracks Node's surface; new zero code reaching an unshimmed node:* API fails opaquely.
  • bedrock-sqlite is a prebuilt opaque binary committed to public/ — no in-repo build script; regenerating the WAL2/BEGIN-CONCURRENT build is out-of-band tribal knowledge.
  • Cross-origin isolation is required for the SAB fast path; without COOP/COEP it silently degrades to 40ms MessagePort latency.
  • Three hand-maintained alias maps must stay in sync.

Completeness

Works end-to-end: full sync loop (optimistic + authoritative), pgoutput replication, CVR/CDC, view-syncer pokes to multiple clients (web iframe + native canvas simultaneously), IDB persistence across reloads, per-project isolation. Disabled by design: zero-cache admin dashboard + log store, query planner, litestream backup/restore; the replica is non-durable (rebuilt each load). The 4th proxy.worker.js (4 KB) is a legacy MessagePort proxy not used by soot — a cleanup candidate. Validation: test/orez-web-sync.test.ts (1,976 lines) drives the real browser stack through 11 named scenarios (first-add, multi-add, reload-persist, toggle, delete, mixed-flow, back-and-forth, …); test:orez:quick is on the CI validate-and-deploy gate. Shipped bundle sizes (uncompressed / brotli): zero-cache 6.15MB / 956KB, pglite 977KB / 171KB, pg-proxy 319KB / 92KB.


5. orez-cloudflare — real zero-cache in a Durable Object

The production deploy path for SootBean template apps (buildTrigger.ts hard-sets cloudflareDo: true for every prod deploy; it replaced the legacy fly-sprite tier). This is the hardest and riskiest of the three.

The constraint that shapes everything

Cloudflare Durable Objects cap at 128 MB RAM per instance. PGlite + WASM Postgres + extension binaries blows that budget — dead end. But zero-cache's value is the hard part (sync protocol, IVM/CVR, replication, client semantics). So the design keeps real zero-cache and replaces only the storage engine: zero-cache's replica goes straight to DO SQLite, and its upstream Postgres connection is satisfied by DoBackend, which speaks PG wire but executes against DO SQLite. No PGlite. No WASM Postgres. Ever.

Architecture (the real soot wiring)

                 Cloudflare Worker (deployed per project)   [orez-shim.js, generated by soot]
  ┌──────────────────────────────────────────────────────────────────────────────────┐
  │  isZeroCachePath? /sync/v*, /replication/v* ─────────────► ZERO_CACHE_DO singleton │
  │  isPublicSqlBackendPath? /exec /batch /changes ──────────► 404 (sealed)            │
  │  else (SSR + /api/* + /api/zero/push|pull) ──────────────► oneWorker.fetch          │
  │  static client bundle ───────────────────────────────────► ASSETS                  │
  └───────────────┬──────────────────────────────────────┬───────────────────────────┘
                  │ idFromName('singleton')                │ idFromName('singleton')
                  ▼                                        │
  ┌──────────────────────────────────────────┐            │
  │ ZERO_CACHE_DO  (soot ZeroCacheDO class)    │            │
  │  ensureReady():                            │            │
  │   1. runSootCloudflareMigrations()         │            │
  │      drizzle migrations → init.sql → publication        │
  │   2. resetReplicaIfTableSetChanged()       │            │
  │   3. startZeroCacheEmbedCF()  ── real zero-cache,       │
  │        SINGLE_PROCESS=1, in-process        │            │
  │   replica/CVR: ZERO_REPLICA_FILE=':do-sqlite:'          │
  │        → THIS DO's own ctx.storage.sql     │            │
  │   upstream PG conn: postgres-browser shim  │            │
  │        → DoBackend → PG wire → DO SQL HTTP ─┼────────────┼──┐
  │   app push/pull: fetch orez-zero-api.local ─┼──► oneWorker.fetch (same worker)
  └────────────────────────────────────────────┘            ▼
                                       ┌──────────────────────────────────────────────┐
                                       │ ZERO_SQL_DO  (orez ZeroDO from cf-do/worker.ts)│
                                       │  raw SQL endpoints (INTERNAL only):            │
                                       │   /exec /batch /commit-tracked-tx /rollback…   │
                                       │  ctx.storage.sql = the authoritative "Postgres"│
                                       │  _zero_changes + _zero_pending_changes + watermark
                                       └──────────────────────────────────────────────┘

Two distinct DO SQLite stores, both keyed 'singleton': the cache DO's storage holds zero-cache's derived replica/CVR/CDC; the SQL DO's storage holds the authoritative upstream rows. This split is exactly why wiping the replica is safe — the source of truth lives in the other DO.

Note: the package's cf-do/ARCHITECTURE.md diagram is slightly idealized. In the real soot deploy the cache DO is soot's ZeroCacheDO (it embeds zero-cache and owns the replica); only the SQL DO is orez's ZeroDO.
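The routing rules in the diagram can be approximated as a pure classifier. This is a sketch, not the generated orez-shim.js: the exact matching (regex for the versioned paths, prefix matching for the sealed endpoints) is an assumption, and static-asset handling is left out because the doc does not say how those requests are recognized.

```typescript
type Route = "zero-cache-do" | "sealed-404" | "one-worker";

// /sync/v* and /replication/v* go to the zero-cache Durable Object.
function isZeroCachePath(pathname: string): boolean {
  return /^\/(sync|replication)\/v[^/]+(\/|$)/.test(pathname);
}

// Raw SQL backend endpoints must never be reachable from the public internet.
function isPublicSqlBackendPath(pathname: string): boolean {
  const sealed = ["/exec", "/batch", "/changes"];
  return sealed.some((p) => pathname === p || pathname.indexOf(p + "/") === 0);
}

function classify(pathname: string): Route {
  if (isZeroCachePath(pathname)) return "zero-cache-do";
  if (isPublicSqlBackendPath(pathname)) return "sealed-404";
  return "one-worker"; // SSR, /api/*, and /api/zero/push|pull
}
```

Checking the sealed paths before the catch-all matters: if the order were reversed, a public request to /exec would reach application code instead of a 404.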

The seven hard pieces

  1. The 128MB constraint → "real zero-cache + DO SQLite, no PGlite." The bans are explicit and guarded (ARCHITECTURE.md:74-84).
  2. DoBackend — PG-wire → DO SQL (orez/pg-proxy-do-backend.ts, ~7.7k lines). Parses raw PG wire frames, rewrites each statement Postgres→SQLite via a full pgsql-parser AST pass (schema flattening, type/function mapping, catalog interception, array/json/timestamp coercion, RETURNING/CTE rewrites), and POSTs the rewritten SQL to the SQL DO's /exec or /batch. Because DO refuses raw SQL BEGIN/COMMIT/SAVEPOINT (Cloudflare requires ctx.storage.transaction()), PG-style multi-call transactions are emulated by snapshotting each table on first in-tx write and restoring atomically on rollback. Four boot-time amplification bugs were fixed 2026-05-26 (multi-row metadata persist instead of per-row HTTP; flush tx metadata once at commit; snapshot change-tables once per tx; skip sqlite_master probes when the schema is already known) — boot dropped from failing-at-60s to ~13s, all 51 chat e2e tests passing.
  3. The transaction barrier (do-sql-tracking.ts + cf-do/worker.ts). Prevents an add→remove→add flicker: without it, the server's transaction-tracked changes become visible before the SQL tx commits, so zero-cache sends an authoritative state missing the just-inserted row, the client briefly removes it, then re-adds. The fix: tracked writes inside a tx are staged in _zero_pending_changes keyed by a forwarded transactionID; on COMMIT, /commit-tracked-tx moves them into the live _zero_changes feed and bumps the watermark; on rollback, /rollback-tracked-tx deletes them. The todo validator catches regressions with a MutationObserver installed before the insert.
  4. Two split-brain self-healing fixes (both in soot's cloudflareDoDeploy.ts):
    • (a) init.sql vs drizzle-migrations drift → SchemaVersionNotSupported. In-browser codegen regenerates zero-schema.gen.ts + database/generated/init.sql from the same parse, but agents have no shell, so drizzle-kit generate never runs and database/migrations/** stays frozen. A browser-added table (e.g. run) is in the publication but never created → zero-cache rejects every client. Fix: cold-start runs drizzle migrations, then applies init.sql (idempotent CREATE TABLE IF NOT EXISTS), then ensurePublication publishes only tables that actually exist (so one drifted table degrades to "not replicated" instead of aborting all replication).
    • (b) the replica snapshots the publication once. zero-cache snapshots the publication into its replica once at initial sync; ALTER PUBLICATION ADD TABLE only feeds the change stream, not the existing snapshot. So a redeploy that adds a table leaves the persisted replica stuck on the old set → still SchemaVersionNotSupported. Fix: resetReplicaIfTableSetChanged stores a __soot_replica_schema_tag (sorted published table set) in DO storage and, when it changes (including the no-baseline case, recovering DOs created before this tracking), drops every non-sqlite_ table in the replica so zero-cache re-runs initial sync. The replica is derived data, so it's a re-sync, not data loss; on a new DO the replica is empty so the drop is a no-op. Makes "add a table, redeploy" self-healing with no manual DO reset.
  5. cf-patches.ts — the workerd-safe overlay. Copies zero-cache's compiled out/ into a generated overlay and symlinks its deps without mutating node_modules, applying five patches: file:// worker URLs → zero-worker://; strip CLI auto-start guards; replace dynamic import(moduleUrl) with a static lookup table (workerd forbids it); run the replica writer in-process (no worker_threads in workerd); base64-embed the libpg-query parser WASM inline (no fs/network wasm loading in workerd). Each patch warns loudly on version drift. The most fragile-to-upstream piece.
  6. The two-DO singleton model + routing. Both DOs are idFromName('singleton') so all sessions share one zero-cache process and one durable state. The raw SQL endpoints are sealed from public access (404) — they'd be an unauthenticated write surface — reachable only via internal DO stub fetch. App push/pull stays on oneWorker, reached by zero-cache through the orez-zero-api.local virtual host.
  7. The watermark (cf-do/watermark.ts). A monotonic change-feed cursor (the Zero "cookie") that reconciles three sources — a persisted last_value, MAX(watermark) from _zero_changes, and a sequence table — and takes the max, so consumed-then-purged changes never let a watermark be reused.
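The watermark reconciliation in piece 7 reduces to taking a maximum over the three sources. The sketch below assumes numeric watermarks and uses hypothetical names (WatermarkSources, reconcileWatermark, nextWatermark); the representation in cf-do/watermark.ts may differ.

```typescript
interface WatermarkSources {
  persistedLastValue: number | null; // last value written to durable storage
  maxInChanges: number | null; // MAX(watermark) over _zero_changes, null when purged or empty
  sequenceValue: number | null; // current value of the sequence table
}

// Takes the max so a purged change log can never pull the cursor backwards.
function reconcileWatermark(sources: WatermarkSources): number {
  return Math.max(
    sources.persistedLastValue || 0,
    sources.maxInChanges || 0,
    sources.sequenceValue || 0,
  );
}

function nextWatermark(sources: WatermarkSources): number {
  return reconcileWatermark(sources) + 1;
}
```

For example, after every consumed row has been purged, maxInChanges is null, yet the persisted value still prevents an already-issued watermark from being handed out again.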

Effort & difficulty — hardest of the three

Extremely compressed. The browser-embed foundation landed Mar 20-21; the DO experiment went from "start on do experiment" (May 24) to "run real zero-cache on DO SQLite without PGlite" (May 31), about 7 days. The ~/orez repo still holds 11 .orez-do-* scratch dirs, all created within ~2 hours on May 24 (among them -test, -int-test, -int-test-2/3, -full, -zero, -txn-test, -final, -reaper, -debug), which is concrete evidence of how much trial-and-error the DO bring-up took. The soot deploy wrapper saw 20 commits to cloudflareDoDeploy.ts across late May → early June (bundle alignment, node-builtin stubs, private endpoints, env isolation, flicker validation, init.sql bootstrap); the orchestration churn is as heavy as the engine.

It is the only variant that (a) can't use PGlite at all → a bespoke ~7.7k-line PG-wire→SQLite translator, (b) runs zero-cache under workerd's hard constraints (no dynamic import, no worker_threads, no fs, no port binding — all patched/shimmed), (c) splits storage across two DOs with a manual transaction barrier, and (d) has two distinct split-brain failure modes each needing dedicated self-healing.

Maintenance risk — highest

  • workerd constraints are absolute and each is patched/stubbed (~30 virtual node-builtin stubs + 5 cf-patches); any can break on a workerd update.
  • The 128MB ceiling is permanent — large working sets are a latent risk.
  • The patch surface is brittle to upstream zero-cache (the Zero 1.5 upgrade already forced a worker.ts + do-backend pass).
  • The orez version pin + barrier guard: soot pins orez@0.3.5; the guard test reads the installed node_modules/orez/dist and asserts the barrier primitives (_zero_pending_changes, /commit-tracked-tx, /rollback-tracked-tx, transactionID) are present — a downgrade fails bun check.
  • The SchemaVersionNotSupported false-green trap: reloading a route that doesn't need a newly-added table false-greens after a redeploy. You must navigate to a route that needs the new table and poll ~20s to force the error. Self-confirmed reloads do not prove the replica reset worked.
  • Two singletons = a serialization point (per-/exec HTTP round-trips are serial through the SQL DO).
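
The replica reset behind the SchemaVersionNotSupported bullet keys on a tag derived from the published table set. A minimal sketch of that comparison follows, assuming a simple key/value view of DO storage. The `TagStore` interface, the comma-joined tag format, and the function names are hypothetical; only the `__soot_replica_schema_tag` key and the "changed or no baseline means reset" rule come from the text.

```typescript
// Hypothetical key/value view of DO storage, for illustration only.
interface TagStore {
  get(key: string): string | undefined
  set(key: string, value: string): void
}

const TAG_KEY = '__soot_replica_schema_tag'

// Tag = the sorted, de-duplicated published table set (join format is an assumption).
function schemaTag(publishedTables: string[]): string {
  return [...new Set(publishedTables)].sort().join(',')
}

// Returns true when the caller should drop the replica tables so that
// zero-cache re-runs initial sync. A missing baseline counts as a change,
// which is what lets DOs created before the tracking existed recover.
function shouldResetReplica(store: TagStore, publishedTables: string[]): boolean {
  const next = schemaTag(publishedTables)
  const prev = store.get(TAG_KEY)
  store.set(TAG_KEY, next)
  return prev !== next
}
```

Because the tag is order-independent, redeploying with the same tables listed in a different order does not trigger a re-sync; only adding or removing a table does.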

Completeness

This is the prod path, and it's runtime-validated, not liveness-checked. scripts/dev/validate-cf-do-runtime.ts boots 2–3 isolated browser contexts and proves: realtime A↔B sync, reload persistence, new-context persistence, no page or console errors, the todo flicker detector (MutationObserver), and the Better-Auth demo-login path (app/flights) before mutating. There are three validation layers: package (bun test test/cloudflare-do-deploy.test.ts, test-cf-do-bundle.ts), direct template deploy (test-cf-do-deploy.ts {todo,app,flights} --runtime), and prod UI deploy through sootbean.com (deploy-prod-project.ts --runtime). Closeout requires current evidence at every layer (see docs/cloudflare-do-deploy.md). The package's cf-do/worker.ts also carries a bespoke Zero sync handler; that handler is dev/protocol-experiment only and explicitly not the production sync. Prod sync is served by real zero-cache, and the guard test asserts the shim never routes to the bespoke handler.


6. The in-IDE observability (a soot-side achievement)

Worth calling out because it's unusually nice: SootBean visualizes its own orez stack live, in the IDE.

  • ArchVizPane.tsx (706 lines): a live architecture visualization — pixel-art boxes for globe/phone/bolt, with dots flowing along the paths between components and labels flashing as events fire, driven by an archVizEvents bus. You watch a mutation travel client → on-zero → PGlite → trigger → replication → zero-cache → poke in real time.
  • OrezChangesPane.tsx: a live tail of the _orez.changes table (watermark, table, op, row/old data) — the replication change log as it's written.
  • ZeroDebugPane.tsx (302 lines): worker logs, status, a PGlite ping, and reset/restart actions for the in-browser zero-cache.

OrezZeroPane.tsx is a legacy stub kept only for layout compatibility (its job moved to ZeroDebugPane).


7. Comparative scorecard

Effort & difficulty (relative)

| | orez-node | orez-web | orez-cloudflare |
| --- | --- | --- | --- |
| Calendar time | ~4 months (Feb–May 2026), the foundation | ~10-week focused build (from Mar 20) | ~2-week burst (DO core in ~7 days late May) + ongoing soot deploy churn |
| Hardest sub-problem | faked logical replication; cross-process WASM SQLite SHM (never solved, worked around) | build-time surgery on zero's compiled output; the view-syncer timing patch | PG-wire→SQLite translation with no PGlite; two split-brain failure modes |
| Churn hotspots | handler.ts (65), pg-proxy.ts (61) | pg-proxy-browser.ts (82), handler.ts (65) | 11 scratch dirs in 2h; cloudflareDoDeploy.ts (20) |
| Relative difficulty | hard | harder | hardest |

Maintenance risk

| | orez-node | orez-web | orez-cloudflare |
| --- | --- | --- | --- |
| Coupling to zero internals | high (disk-rewrite patches, path imports) | high (build-time needle patches, view-syncer logic) | highest (5 overlay patches + ~30 stubs + workerd) |
| Runtime fragility | WASM SQLite SHM (mitigated) | COOP/COEP requirement; opaque bedrock binary | 128MB ceiling; workerd updates |
| Breaks on a Zero bump? | likely (caught loudly) | likely (build fails loudly) | likely (already happened on 1.5) |
| Guardrails | litestream-anchor throw; perf baselines | needle-not-found throws; 11-scenario test | version-pin guard test; runtime validators + flicker detector |

Completeness

| | orez-node | orez-web | orez-cloudflare |
| --- | --- | --- | --- |
| Intended use | local dev | in-IDE dev | production |
| End-to-end working? | yes (chat e2e, demos) | yes (11 scenarios, dual-surface) | yes (todo/app/flights runtime-validated) |
| Notable gaps | dev-only by design (mutex, no HA); pg→sqlite compiler incomplete | no durable replica; admin/planner disabled; legacy proxy.worker | serialization point; large-working-set risk |

8. Maintenance playbook — what to check on a Zero upgrade

A Zero version bump is the single most likely thing to break orez. In order:

  1. Bump @rocicorp/zero in ~/orez devDeps and run bun run check:all there. The litestream patch and sqlite3 shim throw loudly on signature drift — fix the anchors.
  2. orez-web: rebuild via bun scripts/build-orez.ts. The build-zero-cache.ts needle patches throw "patch failed: needle not found" if processes.js, the worker auto-run guards, write-worker-client.js, or view-syncer.js changed shape. Re-derive each needle. Then run test:orez (the 11-scenario browser suite).
  3. orez-cloudflare: the five cf-patches.ts patches warn (don't throw) on drift — read the logs. Run bun scripts/dev/test-cf-do-bundle.ts (bundles the overlay without deploying), then the template deploy validators with --runtime.
  4. Re-pin: publish a new orez, bump soot's package.json pin, and confirm the guard test (test/cloudflare-do-deploy.test.ts) still finds the barrier primitives in the installed dist. bun check includes the alignment check across root / templates / orez-sprites / bun.lock.
  5. Never trust a green reload for the CF path — force the SchemaVersionNotSupported path (navigate to a route needing a new table, poll ~20s) before declaring the deploy healthy.
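
Steps 2 and 3 rely on the same needle-patch idea with two different drift policies: the orez-web build throws, the cf-patches overlay warns. The sketch below shows that pattern under stated assumptions; `applyNeedlePatch`, `DriftMode`, and the `warn` callback are hypothetical names, not the actual API of build-zero-cache.ts or cf-patches.ts.

```typescript
// 'throw' mirrors the orez-web build; 'warn' mirrors the cf-patches overlay.
type DriftMode = 'throw' | 'warn'

// Replace the first occurrence of `needle` in `source`. If the needle is
// missing, the upstream file changed shape and the patch is stale.
function applyNeedlePatch(
  source: string,
  needle: string,
  replacement: string,
  mode: DriftMode,
  warn: (msg: string) => void = console.warn
): string {
  if (!source.includes(needle)) {
    const msg = `patch failed: needle not found: ${needle}`
    if (mode === 'throw') throw new Error(msg)
    warn(msg)
    return source // leave the file untouched; the log is the only signal
  }
  // A function replacement avoids special handling of `$` sequences.
  return source.replace(needle, () => replacement)
}

// Example: the file:// → zero-worker:// rewrite, applied to a made-up line.
const patched = applyNeedlePatch(
  'new Worker("file:///out/worker.js")',
  'file://',
  'zero-worker://',
  'throw'
)
```

The trade-off is visible in the return path: in warn mode a stale patch silently ships unpatched code, which is why step 3 says to read the logs and then run the bundle test.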

9. File reference index

orez package (~/orez/src)

  • node: index.ts:258 (startZeroLite), pglite-manager.ts:265 (3 instances), pg-proxy.ts:152-201 (query rewrites), pglite-ipc.ts (ArrayBuffer-transfer IPC), recovery.ts, zero-litestream-patch.ts:77-96, s3-local.ts
  • replication (shared): replication/handler.ts:710-932 (event-driven loop), replication/pgoutput-encoder.ts, replication/change-tracker.ts, change-tracking.ts
  • embeds: worker/zero-cache-embed.ts (node), worker/browser-embed.ts (startZeroCacheEmbedBrowser), worker/zero-cache-embed-cf.ts:120 (startZeroCacheEmbedCF)
  • browser: pg-proxy-browser.ts, worker/shims/postgres-socket.ts, worker/shims/node-stub.ts, worker/browser-build-config.ts
  • cloudflare: pg-proxy-do-backend.ts:4870 (DoBackend), cf-do/worker.ts:132 (ZeroDO/ZeroSqlDO), cf-do/watermark.ts:21, do-sql-tracking.ts, worker/cf-patches.ts:61, cf-do/ARCHITECTURE.md, cf-do/CHAT_E2E.md
  • sqlite/compiler: sqlite-mode/apply-mode.ts, pg-sqlite-compiler/index.ts:38, sqlite-wasm/Makefile

soot integration (~/soot)

  • orez-web: packages/orez-web/src/index.ts:203 (startOrezWeb), workers/{zero-cache,pg-proxy,pglite}.worker.ts, workers/sab-channel.ts:29 (SAB ring), scripts/build-zero-cache.ts:55 (the esbuild surgery), scripts/build-workers.ts, shims/bedrock-sqlite-stub.ts, shims/postgres-browser.ts, lib/browser-mode.ts
  • cloudflare deploy: src/deploy/cloudflareDoDeploy.ts:73-319 shim source (ZeroCacheDO + routing + sealed endpoints), :208-265 ensureReady, :266-294 resetReplicaIfTableSetChanged, :485-658 migrations/ensurePublication/applyInitSqlDDL, :982-1025 bundler, :1175 deployToCloudflareDo; src/deploy/buildUserApp.ts:256-268; src/deploy/buildTrigger.ts (cloudflareDo: true)
  • per-project wiring: src/worker/projectRuntime.ts:822 (startOrezWeb call), src/worker/orez-web-client.ts, src/worker/project-server.worker.ts, src/worker/seed-project-data.ts
  • build & validators: scripts/build-orez.ts, scripts/dev/validate-cf-do-runtime.ts, scripts/dev/test-cf-do-deploy.ts, scripts/dev/deploy-prod-project.ts, scripts/dev/test-cf-do-bundle.ts, test/cloudflare-do-deploy.test.ts:197, test/orez-web-sync.test.ts
  • observability & prod: src/tiling/components/{ArchVizPane,OrezChangesPane,ZeroDebugPane}.tsx, src/production/{deployTier,orezLogUrl,HealthPane}.ts
  • legacy tier: packages/orez-sprites/src/{provision,destroy,status}.ts (fly sprites — replaced by CF DO)
  • canonical flows: docs/architecture.md:40-176 (the 18-step in-browser sync flow + worker table)