A technical deep-dive into the Zero/Postgres backend stack — orez-node,
orez-web, and orez-cloudflare — and how the work is split between the
standalone orez package and this repo (soot).
Related docs: docs/architecture.md (the canonical in-browser sync flow), docs/cloudflare-do-deploy.md (the CF DO deploy operational reference), docs/zero.md, docs/staging-cf-do.md. This doc is the why and the how it was built; those are the how to operate it.
Zero (Rocicorp's sync engine) wants two things from its environment that are normally heavyweight native dependencies:
- a real Postgres with logical replication (the upstream / source of truth), and
- a native better-sqlite3 addon for its local replica + CVR + CDC stores.
orez's entire thesis is: give zero-cache neither, but impersonate both so
convincingly that an unmodified @rocicorp/zero runs on top. We never fork
zero-cache. We fake the Postgres wire protocol, fake logical replication, and
swap the storage engine underneath it. That one trick is then re-targeted at
three radically different runtimes:
| Runtime | Where it runs | Postgres engine | SQLite (replica) engine | Status |
|---|---|---|---|---|
| orez-node | a Node/Bun child process (bunx orez) | PGlite (Postgres-in-WASM) | bedrock-sqlite (WASM) or native | local dev |
| orez-web | browser Web Workers, no server | PGlite in a worker | bedrock-sqlite (WASM, in-memory) | SootBean in-IDE dev |
| orez-cloudflare | a Cloudflare Durable Object | DO SQLite via a PG→SQLite translator (no PGlite) | the DO's own ctx.storage.sql | production deploys |
The three share a foundation (wire-protocol proxy, faked logical replication, change-tracking triggers, pgoutput encoder) and diverge sharply in how they host zero-cache and where bytes land.
Difficulty / risk at a glance (detailed scorecard in §7):
| | effort | difficulty | maint. risk | completeness |
|---|---|---|---|---|
| orez-node | ~4 months | hard | medium-high | dev-grade, battle-tested |
| orez-web | ~10 weeks | harder | high | works e2e, no durable replica |
| orez-cloudflare | ~2 wk burst | hardest | highest | the prod path, runtime-validated |
This is the part that's easy to get wrong. The orez npm package
(~/orez, 545 commits, 2026-02-08 → 2026-05-31, currently orez@0.3.5)
ships the reusable primitives. soot (~/soot) ships the
productized integration — and a large share of the genuinely hard,
interesting engineering for web and cloudflare actually lives here, not in
the package.
┌─────────────────────────────── orez package (~/orez) ──────────────────────────────┐
│ PRIMITIVES — reusable, runtime-agnostic │
│ │
│ pg-proxy.ts ................. Postgres wire-protocol proxy (the "fake Postgres") │
│ replication/ ................ faked logical replication + pgoutput binary encoder │
│ change-tracking.ts .......... AFTER-ROW triggers → _orez.changes change log │
│ pglite-manager.ts ........... 3-instance PGlite lifecycle, memory tuning, recovery │
│ sqlite-mode/ ................ swap @rocicorp/zero-sqlite3 → bedrock-sqlite (WASM) │
│ pg-sqlite-compiler/ ......... pure-TS PostgreSQL→SQLite SQL translator │
│ worker/zero-cache-embed.ts .. run zero-cache in-process (node) │
│ worker/browser-embed.ts ..... run zero-cache in-process (browser worker) │
│ worker/zero-cache-embed-cf.ts run zero-cache in-process (Durable Object) │
│ worker/cf-patches.ts ........ workerd-safe zero-cache overlay (no node_modules edit)│
│ worker/shims/ ............... node:* builtins, postgres-socket, ws-browser, … │
│ pg-proxy-browser.ts ......... wire-protocol proxy over a MessagePort │
│ pg-proxy-do-backend.ts ...... DoBackend: PG wire → DO SQL (the ~7.7k-line translator)│
│ cf-do/worker.ts ............. ZeroDO / ZeroSqlDO — the DO SQL backend │
│ cf-do/watermark.ts .......... monotonic change-feed cursor over DO SQLite │
│ do-sql-tracking.ts .......... transaction barrier (_zero_pending_changes) │
│ s3-local.ts, recovery.ts, zero-litestream-patch.ts │
└──────────────────────────────────────────────────────────────────────────────────┘
▲ imported by ▲ ▲ imported by ▲
┌──────────────────────── soot (~/soot) — INTEGRATION + PRODUCTIZATION ───────────────┐
│ │
│ packages/orez-web/ (4,228 LOC) — the in-browser orchestrator │
│ src/index.ts ............. startOrezWeb(): spawns + wires the worker mesh │
│ src/workers/*.worker.ts .. 4 worker entrypoints (zc / pg-proxy / pglite / proxy) │
│ src/workers/sab-channel.ts SharedArrayBuffer ring transport (the fast path) │
│ scripts/build-zero-cache.ts build-time esbuild SURGERY on zero's compiled output │
│ src/shims/bedrock-sqlite-stub.ts · postgres-browser.ts · lib/browser-mode.ts │
│ │
│ src/deploy/cloudflareDoDeploy.ts (1,398 LOC) — the entire CF DO deploy path │
│ ZeroCacheDO class · routing shim source · the build/bundler · migration runner │
│ ensurePublication · resetReplicaIfTableSetChanged · the two split-brain fixes │
│ │
│ src/worker/projectRuntime.ts (34 KB) · orez-web-client.ts · project-server.worker.ts│
│ scripts/build-orez.ts ........ build → public/orez-web-*.worker.js │
│ scripts/dev/{validate,test,monitor}-cf-do-*.ts — the runtime validators │
│ src/tiling/components/{ArchViz,OrezChanges,ZeroDebug}Pane.tsx — live in-IDE viz │
│ src/production/{deployTier,orezLogUrl,HealthPane}.ts — prod monitoring │
│ packages/orez-sprites/ ....... LEGACY fly-sprite deploy tier (replaced by CF DO) │
└──────────────────────────────────────────────────────────────────────────────────┘
Rule of thumb: the package knows how to be a fake Postgres + run zero-cache off-Node; soot knows how to deploy and operate that for real projects — the worker mesh per project, the CF deploy pipeline, the self-healing migrations, the validators, and the visualizations.
Every runtime reuses these four primitives from the orez package. Understand
them once and all three variants make sense.
pg-proxy.ts (node) and pg-proxy-browser.ts (browser) are a partial,
from-scratch Postgres server whose only job is to lie convincingly to one
specific client: zero-cache. They hand-parse and rebuild raw PG protocol bytes
and rewrite the queries zero-cache issues that PGlite can't satisfy:
- version() → a clean PostgreSQL 17.4 string (emscripten's breaks pg_restore)
- current_setting('wal_level') → 'logical'
- strip READ ONLY / ISOLATION LEVEL / SET TRANSACTION (meaningless on a single session)
- pg_replication_slots → a fake _orez table; pg_drop_replication_slot() / pg_terminate_backend() → harmless SQL
- synthesize responses for ping queries (SELECT 1) and no-op SETs without touching the mutex
- dedupe identical information_schema/catalog queries from multiple sync workers (30s TTL + in-flight coalescing)
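The last rewrite in that list (a 30s TTL plus in-flight coalescing) can be sketched roughly as follows. This is a minimal illustration, not orez's code: the class name CoalescingCache, the runner callback, and the injectable clock are all hypothetical.

```typescript
// Hypothetical sketch: TTL cache + in-flight coalescing for identical queries.
type Runner<T> = (sql: string) => Promise<T>;

class CoalescingCache<T> {
  private cached = new Map<string, { value: T; expires: number }>();
  private inFlight = new Map<string, Promise<T>>();
  private run: Runner<T>;
  private ttlMs: number;
  private now: () => number;

  constructor(run: Runner<T>, ttlMs: number = 30000, now: () => number = Date.now) {
    this.run = run;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  query(sql: string): Promise<T> {
    // 1. Fresh cached result: answer without touching the backend.
    const hit = this.cached.get(sql);
    if (hit && hit.expires > this.now()) return Promise.resolve(hit.value);

    // 2. Same query already running: share its promise.
    const pending = this.inFlight.get(sql);
    if (pending) return pending;

    // 3. Otherwise run it once and remember the result for ttlMs.
    const p = this.run(sql)
      .then((value) => {
        this.cached.set(sql, { value, expires: this.now() + this.ttlMs });
        return value;
      })
      .finally(() => {
        this.inFlight.delete(sql);
      });
    this.inFlight.set(sql, p);
    return p;
  }
}
```

With several sync workers issuing the same catalog query at boot, only the first call reaches the backend; the rest share its promise or read the cached value.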
A per-instance async mutex serializes everything else, because PGlite is
single-session (pg-proxy.ts:152-201, 313-340, 887-960).
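As a rough illustration of the serialization described above, a promise-chain mutex is enough to guarantee that only one task touches a single-session engine at a time. The names AsyncMutex and isPing are assumptions for this sketch; the real implementation in pg-proxy.ts may differ.

```typescript
// Hypothetical sketch: a promise-chain mutex, one per PGlite instance.
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(task: () => Promise<T>): Promise<T> {
    // Start the task only after everything queued before it has settled.
    const result = this.tail.then(task);
    // Swallow the outcome on the chain so one failure does not block the queue.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// Ping queries can be answered synthetically, so they never need the lock.
function isPing(sql: string): boolean {
  return sql.trim().replace(/;$/, "").toLowerCase() === "select 1";
}
```

A caller would check isPing(sql) first and only fall through to mutex.runExclusive(() => execute(sql)) for real work, where execute stands in for whatever forwards the query to PGlite.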
PGlite has no logical replication, so orez invents the entire pgoutput
pipeline:
- change-tracking triggers (change-tracking.ts): AFTER INSERT/UPDATE/DELETE triggers capture every mutation as JSONB into the _orez.changes change log.
- a pgoutput binary encoder (replication/pgoutput-encoder.ts): emits BEGIN/RELATION/INSERT/UPDATE/DELETE/COMMIT/keepalive frames in the exact wire framing zero-cache's change-streamer expects.
- a replication handler (replication/handler.ts): fakes IDENTIFY_SYSTEM/CREATE_REPLICATION_SLOT/START_REPLICATION, synthesizes LSNs from wall-clock time (so a restarted proxy never emits an LSN behind zero-cache's persisted watermark), honors the client's resume LSN on reconnect, and streams the encoded changes.
This is the single highest-churn area of the whole project (handler.ts: 65 commits) — every reconnect/restart edge case was a separate hard-won fix.
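To make the encoder and LSN ideas concrete, here is a small sketch of the two simplest pgoutput frames (Begin and Commit, laid out as in PostgreSQL's logical replication message formats) plus one possible wall-clock LSN scheme. The nextLsn formula is an assumption chosen for illustration; the source only says LSNs are derived from wall-clock time and never move backwards.

```typescript
// pgoutput timestamps are microseconds since 2000-01-01 UTC.
const PG_EPOCH_MS = Date.UTC(2000, 0, 1);

// Assumed scheme: microseconds of wall-clock time, bumped past the last LSN
// so the sequence stays strictly increasing.
function nextLsn(lastLsn: bigint, nowMs: number): bigint {
  const candidate = BigInt(nowMs) * BigInt(1000);
  return candidate > lastLsn ? candidate : lastLsn + BigInt(1);
}

// Begin: 'B', Int64 final LSN, Int64 commit timestamp, Int32 xid (21 bytes).
function encodeBegin(finalLsn: bigint, commitMs: number, xid: number): Uint8Array {
  const buf = new Uint8Array(21);
  const view = new DataView(buf.buffer);
  buf[0] = 0x42; // 'B'
  view.setBigInt64(1, finalLsn); // DataView writes big-endian by default
  view.setBigInt64(9, BigInt(commitMs - PG_EPOCH_MS) * BigInt(1000));
  view.setInt32(17, xid);
  return buf;
}

// Commit: 'C', Int8 flags, Int64 commit LSN, Int64 end LSN, Int64 timestamp (26 bytes).
function encodeCommit(commitLsn: bigint, endLsn: bigint, commitMs: number): Uint8Array {
  const buf = new Uint8Array(26);
  const view = new DataView(buf.buffer);
  buf[0] = 0x43; // 'C'
  buf[1] = 0; // flags, currently unused
  view.setBigInt64(2, commitLsn);
  view.setBigInt64(10, endLsn);
  view.setBigInt64(18, BigInt(commitMs - PG_EPOCH_MS) * BigInt(1000));
  return buf;
}
```

The RELATION and row frames follow the same pattern but carry column metadata and tuple data, which is where most of the real encoder's complexity lives.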
Doc correction: the README still claims "20ms/500ms adaptive polling." The handler is now event-driven (signalReplicationChange wakeup + a pg_notify LISTEN fallback, 5000ms idle keepalive, ~30s safety poll). Commit 9d2b182 dropped idle CPU from 75-100% to near-zero.
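The event-driven loop boils down to "sleep until someone signals a change, or until the keepalive interval passes." A minimal sketch of that primitive, with the hypothetical name WakeSignal (the real handler's structure is not shown in this doc):

```typescript
// Hypothetical sketch: wait for a change signal or a timeout, whichever comes first.
type WakeReason = "signal" | "timeout";

class WakeSignal {
  private waiters: Array<(reason: WakeReason) => void> = [];

  // Called by the write path when new change-log rows exist.
  signal(): void {
    const pending = this.waiters;
    this.waiters = [];
    for (const wake of pending) wake("signal");
  }

  // Called by the streaming loop; timeoutMs would be the idle keepalive interval.
  wait(timeoutMs: number): Promise<WakeReason> {
    return new Promise((resolve) => {
      let done = false;
      const finish = (reason: WakeReason): void => {
        if (done) return; // a stale waiter left over after a timeout is a no-op
        done = true;
        clearTimeout(timer);
        resolve(reason);
      };
      const timer = setTimeout(() => finish("timeout"), timeoutMs);
      this.waiters.push(finish);
    });
  }
}
```

A streaming loop built on this would send a keepalive frame when wait() resolves with "timeout" and drain the change log when it resolves with "signal", so an idle proxy does no polling work.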
Zero normally fork()s 5 worker processes (main / change-streamer / reaper /
replicator / syncer). orez forces SINGLE_PROCESS=1 and feeds zero's
runWorker() a fake parent: a node:events EventEmitter wrapped in a
Proxy that adds zero's onMessageType/onceMessageType IPC surface, with
process.exit intercepted into an emit. Zero's inter-process IPC collapses
into in-process EventEmitter channels. The three embeds
(zero-cache-embed.ts, worker/browser-embed.ts, worker/zero-cache-embed-cf.ts)
are the same idea targeted at Node, a Web Worker, and a Durable Object.
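A sketch of the fake-parent idea, under stated assumptions: messages are modeled here as a [type, payload] tuple delivered on a "message" event, which is a guess at the shape, not zero's documented IPC contract. Only the method names onMessageType and onceMessageType come from the text above; createFakeParent is hypothetical.

```typescript
import { EventEmitter } from "node:events";

type Message = [string, unknown]; // assumed wire shape: [type, payload]
type Handler = (payload: unknown) => void;

interface FakeParent extends EventEmitter {
  onMessageType(type: string, handler: Handler): void;
  onceMessageType(type: string, handler: Handler): void;
}

function createFakeParent(): FakeParent {
  const emitter = new EventEmitter();
  const extras = new Map<string, unknown>();

  extras.set("onMessageType", (type: string, handler: Handler): void => {
    emitter.on("message", (msg: Message) => {
      if (msg[0] === type) handler(msg[1]);
    });
  });

  extras.set("onceMessageType", (type: string, handler: Handler): void => {
    const listener = (msg: Message): void => {
      if (msg[0] !== type) return;
      emitter.off("message", listener); // fire once per matching type
      handler(msg[1]);
    };
    emitter.on("message", listener);
  });

  // The Proxy adds the extra IPC surface and forwards everything else.
  return new Proxy(emitter, {
    get(target, prop, receiver) {
      if (typeof prop === "string" && extras.has(prop)) return extras.get(prop);
      const value = Reflect.get(target, prop, receiver);
      return typeof value === "function" ? value.bind(target) : value;
    },
  }) as FakeParent;
}
```

An exit interception could be added to the same map (for example an entry that emits an "exit" event instead of terminating), which is the spirit of turning process.exit into an emit.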
zero-cache hard-requires @rocicorp/zero-sqlite3 (a native C addon). On WASM
targets, orez points it at bedrock-sqlite — SQLite's bedrock branch
compiled to WASM with WAL2 + BEGIN CONCURRENT (which zero-cache requires)
— and polyfills the better-sqlite3 surface. In node this is an on-disk rewrite
of the package's index.js (sqlite-mode/apply-mode.ts); in the browser/DO
it's a build-time alias. This is the riskiest dependency-coupling surface in
the package.
bunx orez: run Zero locally with no native deps — no Postgres install, no
SQLite compile, no Docker. This is the original target and the proving ground
for everything above. All of it lives in the orez package.
bunx orez (cli-entry.ts → cli.ts → index.ts:startZeroLite)
│
┌────┴────────────┐ ┌──────────────────────────────┐ ┌────────────────────────┐
│ PGlite ×3 │ │ pg-proxy.ts (TCP :6434) │ │ zero-cache CHILD PROCESS │
│ postgres ◄─────┼─────┤ speaks Postgres WIRE PROTOCOL │◄───┤ real @rocicorp/zero │
│ zero_cvr ◄─────┼─────┤ pg-gateway for normal conns │ PG │ (spawned node) │
│ zero_cdb ◄─────┼─────┤ raw socket for replication │wire│ ZERO_UPSTREAM_DB=:6434 │
│ (3 data dirs, │ │ per-instance Mutex │ │ ZERO_REPLICA_FILE=*.db │
│ 3 mutexes) │ │ query rewrites + schema cache │ └───────────┬─────────────┘
└────┬────────────┘ └──────────────┬─────────────────┘ │ reads/writes
│ change triggers │ START_REPLICATION ▼ replica via
▼ ▼ ┌──────────────────────────┐
┌─────────────────┐ ┌──────────────────────────────┐ │ bedrock-sqlite (WASM) or │
│ _orez.changes │──►│ replication/handler.ts │ │ native @rocicorp/zero- │
│ + pg_notify │ │ fakes IDENTIFY/SLOT/START_REPL│ │ sqlite3 (runtime swap) │
└─────────────────┘ │ encodes pgoutput, streams it │ └──────────────────────────┘
└──────────────────────────────┘
Why three PGlite instances: zero-cache wants three databases with
independent transaction contexts. PGlite is single-session; sharing one
corrupts CVR transactions with ConcurrentModificationException. So orez runs
three separate PGlite data dirs, each with its own mutex; the main one gets
dev-tuned memory + 12 lazy-loaded contrib extensions (pgvector, pg_trgm, …),
cvr/cdb get minimal fixed memory (pglite-manager.ts:118-167, 265-278).
Two IPC layers: zero-cache is a real child process over a TCP socket +
PG wire (with a --require preload that polls the parent PID and self-destructs
on parent death — macOS has no PR_SET_PDEATHSIG, and orphaned sync-workers
busy-loop at 100% CPU). PGlite optionally runs in worker threads so
synchronous WASM execution doesn't block the proxy/replication event loop;
wire-protocol ArrayBuffers are transferred, not copied (pglite-ipc.ts).
- The wire-protocol proxy (§2.1) — a from-scratch partial Postgres server.
- Faked logical replication (§2.2) — the pgoutput pipeline.
- bedrock-sqlite + the on-disk shim swap (§2.4), with backup/restore lifecycle to prevent WASM↔native mode contamination.
- The pg→sqlite compiler (pg-sqlite-compiler/, ~2.1k LOC): a pure-TS PostgreSQL→SQLite translator (single-pass visitor over the libpg_query WASM AST), built so the translation can later be bundled into a Durable Object (pgsqlite is Rust/tokio and won't run in workerd). Oracle-tested against the real pgsqlite binary (910 fixtures) with a corpus ratchet. Promising but incomplete: dev-oracle-gated, not load-bearing for the node path yet.
- Memory auto-sizing & WAL purge: re-execs node at ~50% RAM, periodic CHECKPOINT to bound pg_wal/ growth, consumed change-log rows purged after streaming.
- The litestream patch (zero-litestream-patch.ts): Zero 1.5's dedicated change-streamer unconditionally calls restoreReplica() expecting a litestream backup orez doesn't have. orez patches the compiled commands.js on disk to short-circuit it, and throws loudly if the upstream function signature changes. The cleanest example of the "patch zero internals surgically, fail loudly on drift" pattern.
- s3-local.ts: a deliberately minimal GET/PUT/DELETE/HEAD file server with CORS so Zero apps needing uploads don't pull in MinIO/Docker.
- Crash recovery: CDC-corruption detection drives recoverZeroState(), which wipes+rebuilds CVR/CDB + replica as one consistency domain, with a budget of 5 resets per 5 minutes.
Built over ~4 months (Feb–May 2026), heavily debug-driven (142 fix: vs 57
feat: vs 23 perf:). Churn concentration (a proxy for difficulty):
| Subsystem | Evidence | Difficulty |
|---|---|---|
| Replication (handler/encoder/tracker) | handler.ts 65 commits; "replication" 54, "watermark" 17 | hardest — every reconnect/restart edge is a fix |
| WASM SQLite / SHM / WAL2 | "sqlite" 50, "wasm" 33; two dedicated plans documenting a multi-week dead-end | very hard, partially unresolved — cross-process WASM SHM was worked around (force 1 sync worker + native fallback), not solved |
| pg-proxy / wire protocol | pg-proxy.ts 61 commits; "lock|mutex" 43 | hard — single-session semantics on shared instances is a minefield |
| restore / recovery | "restore" 27 (COPY→INSERT, dollar-quoting, oversized-row skip) | medium-hard |
Overall: high. Not a thin adapter — a reimplementation of the Postgres-facing contract zero-cache depends on, each piece bug-found through real app integration.
The dominant risk is coupling to zero-cache internals: orez imports
zero-cache's run-worker by a string-munged filesystem path, rewrites compiled
zero files on disk (litestream patch, the sqlite3 shim), and matches CDC error
strings by substring in recovery. Pinned at @rocicorp/zero@1.5.0 for a
reason. On a Zero upgrade, expect to revisit: the run-worker path, the IPC
contract, the litestream anchor (caught loudly), the SQLite pragma surface, and
the replica/CVR/CDB schema framing.
Explicitly dev-only (README: "Not suitable for production" — single-session
mutex, trigger overhead, no HA). Within that scope, the proxy, triggers,
pgoutput encoder, restore, and crash recovery are production-quality and
battle-tested against real apps (chat's full e2e suite, 46/48 native). 46
*.test.ts (~17.8k LOC), 7 integration tests, a full perf/ harness with
committed baselines. Dead code to ignore: src/worker/index.ts and
src/browser.ts carry explicit "NOT A GOOD REFERENCE / early guess" banners —
the live paths are the embeds.
The Zero dev stack running entirely in browser Web Workers, with no server: PGlite in a worker, the wire-protocol proxy in a worker, and real zero-cache compiled to run in a worker, with its replica on WASM SQLite. This is what powers SootBean's in-IDE dev environment.
Attribution: the orchestration, the worker mesh, the transport, the
build-time surgery, and the browser shims live in soot's packages/orez-web/
(4,228 LOC). It imports orez's primitives — worker/browser-embed
(startZeroCacheEmbedBrowser), pg-proxy-browser, change-tracking,
worker/shims/*. The hard browser-specific engineering is soot-side.
Each open project spawns a mesh of Web Workers (3 in the default singleDb
mode), orchestrated from the main thread by startOrezWeb():
MAIN THREAD (packages/orez-web/src/index.ts: startOrezWeb)
spawns + wires the mesh; holds the OrezWeb handle
┌──────────────────────────────┼─────────────────────────────────────┐
│ Zero client (preview iframe) │ project-server worker (on-zero) │
│ fetch /api/zero/push|pull ──┤ validates + executes mutations │
│ WebSocket /sync ────────────┤ signalReplication() │
└───────────────┬───────────────┴───────────────┬──────────────────────┘
MessagePort│ (ws-connect) SAB JSON │ fast-path
▼ ▼
┌──────────────────────┐ ┌──────────────────────────┐
│ zero-cache worker │ connect │ pg-proxy worker │
│ real @rocicorp/zero │─ port ──▶│ pg-gateway wire proto │
│ SINGLE_PROCESS=1 │ (pg │ + replication handler │
│ bedrock-sqlite (WASM │ socket) │ + mutex, schema cache │
│ replica, in-memory) │ └─────────┬─────────────────┘
└──────────┬───────────┘ SAB binary │ (execProtocolRaw)
│ postgres-socket + SAB JSON ▼
│ (MessagePort) ┌───────────────────────────────────────┐
└─────────────────▶│ PGlite worker(s) — postgres in WASM │
│ singleDb: 1 worker plays pg/cvr/cdb │
│ multi: 3 workers (pg / cvr / cdb) │
│ persisted: IndexedDB (idb://…) │
└───────────────────────────────────────┘
Cross-worker transport (two tiers):
- MessagePort is the baseline (remaps message IDs into per-channel 1M-offset ranges so many channels can target one PGlite worker).
- SharedArrayBuffer + Atomics.waitAsync is the hot path (sab-channel.ts): a 1MB ring, ~0.1ms vs MessagePort's ~40ms. Two flavors: binary SAB for execProtocolRaw wire bytes, JSON SAB for query/exec. Oversized payloads throw SabOverflowError and the caller falls back to MessagePort. SAB requires cross-origin isolation (COOP/COEP); there's a disableSab escape for browser-automation envs.
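The two-tier behavior can be sketched without any real SharedArrayBuffer plumbing by treating each tier as an async function. Only the SabOverflowError name and the 1MB / 1M-offset figures come from the text above; Transport, guardedSab, withFallback, and remapMessageId are hypothetical names for this illustration.

```typescript
// Hypothetical sketch: size-guarded fast path with MessagePort fallback.
class SabOverflowError extends Error {
  constructor(size: number, capacity: number) {
    super(`payload of ${size} bytes exceeds ring capacity of ${capacity}`);
    this.name = "SabOverflowError";
  }
}

type Transport = (payload: Uint8Array) => Promise<Uint8Array>;

const RING_CAPACITY = 1024 * 1024; // the 1MB ring

// Refuse payloads that cannot fit in the ring instead of corrupting it.
function guardedSab(sab: Transport, capacity: number = RING_CAPACITY): Transport {
  return async (payload) => {
    if (payload.byteLength > capacity) {
      throw new SabOverflowError(payload.byteLength, capacity);
    }
    return sab(payload);
  };
}

// Try the fast path; on overflow only, retry over the slower baseline.
function withFallback(fast: Transport, slow: Transport): Transport {
  return async (payload) => {
    try {
      return await fast(payload);
    } catch (err) {
      if (err instanceof SabOverflowError) return slow(payload);
      throw err;
    }
  };
}

// Baseline tier: give each channel its own 1M-wide message ID range.
function remapMessageId(channelIndex: number, localId: number): number {
  return channelIndex * 1000000 + localId;
}
```

Catching only SabOverflowError matters: any other failure on the fast path is a real error and should not be silently retried on the slow one.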
Persistence: PGlite data is IndexedDB (idb://…-postgres); the PGlite
WASM module is compiled once on the main thread and transferred into the workers
so three instances don't triple-fetch. zero-cache's replica SQLite
(bedrock-sqlite) is kept in-memory only (/tmp/zero-replica.db) and rebuilt
every page load — the durable truth is the IndexedDB-backed PGlite.
- Build-time surgery on zero's compiled output (packages/orez-web/scripts/build-zero-cache.ts): the load-bearing, scariest piece. esbuild onLoad plugins rewrite zero's JS as it's bundled:
  - static-import the 5 worker modules to replace processes.js's dynamic import(moduleUrl) (no dynamic import in a worker bundle);
  - strip the if (!singleProcessMode()) exitAfter(...) auto-run guards;
  - replace ThreadWriteWorkerClient (which spawns a worker_thread for SQLite writes) with an inline class running ChangeProcessor directly;
  - inject an __orez_on_repl_commit hook;
  - a 6-edit view-syncer patch that defers query-pipeline sync until the replica catches up to the CVR stateVersion, because browser message ordering interleaves CVR churn and replica advancement more tightly than node's websocket path, tripping an upstream assertion.

  Every patch matches an exact source needle and throws "patch failed: needle not found" on drift. This is why Zero is pinned at 1.5.0.
- A Postgres wire-protocol proxy in the browser (pg-proxy-browser.ts): pg-gateway's web DuplexStream terminates real PG frontend/backend protocol over a MessagePort; the real postgres npm client talks to it through a net.Socket-compatible MessagePortSocket (postgres-socket.ts) with pause/resume buffering for COPY backpressure.
- The node-shim surface (worker/shims/): ~40 node:* builtins, postgres, @rocicorp/zero-sqlite3, fastify, ws, all faked well enough that an unmodified multi-process Node server runs in one browser worker.
- The SAB ring + graceful fallback, and the singleDb collapse where one PGlite worker masquerades as pg/cvr/cdb (3 distinct JS proxy refs defeat reference-equality, so singleDb is threaded through to share one wire-protocol mutex).
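The "exact needle or throw" discipline behind the build-time patches is simple to sketch. The applyPatch helper and the sample needle below are hypothetical; the ambiguity check is an extra safeguard added for this illustration, not something the source describes.

```typescript
// Hypothetical sketch: replace one exact source needle, failing loudly on drift.
function applyPatch(source: string, needle: string, replacement: string): string {
  const first = source.indexOf(needle);
  if (first === -1) {
    throw new Error("patch failed: needle not found");
  }
  // Illustrative extra guard: refuse to patch if the needle appears twice.
  if (source.indexOf(needle, first + needle.length) !== -1) {
    throw new Error("patch failed: needle is ambiguous");
  }
  return source.slice(0, first) + replacement + source.slice(first + needle.length);
}

// Made-up upstream snippet, used only to show the shape of a patch.
const upstream = "setup();\nif (!singleProcessMode()) exitAfter(run);\n";
const patched = applyPatch(
  upstream,
  "if (!singleProcessMode()) exitAfter(run);",
  "/* auto-run guard stripped */",
);
```

Inside an esbuild onLoad callback, a helper like this would run over the file contents before they are returned to the bundler, so any upstream change to the needle breaks the build instead of shipping an unpatched file.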
A focused ~10-week effort starting 2026-03-20. Churn concentrated in
pg-proxy-browser.ts (82 commits) and replication/handler.ts (65
commits) — faking logical replication and serializing wire-protocol access to
single-threaded PGlite. Materially harder than the node path, which needs no
bundler aliases: the browser additionally required the whole shim surface,
build-time JS rewriting of zero internals, a WASM SQLite with concurrent-write
semantics, the SAB transport, IDB persistence/lock management, and the
browser-only view-syncer patch.
- Brittle string-anchored patches against zero internals (build-zero-cache.ts); the view-syncer patch is the scariest (6 edits of upstream sync logic: an upstream refactor silently invalidates correctness, not just compilation).
- node-stub.ts tracks Node's surface; new zero code reaching an unshimmed node:* API fails opaquely.
- bedrock-sqlite is a prebuilt opaque binary committed to public/ with no in-repo build script; regenerating the WAL2/BEGIN-CONCURRENT build is out-of-band tribal knowledge.
- Cross-origin isolation is required for the SAB fast path; without COOP/COEP it silently degrades to 40ms MessagePort latency.
- Three hand-maintained alias maps must stay in sync.
Works end-to-end: full sync loop (optimistic + authoritative), pgoutput
replication, CVR/CDC, view-syncer pokes to multiple clients (web iframe + native
canvas simultaneously), IDB persistence across reloads, per-project isolation.
Disabled by design: zero-cache admin dashboard + log store, query planner,
litestream backup/restore; the replica is non-durable (rebuilt each load). The
4th proxy.worker.js (4 KB) is a legacy MessagePort proxy not used by soot — a
cleanup candidate. Validation: test/orez-web-sync.test.ts (1,976 lines)
drives the real browser stack through 11 named scenarios (first-add, multi-add,
reload-persist, toggle, delete, mixed-flow, back-and-forth, …);
test:orez:quick is on the CI validate-and-deploy gate. Shipped bundle sizes
(uncompressed / brotli): zero-cache 6.15MB / 956KB, pglite 977KB / 171KB,
pg-proxy 319KB / 92KB.
The production deploy path for SootBean template apps
(buildTrigger.ts hard-sets cloudflareDo: true for every prod deploy; it
replaced the legacy fly-sprite tier). This is the hardest and riskiest of the
three.
Cloudflare Durable Objects cap at 128 MB RAM per instance. PGlite + WASM
Postgres + extension binaries blows that budget — dead end. But zero-cache's
value is the hard part (sync protocol, IVM/CVR, replication, client
semantics). So the design keeps real zero-cache and replaces only the
storage engine: zero-cache's replica goes straight to DO SQLite, and its
upstream Postgres connection is satisfied by DoBackend, which speaks PG wire
but executes against DO SQLite. No PGlite. No WASM Postgres. Ever.
Cloudflare Worker (deployed per project) [orez-shim.js, generated by soot]
┌──────────────────────────────────────────────────────────────────────────────────┐
│ isZeroCachePath? /sync/v*, /replication/v* ─────────────► ZERO_CACHE_DO singleton │
│ isPublicSqlBackendPath? /exec /batch /changes ──────────► 404 (sealed) │
│ else (SSR + /api/* + /api/zero/push|pull) ──────────────► oneWorker.fetch │
│ static client bundle ───────────────────────────────────► ASSETS │
└───────────────┬──────────────────────────────────────┬───────────────────────────┘
│ idFromName('singleton') │ idFromName('singleton')
▼ │
┌──────────────────────────────────────────┐ │
│ ZERO_CACHE_DO (soot ZeroCacheDO class) │ │
│ ensureReady(): │ │
│ 1. runSootCloudflareMigrations() │ │
│ drizzle migrations → init.sql → publication │
│ 2. resetReplicaIfTableSetChanged() │ │
│ 3. startZeroCacheEmbedCF() ── real zero-cache, │
│ SINGLE_PROCESS=1, in-process │ │
│ replica/CVR: ZERO_REPLICA_FILE=':do-sqlite:' │
│ → THIS DO's own ctx.storage.sql │ │
│ upstream PG conn: postgres-browser shim │ │
│ → DoBackend → PG wire → DO SQL HTTP ─┼────────────┼──┐
│ app push/pull: fetch orez-zero-api.local ─┼──► oneWorker.fetch (same worker)
└────────────────────────────────────────────┘ ▼
┌──────────────────────────────────────────────┐
│ ZERO_SQL_DO (orez ZeroDO from cf-do/worker.ts)│
│ raw SQL endpoints (INTERNAL only): │
│ /exec /batch /commit-tracked-tx /rollback… │
│ ctx.storage.sql = the authoritative "Postgres"│
│ _zero_changes + _zero_pending_changes + watermark
└──────────────────────────────────────────────┘
Two distinct DO SQLite stores, both keyed 'singleton': the cache DO's
storage holds zero-cache's derived replica/CVR/CDC; the SQL DO's storage holds
the authoritative upstream rows. This split is exactly why wiping the replica is
safe — the source of truth lives in the other DO.
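The routing decisions in the diagram can be expressed as two small predicates. The function names isZeroCachePath and isPublicSqlBackendPath appear in the diagram; their bodies, the Route type, and routeFor are assumptions sketched from the listed paths, not the generated orez-shim.js itself.

```typescript
// Hypothetical sketch of the shim's routing, based only on the paths in the diagram.
type Route = "zero-cache-do" | "sealed-404" | "one-worker";

// /sync/v* and /replication/v* go to the zero-cache Durable Object.
function isZeroCachePath(pathname: string): boolean {
  return /^\/(sync|replication)\/v[^/]+(\/|$)/.test(pathname);
}

// Raw SQL endpoints must never be reachable from the public internet.
function isPublicSqlBackendPath(pathname: string): boolean {
  return ["/exec", "/batch", "/changes"].includes(pathname);
}

function routeFor(pathname: string): Route {
  if (isZeroCachePath(pathname)) return "zero-cache-do";
  if (isPublicSqlBackendPath(pathname)) return "sealed-404";
  return "one-worker"; // SSR, /api/*, and /api/zero/push|pull
}
```

Static client assets are served through the ASSETS binding in the real deploy; that branch is left out here to keep the sketch to the security-relevant decisions.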
Note: the package's cf-do/ARCHITECTURE.md diagram is slightly idealized. In the real soot deploy the cache DO is soot's ZeroCacheDO (it embeds zero-cache and owns the replica); only the SQL DO is orez's ZeroDO.
- The 128MB constraint → "real zero-cache + DO SQLite, no PGlite." The bans are explicit and guarded (ARCHITECTURE.md:74-84).
- DoBackend: PG-wire → DO SQL (orez/pg-proxy-do-backend.ts, ~7.7k lines). Parses raw PG wire frames, rewrites each statement Postgres→SQLite via a full pgsql-parser AST pass (schema flattening, type/function mapping, catalog interception, array/json/timestamp coercion, RETURNING/CTE rewrites), and POSTs the rewritten SQL to the SQL DO's /exec or /batch. Because DO refuses raw SQL BEGIN/COMMIT/SAVEPOINT (Cloudflare requires ctx.storage.transaction()), PG-style multi-call transactions are emulated by snapshotting each table on first in-tx write and restoring atomically on rollback. Four boot-time amplification bugs were fixed 2026-05-26 (multi-row metadata persist instead of per-row HTTP; flush tx metadata once at commit; snapshot change-tables once per tx; skip sqlite_master probes when the schema is already known); boot dropped from failing-at-60s to ~13s, all 51 chat e2e tests passing.
- The transaction barrier (do-sql-tracking.ts + cf-do/worker.ts). Prevents an add→remove→add flicker: without it, the server's transaction-tracked changes become visible before the SQL tx commits, so zero-cache sends an authoritative state missing the just-inserted row, the client briefly removes it, then re-adds. The fix: tracked writes inside a tx are staged in _zero_pending_changes keyed by a forwarded transactionID; on COMMIT, /commit-tracked-tx moves them into the live _zero_changes feed and bumps the watermark; on rollback, /rollback-tracked-tx deletes them. The todo validator catches regressions with a MutationObserver installed before the insert.
- Two split-brain self-healing fixes (both in soot's cloudflareDoDeploy.ts):
  - (a) init.sql vs drizzle-migrations drift → SchemaVersionNotSupported. In-browser codegen regenerates zero-schema.gen.ts + database/generated/init.sql from the same parse, but agents have no shell, so drizzle-kit generate never runs and database/migrations/** stays frozen. A browser-added table (e.g. run) is in the publication but never created → zero-cache rejects every client. Fix: cold-start runs drizzle migrations, then applies init.sql (idempotent CREATE TABLE IF NOT EXISTS), then ensurePublication publishes only tables that actually exist (so one drifted table degrades to "not replicated" instead of aborting all replication).
  - (b) the replica snapshots the publication once. zero-cache snapshots the publication into its replica once at initial sync; ALTER PUBLICATION ADD TABLE only feeds the change stream, not the existing snapshot. So a redeploy that adds a table leaves the persisted replica stuck on the old set → still SchemaVersionNotSupported. Fix: resetReplicaIfTableSetChanged stores a __soot_replica_schema_tag (sorted published table set) in DO storage and, when it changes (including the no-baseline case, recovering DOs created before this tracking), drops every non-sqlite_ table in the replica so zero-cache re-runs initial sync. The replica is derived data, so it's a re-sync, not data loss; on a new DO the replica is empty so the drop is a no-op. Makes "add a table, redeploy" self-healing with no manual DO reset.
- cf-patches.ts: the workerd-safe overlay. Copies zero-cache's compiled out/ into a generated overlay and symlinks its deps without mutating node_modules, applying five patches: file:// worker URLs → zero-worker://; strip CLI auto-start guards; replace dynamic import(moduleUrl) with a static lookup table (workerd forbids it); run the replica writer in-process (no worker_threads in workerd); base64-embed the libpg-query parser WASM inline (no fs/network wasm loading in workerd). Each patch warns loudly on version drift. The most fragile-to-upstream piece.
- The two-DO singleton model + routing. Both DOs are idFromName('singleton') so all sessions share one zero-cache process and one durable state. The raw SQL endpoints are sealed from public access (404) because they'd be an unauthenticated write surface; they are reachable only via internal DO stub fetch. App push/pull stays on oneWorker, reached by zero-cache through the orez-zero-api.local virtual host.
- The watermark (cf-do/watermark.ts). A monotonic change-feed cursor (the Zero "cookie") that reconciles three sources (a persisted last_value, MAX(watermark) from _zero_changes, and a sequence table) and takes the max, so consumed-then-purged changes never let a watermark be reused.
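The transaction barrier and the watermark rule fit together, and an in-memory model shows why. This sketch is not the DO SQL implementation: ChangeBarrier and its methods are hypothetical, arrays stand in for the _zero_pending_changes and _zero_changes tables, and assigning one watermark per change (rather than per commit) is an assumption.

```typescript
// Hypothetical in-memory model of the barrier + watermark behavior.
interface Change { table: string; op: "insert" | "update" | "delete"; row: unknown }
interface LiveChange extends Change { watermark: number }

class ChangeBarrier {
  private pending = new Map<string, Change[]>(); // stands in for _zero_pending_changes
  private live: LiveChange[] = [];               // stands in for _zero_changes
  private lastValue = 0;                         // persisted last_value
  private sequence = 0;                          // sequence table

  // Writes inside a transaction are staged; writes outside go live at once.
  track(change: Change, transactionID?: string): void {
    if (transactionID === undefined) {
      this.publish([change]);
      return;
    }
    const staged = this.pending.get(transactionID) ?? [];
    staged.push(change);
    this.pending.set(transactionID, staged);
  }

  commit(transactionID: string): void {
    const staged = this.pending.get(transactionID) ?? [];
    this.pending.delete(transactionID);
    this.publish(staged);
  }

  rollback(transactionID: string): void {
    this.pending.delete(transactionID);
  }

  // Consumers only ever see committed changes.
  visible(): readonly LiveChange[] {
    return this.live;
  }

  purgeConsumed(upTo: number): void {
    this.live = this.live.filter((c) => c.watermark > upTo);
  }

  private publish(changes: Change[]): void {
    for (const change of changes) {
      this.live.push({ ...change, watermark: this.nextWatermark() });
    }
  }

  // Max of all three sources, so purging live rows cannot cause reuse.
  private nextWatermark(): number {
    const maxLive = this.live.reduce((m, c) => Math.max(m, c.watermark), 0);
    const next = Math.max(this.lastValue, maxLive, this.sequence) + 1;
    this.lastValue = next;
    this.sequence = next;
    return next;
  }
}
```

Because staged rows are invisible until commit, a reader polling the feed mid-transaction never observes a state that a later rollback would contradict, which is exactly the flicker the barrier exists to prevent.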
Extremely compressed. The browser-embed foundation landed Mar 20-21; the DO
experiment went from "start on do experiment" (May 24) to "run real zero-cache
on DO SQLite without PGlite" (May 31) — ~7 days. The ~/orez repo still
holds 11 .orez-do-* scratch dirs all created within ~2 hours on May 24
(-test, -int-test, -int-test-2/3, -full, -zero, -txn-test, -final,
-reaper, -debug) — concrete evidence of how much trial-and-error the DO
bring-up took. The soot deploy wrapper saw 20 commits to cloudflareDoDeploy.ts
across late May → early June (bundle alignment, node-builtin stubs, private
endpoints, env isolation, flicker validation, init.sql bootstrap) — the
orchestration churn is as heavy as the engine.
It is the only variant that (a) can't use PGlite at all → a bespoke ~7.7k-line PG-wire→SQLite translator, (b) runs zero-cache under workerd's hard constraints (no dynamic import, no worker_threads, no fs, no port binding — all patched/shimmed), (c) splits storage across two DOs with a manual transaction barrier, and (d) has two distinct split-brain failure modes each needing dedicated self-healing.
- workerd constraints are absolute and each is patched/stubbed (~30 virtual node-builtin stubs + 5 cf-patches); any can break on a workerd update.
- The 128MB ceiling is permanent — large working sets are a latent risk.
- The patch surface is brittle to upstream zero-cache (the Zero 1.5 upgrade already forced a worker.ts + do-backend pass).
- The orez version pin + barrier guard: soot pins orez@0.3.5; the guard test reads the installed node_modules/orez/dist and asserts the barrier primitives (_zero_pending_changes, /commit-tracked-tx, /rollback-tracked-tx, transactionID) are present, so a downgrade fails bun check.
- The SchemaVersionNotSupported false-green trap: reloading a route that doesn't need a newly-added table false-greens after a redeploy. You must navigate to a route that needs the new table and poll ~20s to force the error. Self-confirmed reloads do not prove the replica reset worked.
- Two singletons = a serialization point (per-/exec HTTP round-trips are serial through the SQL DO).
This is the prod path, and it's runtime-validated, not merely
liveness-checked. scripts/dev/validate-cf-do-runtime.ts boots 2–3 isolated
browser contexts and proves: realtime A↔B sync, reload persistence, new-context
persistence, zero page/console errors, the todo flicker detector
(MutationObserver), and the Better-Auth demo-login path (app/flights) before
mutating. Three validation layers — package
(bun test test/cloudflare-do-deploy.test.ts, test-cf-do-bundle.ts), direct
template deploy (test-cf-do-deploy.ts {todo,app,flights} --runtime), and prod
UI deploy through sootbean.com (deploy-prod-project.ts --runtime). Closeout
requires current evidence at every layer (see
docs/cloudflare-do-deploy.md). The package's
cf-do/worker.ts also carries a bespoke Zero sync handler — that's
dev/protocol-experiment only and explicitly not the production sync; prod
sync is served by real zero-cache, and the guard test asserts the shim never
routes to the bespoke handler.
Worth calling out because it's unusually nice: SootBean visualizes its own orez stack live, in the IDE.
- ArchVizPane.tsx (706 lines): a live architecture visualization with pixel-art boxes for globe/phone/bolt, dots flowing along the paths between components, and labels flashing as events fire, driven by an archVizEvents bus. You watch a mutation travel client → on-zero → PGlite → trigger → replication → zero-cache → poke in real time.
- OrezChangesPane.tsx: a live tail of the _orez.changes table (watermark, table, op, row/old data), i.e. the replication change log as it's written.
- ZeroDebugPane.tsx (302 lines): worker logs, status, a PGlite ping, and reset/restart actions for the in-browser zero-cache.
OrezZeroPane.tsx is a legacy stub kept only for layout compatibility (its job
moved to ZeroDebugPane).
| | orez-node | orez-web | orez-cloudflare |
|---|---|---|---|
| Calendar time | ~4 months (Feb–May 2026), the foundation | ~10-week focused build (from Mar 20) | ~2-week burst (DO core in ~7 days late May) + ongoing soot deploy churn |
| Hardest sub-problem | faked logical replication; cross-process WASM SQLite SHM (never solved, worked around) | build-time surgery on zero's compiled output; the view-syncer timing patch | PG-wire→SQLite translation with no PGlite; two split-brain failure modes |
| Churn hotspots | handler.ts (65), pg-proxy.ts (61) | pg-proxy-browser.ts (82), handler.ts (65) | 11 scratch dirs in 2h; cloudflareDoDeploy.ts (20) |
| Relative difficulty | hard | harder | hardest |
| | orez-node | orez-web | orez-cloudflare |
|---|---|---|---|
| Coupling to zero internals | high (disk-rewrite patches, path imports) | high (build-time needle patches, view-syncer logic) | highest (5 overlay patches + ~30 stubs + workerd) |
| Runtime fragility | WASM SQLite SHM (mitigated) | COOP/COEP requirement; opaque bedrock binary | 128MB ceiling; workerd updates |
| Breaks on a Zero bump? | likely (caught loudly) | likely (build fails loudly) | likely (already happened on 1.5) |
| Guardrails | litestream-anchor throw; perf baselines | needle-not-found throws; 11-scenario test | version-pin guard test; runtime validators + flicker detector |
| | orez-node | orez-web | orez-cloudflare |
|---|---|---|---|
| Intended use | local dev | in-IDE dev | production |
| End-to-end working? | yes (chat e2e, demos) | yes (11 scenarios, dual-surface) | yes (todo/app/flights runtime-validated) |
| Notable gaps | dev-only by design (mutex, no HA); pg→sqlite compiler incomplete | no durable replica; admin/planner disabled; legacy proxy.worker | serialization point; large-working-set risk |
A Zero version bump is the single most likely thing to break orez. In order:
- Bump @rocicorp/zero in ~/orez devDeps and run bun run check:all there. The litestream patch and sqlite3 shim throw loudly on signature drift; fix the anchors.
- orez-web: rebuild via bun scripts/build-orez.ts. The build-zero-cache.ts needle patches throw "patch failed: needle not found" if processes.js, the worker auto-run guards, write-worker-client.js, or view-syncer.js changed shape. Re-derive each needle. Then run test:orez (the 11-scenario browser suite).
- orez-cloudflare: the five cf-patches.ts patches warn (don't throw) on drift, so read the logs. Run bun scripts/dev/test-cf-do-bundle.ts (bundles the overlay without deploying), then the template deploy validators with --runtime.
- Re-pin: publish a new orez, bump soot's package.json pin, and confirm the guard test (test/cloudflare-do-deploy.test.ts) still finds the barrier primitives in the installed dist. bun check includes the alignment check across root / templates / orez-sprites/bun.lock.
- Never trust a green reload for the CF path: force the SchemaVersionNotSupported path (navigate to a route needing a new table, poll ~20s) before declaring the deploy healthy.
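The two drift policies in the steps above (throw on a missing needle for orez-web, warn and continue for orez-cloudflare) can be sketched in one helper. This is an illustration under stated assumptions, not the code in build-zero-cache.ts or cf-patches.ts: `NeedlePatch`, `DriftMode`, and `applyNeedlePatches` are hypothetical names, and only the error message text comes from this doc.

```typescript
// Hypothetical helper: NOT the actual build-zero-cache.ts / cf-patches.ts code.
type DriftMode = "throw" | "warn";

interface NeedlePatch {
  name: string;        // label used in diagnostics
  needle: string;      // exact substring expected in the compiled source
  replacement: string; // text that replaces the first occurrence of the needle
}

interface PatchResult {
  source: string;     // patched source text
  warnings: string[]; // drift messages collected in "warn" mode
}

function applyNeedlePatches(
  source: string,
  patches: readonly NeedlePatch[],
  mode: DriftMode,
): PatchResult {
  const warnings: string[] = [];
  let out = source;

  patches.forEach((patch) => {
    const at = out.indexOf(patch.needle);
    if (at === -1) {
      // Upstream changed shape: fail the build, or record and keep going.
      const message = `patch failed: needle not found (${patch.name})`;
      if (mode === "throw") throw new Error(message);
      warnings.push(message);
      return;
    }
    // Splice by index instead of String.replace so "$" in the replacement
    // is never interpreted as a substitution pattern.
    out = out.slice(0, at) + patch.replacement + out.slice(at + patch.needle.length);
  });

  return { source: out, warnings };
}
```

In "throw" mode a Zero bump that moves a needle stops the build immediately; in "warn" mode the caller has to inspect `warnings`, which is why the orez-cloudflare step says to read the logs.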
- node: index.ts:258 (startZeroLite), pglite-manager.ts:265 (3 instances), pg-proxy.ts:152-201 (query rewrites), pglite-ipc.ts (ArrayBuffer-transfer IPC), recovery.ts, zero-litestream-patch.ts:77-96, s3-local.ts
- replication (shared): replication/handler.ts:710-932 (event-driven loop), replication/pgoutput-encoder.ts, replication/change-tracker.ts, change-tracking.ts
- embeds: worker/zero-cache-embed.ts (node), worker/browser-embed.ts (startZeroCacheEmbedBrowser), worker/zero-cache-embed-cf.ts:120 (startZeroCacheEmbedCF)
- browser: pg-proxy-browser.ts, worker/shims/postgres-socket.ts, worker/shims/node-stub.ts, worker/browser-build-config.ts
- cloudflare: pg-proxy-do-backend.ts:4870 (DoBackend), cf-do/worker.ts:132 (ZeroDO/ZeroSqlDO), cf-do/watermark.ts:21, do-sql-tracking.ts, worker/cf-patches.ts:61, cf-do/ARCHITECTURE.md, cf-do/CHAT_E2E.md
- sqlite/compiler: sqlite-mode/apply-mode.ts, pg-sqlite-compiler/index.ts:38, sqlite-wasm/Makefile
- orez-web: packages/orez-web/src/index.ts:203 (startOrezWeb), workers/{zero-cache,pg-proxy,pglite}.worker.ts, workers/sab-channel.ts:29 (SAB ring), scripts/build-zero-cache.ts:55 (the esbuild surgery), scripts/build-workers.ts, shims/bedrock-sqlite-stub.ts, shims/postgres-browser.ts, lib/browser-mode.ts
- cloudflare deploy: src/deploy/cloudflareDoDeploy.ts, by line range: :73-319 shim source (ZeroCacheDO + routing + sealed endpoints), :208-265 ensureReady, :266-294 resetReplicaIfTableSetChanged, :485-658 migrations/ensurePublication/applyInitSqlDDL, :982-1025 bundler, :1175 deployToCloudflareDo; src/deploy/buildUserApp.ts:256-268; src/deploy/buildTrigger.ts (cloudflareDo: true)
- per-project wiring: src/worker/projectRuntime.ts:822 (startOrezWeb call), src/worker/orez-web-client.ts, src/worker/project-server.worker.ts, src/worker/seed-project-data.ts
- build & validators: scripts/build-orez.ts, scripts/dev/validate-cf-do-runtime.ts, scripts/dev/test-cf-do-deploy.ts, scripts/dev/deploy-prod-project.ts, scripts/dev/test-cf-do-bundle.ts, test/cloudflare-do-deploy.test.ts:197, test/orez-web-sync.test.ts
- observability & prod: src/tiling/components/{ArchVizPane,OrezChangesPane,ZeroDebugPane}.tsx, src/production/{deployTier,orezLogUrl,HealthPane}.ts
- legacy tier: packages/orez-sprites/src/{provision,destroy,status}.ts (fly sprites, replaced by CF DO)
- canonical flows: docs/architecture.md:40-176 (the 18-step in-browser sync flow + worker table)