08 — Development Guidelines

These are the rules of the road for everyone (humans and agents) writing code in this repository. They are deliberately tight; if a guideline doesn't fit a situation, the answer is to discuss and amend the guideline rather than ignore it.

Toolchain

Tool	Version / channel	Notes
Rust	stable, 1.95.0 via rustup 1.29	pinned in `rust-toolchain.toml`
cargo-nextest	latest stable	mandatory for running tests
TypeScript	6.x	strict mode
Node	LTS	for SvelteKit dev server only
oxlint	latest	lints TS
oxfmt	latest	formats TS
vitest	latest	TS tests
LocalStack	pinned (version declared in CI config)	for AWS-shaped integration tests
jujutsu (`jj`)	latest	version control (see below)

Packer and Terraform are not part of the MVP toolchain. Infrastructure-as-code lives as Rust crates in this workspace (see 09-infrastructure-module.md); custom AMI baking (Packer) returns post-MVP for reproducible agent VM images.

rustfmt (default channel) and clippy --all-targets --all-features run in CI and as a pre-push hook.

Tiger Style — the pervasive style

This project adopts TigerBeetle's TigerStyle as its pervasive coding style, adapted to Rust and TypeScript. This is not a recommendation; it is the default. Deviations require a written reason in the PR description.

The short form: be defensive and validate everything. Assume any input you did not produce is wrong. Assume any invariant you did not assert can be violated. Make every limit explicit, every error handled, every assumption checked.

The Tiger Style design priorities — safety, performance, developer experience, in that order — apply here. When the three pull in different directions, safety wins.

A few load-bearing principles, restated in our context:

Zero technical debt. Do it right the first time. The "second time" often does not arrive, and shipping a sound foundation is the only sustainable rate of progress.
Simple, explicit control flow. No recursion (use iteration with an explicit bound). No clever combinators that hide branches. Prefer a linear-flow match over chained ? when the chain hides a non-trivial control structure.
Limits on everything. Every loop, every queue, every retry, every cache, every payload size has an explicit, declared upper bound. An unbounded loop must be assert!-bounded by an invariant in the body.
Assertions are first-class code. They detect programmer errors. The only correct response to a violated assertion is to crash. Aim for an average of at least two assertions per function (preconditions, postconditions, invariants); see the Defensive coding section.
Always say why. Comments and commit descriptions explain the rationale, not the action. The action is in the code.
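The "no recursion" and "limits on everything" principles can be sketched together as an iterative walk with a declared bound. This is a minimal illustration, not project code: `TREE_DEPTH_MAX`, `depth_of`, and the parent-index representation are all hypothetical.

```rust
/// Hypothetical bound for illustration; not one of the project's declared limits.
const TREE_DEPTH_MAX: u32 = 64;

/// Walks parent links iteratively. `parents[i]` is the parent of node `i`,
/// and `None` marks the root. Returns the depth of `node_index`, or `None`
/// if no root is found within `TREE_DEPTH_MAX` steps (for example, a cycle).
fn depth_of(parents: &[Option<usize>], node_index: usize) -> Option<u32> {
    // Precondition: the caller hands us a valid index.
    assert!(node_index < parents.len());

    let mut current_index = node_index;
    let mut depth: u32 = 0;
    while depth < TREE_DEPTH_MAX {
        match parents[current_index] {
            None => return Some(depth),
            Some(parent_index) => {
                // Invariant: every link stays inside the slice.
                assert!(parent_index < parents.len());
                current_index = parent_index;
                depth += 1;
            }
        }
    }
    // The loop can only fall through by exhausting the bound.
    assert_eq!(depth, TREE_DEPTH_MAX);
    None
}
```

A cyclic input terminates after `TREE_DEPTH_MAX` steps instead of spinning forever, and the caller sees the failure as a value.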

The full Tiger Style document is required reading. Read it at least once. Re-read it when you find yourself reaching for an exception.

Defensive coding and assertions

Where to validate

Boundary	What to validate	How
HTTP request → handler	Body shape, sizes, IDs, enums	Schema validation against `canonical-types.schema.json` (or its derived Rust types) before the handler sees the data
Adapter → core	Domain invariants	`assert!` at the top of every core function on its preconditions
Core → adapter	Adapter contract	`assert!` on adapter return shapes (e.g. "DynamoDB returned exactly the keys we asked for")
Disk / S3 read	Round-trip integrity	Schema validation on read, even if we wrote the same shape — pair the assertion with the write site
Wire decode (WS frames)	Frame discriminant + payload shape	Reject unknown frames; do not "best-effort" parse
External API response	Status, content-type, body shape	Treat third parties as adversarial; never `serde_json::from_slice` and trust the result
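As one illustration of the wire-decode row, the sketch below rejects unknown discriminants and malformed payloads instead of parsing them on a best-effort basis. The frame layout (one discriminant byte followed by the payload), the `Frame` variants, and the error names are assumptions made for this example; only the 1 MiB payload bound comes from the limits table.

```rust
/// Mirrors the 1 MiB placeholder for `WS_FRAME_BYTES_MAX` in the limits table.
const WS_FRAME_BYTES_MAX: usize = 1024 * 1024;

/// Hypothetical frame set; the real protocol is defined elsewhere.
#[derive(Debug, PartialEq, Eq)]
enum Frame {
    Ping,
    Output(Vec<u8>),
}

#[derive(Debug, PartialEq, Eq)]
enum FrameDecodeError {
    Empty,
    PayloadTooLarge { bytes_count: usize },
    UnknownDiscriminant(u8),
    UnexpectedPayload { discriminant: u8 },
}

/// Decodes one frame: first byte is the discriminant, the rest is the payload.
fn decode_frame(bytes: &[u8]) -> Result<Frame, FrameDecodeError> {
    let Some((&discriminant, payload)) = bytes.split_first() else {
        return Err(FrameDecodeError::Empty);
    };
    if payload.len() > WS_FRAME_BYTES_MAX {
        return Err(FrameDecodeError::PayloadTooLarge { bytes_count: payload.len() });
    }
    match discriminant {
        0x01 => {
            // Negative space: a ping carrying data is malformed, not "close enough".
            if payload.is_empty() {
                Ok(Frame::Ping)
            } else {
                Err(FrameDecodeError::UnexpectedPayload { discriminant })
            }
        }
        0x02 => Ok(Frame::Output(payload.to_vec())),
        unknown => Err(FrameDecodeError::UnknownDiscriminant(unknown)),
    }
}
```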

Assertions in Rust

Use assert!, debug_assert!, and assert_eq! liberally in core code. Production builds run with assert! enabled, so invariants must use assert! rather than debug_assert!, which is compiled out of release builds by default.
Average two or more assert!s per function in cores. Preconditions on entry, postconditions on exit, invariants in the middle. Empty assertions (assert!(true)) are not counted.
Pair assertions. For every property worth enforcing, find at least two independent code paths to enforce it. Example: assert manifest validity at write time and re-validate at every read site that mutates state from it.
Assert positive AND negative space. What you expect, and what you do not expect. The boundary between the two is where the interesting bugs live.
Compile-time assertions for size and layout invariants: static_assert!-style checks (const _: () = assert!(SIZE_OF_FOO == 32);) for any layout the codebase relies on.
Split compound assertions. assert!(a); assert!(b); over assert!(a && b); — failures point at the actual broken condition.
Single-line implications. if a { assert!(b); } reads as "a implies b".
No unwrap() / expect() in production code paths. Tests, init-only code, and one-off scripts may use them; init-time expect() requires an explicit reason string.
No panic! for control flow. Panics signal programmer error only.
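The rules above can be read together in one small function. The sketch below is illustrative only: `SectorHeader`, `BUFFER_BYTES_MAX`, and `append_chunk` are hypothetical names, and it assumes the caller (an adapter) has already validated the input, so a violated precondition is a programmer error.

```rust
/// Hypothetical fixed-layout header, used only to show a layout assertion.
#[repr(C)]
struct SectorHeader {
    checksum: u64,
    sequence: u64,
    length_bytes: u32,
    flags: u32,
}

// Compile-time assertion: the build fails if the layout drifts from 24 bytes.
const _: () = assert!(std::mem::size_of::<SectorHeader>() == 24);

/// Hypothetical bound for illustration.
const BUFFER_BYTES_MAX: usize = 4096;

/// Appends `chunk` to `buffer` and returns the new length in bytes.
fn append_chunk(buffer: &mut Vec<u8>, chunk: &[u8]) -> usize {
    // Preconditions, split so a failure names the broken condition.
    assert!(!chunk.is_empty());
    assert!(buffer.len() + chunk.len() <= BUFFER_BYTES_MAX);

    let length_before = buffer.len();
    buffer.extend_from_slice(chunk);

    // Postconditions: positive space (grew by exactly the chunk) and
    // negative space (never past the bound).
    assert_eq!(buffer.len(), length_before + chunk.len());
    assert!(buffer.len() <= BUFFER_BYTES_MAX);
    // Single-line implication: "buffer was empty" implies "buffer equals chunk".
    if length_before == 0 { assert_eq!(buffer.as_slice(), chunk); }

    buffer.len()
}
```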

Assertions in TypeScript

Use a small invariant(condition, message) helper that throws on false; treat it the same way as Rust assert!. Use a separate assertNever(x: never) helper for exhaustive switch statements.
Validate all inbound data with a schema validator (Valibot or Zod against the generated types from canonical-types.schema.json) at the boundary; never cast network data with as Foo.
Strict TypeScript settings (already specified): strict, noUncheckedIndexedAccess, exactOptionalPropertyTypes. No any.

Errors are data, not exceptions

Every error is a value with a typed reason. The HTTP shim translates them into stable error codes that survive client upgrades.
Every error must be handled or explicitly propagated. Swallowing an error is a bug.
Retry policies are explicit and bounded (max attempts, backoff schedule, jitter).
Never log a secret. Errors that carry data must scrub anything that could be a credential.
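A bounded retry policy with typed outcomes might look like the sketch below. Every name and value here (`RETRY_ATTEMPTS_MAX`, the backoff constants, `Attempt`, `RetryError`, `retry_bounded`) is hypothetical, and jitter is deliberately left out so the example stays deterministic; a real policy would add it. The sleep function is injected so tests never wait on a real clock.

```rust
use std::time::Duration;

// Hypothetical values for illustration only.
const RETRY_ATTEMPTS_MAX: u32 = 5;
const RETRY_BACKOFF_MS_BASE: u64 = 100;
const RETRY_BACKOFF_MS_MAX: u64 = 2_000;

/// What a single attempt reports back to the retry loop.
enum Attempt<T, E> {
    Done(T),
    Transient(E),
    Permanent(E),
}

#[derive(Debug, PartialEq, Eq)]
enum RetryError<E> {
    /// The operation failed in a way that retrying cannot fix.
    Permanent(E),
    /// Every attempt failed; carries the last transient error.
    AttemptsExhausted { attempts_count: u32, last_error: E },
}

/// Exponential backoff, capped at `RETRY_BACKOFF_MS_MAX`.
fn backoff_for(attempt_index: u32) -> Duration {
    assert!(attempt_index < RETRY_ATTEMPTS_MAX);
    let backoff_ms = RETRY_BACKOFF_MS_BASE
        .saturating_mul(1u64 << attempt_index)
        .min(RETRY_BACKOFF_MS_MAX);
    assert!(backoff_ms <= RETRY_BACKOFF_MS_MAX);
    Duration::from_millis(backoff_ms)
}

/// Runs `operation` at most `RETRY_ATTEMPTS_MAX` times.
fn retry_bounded<T, E>(
    mut operation: impl FnMut(u32) -> Attempt<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, RetryError<E>> {
    let mut attempt_index: u32 = 0;
    loop {
        // Invariant that bounds the loop.
        assert!(attempt_index < RETRY_ATTEMPTS_MAX);
        match operation(attempt_index) {
            Attempt::Done(value) => return Ok(value),
            Attempt::Permanent(error) => return Err(RetryError::Permanent(error)),
            Attempt::Transient(error) => {
                if attempt_index + 1 == RETRY_ATTEMPTS_MAX {
                    return Err(RetryError::AttemptsExhausted {
                        attempts_count: RETRY_ATTEMPTS_MAX,
                        last_error: error,
                    });
                }
                sleep(backoff_for(attempt_index));
                attempt_index += 1;
            }
        }
    }
}
```

Exhaustion is reported as its own variant rather than folded into the last error, so the caller can distinguish "gave up" from "failed once".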

Make invalid states unrepresentable

Use the type system. Id<Tag> newtypes (already in the architecture) catch wrong-id-type bugs at compile time. The same pattern for Cents, Seconds, Bytes, etc.
Enums for state, not strings. The InstanceState enum is the canonical example — match exhaustively, no fallthrough.
NonEmpty<T> for collections that must have at least one element. Pre-validated string newtypes (Email, WgPublicKey, Pem) for any string with structure.
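To make the pattern concrete, here is one possible shape for a tagged ID and a non-empty collection. The architecture's real `Id<Tag>` and `NonEmpty<T>` definitions are not shown in this document, so the fields, constructors, and the `UserTag` / `InstanceTag` / `instance_key` names below are assumptions for illustration.

```rust
use std::marker::PhantomData;

// Hypothetical tag types; they exist only at the type level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UserTag;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct InstanceTag;

/// An ID that remembers what kind of entity it identifies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Id<Tag> {
    value: u64,
    tag: PhantomData<Tag>,
}

impl<Tag> Id<Tag> {
    fn new(value: u64) -> Self {
        // Assumption for this sketch: zero is never a valid ID.
        assert!(value != 0);
        Id { value, tag: PhantomData }
    }

    fn value(&self) -> u64 {
        self.value
    }
}

/// Accepts only instance IDs; passing an `Id<UserTag>` does not compile.
fn instance_key(instance_id: Id<InstanceTag>) -> String {
    format!("instance#{}", instance_id.value())
}

/// A collection that cannot be constructed empty.
#[derive(Debug, PartialEq, Eq)]
struct NonEmpty<T> {
    first: T,
    rest: Vec<T>,
}

impl<T> NonEmpty<T> {
    /// Returns `None` for an empty input instead of building an invalid value.
    fn from_vec(mut items: Vec<T>) -> Option<Self> {
        if items.is_empty() {
            return None;
        }
        let first = items.remove(0);
        Some(NonEmpty { first, rest: items })
    }

    fn first(&self) -> &T {
        &self.first
    }

    fn len(&self) -> usize {
        1 + self.rest.len()
    }
}
```

Because `first` is a plain field rather than `items[0]`, code holding a `NonEmpty<T>` never needs an emptiness check or an `unwrap()`.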

Limits and bounds

Every limit is declared as a const in the relevant crate, named with units, and referenced everywhere it applies. No magic numbers.

Mandatory limits at MVP (initial values are placeholders to be tuned; the existence of the limit is non-negotiable):

Domain	Limit	Initial value
HTTP request body	`HTTP_BODY_BYTES_MAX`	1 MiB
HTTP header bytes	`HTTP_HEADER_BYTES_MAX`	64 KiB
WS frame payload	`WS_FRAME_BYTES_MAX`	1 MiB
Tmux ring buffer per instance	`TMUX_RING_BYTES_MAX`	1 MiB
Active instances per user (concurrency cap)	`INSTANCES_PER_USER_MAX`	10
OrchestratorTask retries	`TASK_ATTEMPTS_MAX`	5
Bootstrap token TTL	`BOOTSTRAP_TOKEN_TTL_SECONDS`	300
WireGuard peer pool size	`WG_PEERS_MAX`	1024
Manifest size	`MANIFEST_BYTES_MAX`	256 KiB
InstanceEvent rate per instance	`INSTANCE_EVENTS_PER_MINUTE_MAX`	600
TrafficLogRecord queue depth	`TRAFFIC_QUEUE_DEPTH_MAX`	100_000
Background task channel depth	`TASK_CHANNEL_DEPTH_MAX`	1024
Labels per instance	`INSTANCE_LABELS_COUNT_MAX`	20
Label key length	`LABEL_KEY_BYTES_MAX`	63
Label value length	`LABEL_VALUE_BYTES_MAX`	200
Bundle upload from client agent	`BUNDLE_BYTES_MAX`	2 GiB

Reaching a limit is an observable event: write a structured log entry, increment a counter, and where appropriate emit an InstanceEvent of kind warning. Reaching a hard limit either rejects the input (4xx) or backpressures the producer; it never silently drops data.
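As a sketch of how a declared limit is referenced from validation code, the example below checks a label key against `LABEL_KEY_BYTES_MAX` from the table. The `u32` type of the constant, the `LabelError` enum, and `check_label_key` are assumptions; logging and counters are omitted.

```rust
/// Value taken from the limits table; the `u32` type is an assumption.
const LABEL_KEY_BYTES_MAX: u32 = 63;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum LabelError {
    KeyEmpty,
    KeyTooLong { bytes_count: u32 },
}

/// Rejects keys outside 1..=LABEL_KEY_BYTES_MAX bytes.
fn check_label_key(key: &str) -> Result<(), LabelError> {
    // Saturate rather than truncate if the length does not fit in u32.
    let bytes_count = key.len().min(u32::MAX as usize) as u32;
    if bytes_count == 0 {
        return Err(LabelError::KeyEmpty);
    }
    if bytes_count > LABEL_KEY_BYTES_MAX {
        return Err(LabelError::KeyTooLong { bytes_count });
    }
    // Postconditions on the accepted path.
    assert!(bytes_count >= 1);
    assert!(bytes_count <= LABEL_KEY_BYTES_MAX);
    Ok(())
}
```

The error carries the observed size, so the HTTP layer can report it and the limit event can be logged with real data.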

Version control: jujutsu

We use jujutsu on top of a Git backend. Practical norms:

Commits are small and well-described. Aim for a single coherent change per commit. Squash before pushing if you've been working in many small steps.
Empty descriptions are not accepted. Run jj describe before pushing.
Conventional Commits for the first line of every commit message: type(scope): subject with types from the standard set (feat, fix, docs, chore, refactor, test, build, ci, perf, style). A commitlint step in the pre-push hook enforces this; CI re-runs it.
Conflict resolution is in jj, not in plain-text markers. Prefer jj resolve workflows.
Branch model. main is the integration branch. Feature work happens on named bookmarks (jj bookmark create feat/x). PRs are pushed via jj git push -b feat/x.
Do not rewrite published history unless the PR is yours and unmerged. If a force-push is required, call it out in the PR.
The .jj/ directory is local; don't commit it. (Already in .gitignore.)

For agents specifically: do not run jj abandon, jj op restore, jj git fetch --force, or any other destructive operation without explicit user confirmation, even if it seems like the cleanest path.

Rust conventions

Formatting and linting

cargo fmt --all clean before commit.
cargo clippy --all-targets --all-features -- -D warnings clean before commit.
The repository's clippy.toml enables pedantic-adjacent lints; opt-outs require a comment explaining why.

Code style

Hard limit: 70 lines per function. No exceptions. If a function is longer, split it. Extract pure helpers; centralise control flow in the parent ("push ifs up, push fors down").
Hard limit: 100 columns per line. No exceptions. Use rustfmt's max_width = 100.
Modules over files. Prefer many small files over large ones; a 1000-line .rs file is a smell.
No business logic in main.rs or in HTTP handlers. Handlers parse, validate, call into a core function, and serialise the result.
No recursion. Use iteration with an explicit upper bound. The handful of unavoidable cases (parsing recursive data) declare the bound at the entry point and assert it.
Explicit fixed-width integer types (u32, u64, i32, i64) for domain values. Avoid usize/isize for anything that crosses a serialisation boundary.
Errors: one Error enum per crate; thiserror for derivation; From impls translate at boundaries. Adapter errors are translated at the adapter boundary; the core never sees a third-party error type.
No unwrap() / expect() in production code paths. Tests, init-time wiring, and one-off scripts are exempt. Init-time expect() requires an explicit reason string.
No panic! for control flow. Panics indicate programmer error and crash the process.
No unsafe outside an adapter that strictly needs it (none expected at MVP); any unsafe block carries a // SAFETY: comment justifying every invariant the block requires.
Trait objects for ports; &dyn Trait or Arc<dyn Trait>. Generics are fine for hot-path ports if a measurement justifies it.
#[must_use] on Result and on builders.
Simpler return types win. () > bool > u64 > Option<T> > Result<T, E> (Tiger Style's void is Rust's unit type). Chains of .map().and_then().ok_or() that hide branches are smells; prefer explicit match when the control flow is non-trivial.
Pass large structs by reference. If a parameter is > 16 bytes and not meant to be moved, take &T.
Calculate variables close to their use. Don't introduce locals far from where they're consumed. Don't keep dead bindings around.
No duplicated state. No aliasing of variables. State has one home.
Split compound conditions. if a { if b { ... } } over if a && b { ... } when the conditions check different things; nested if/else makes both branches visible.
State invariants positively. if index < length over if index >= length (when expressing the holding-case).
Brace every if unless it fits entirely on one line.
No comments that explain what the code does. Comments explain why: a non-obvious constraint, a workaround for a specific bug, an invariant a future reader would otherwise miss. Comments are full sentences with capitalisation and punctuation, not scribbles.
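The error-handling rule (one `Error` enum per crate, translation at the adapter boundary) can be sketched with the standard library alone. `StoreSdkError` and `sdk_fetch` stand in for a third-party SDK and are hypothetical, as are the variants of `Error`. The `Display` impl is written by hand to keep the example dependency-free; in the repository, thiserror would derive it.

```rust
use std::fmt;

/// Stand-in for a third-party SDK error type (hypothetical).
#[derive(Debug)]
struct StoreSdkError {
    status_code: u16,
}

/// The crate's single error enum; the core only ever sees this type.
#[derive(Debug, PartialEq, Eq)]
enum Error {
    NotFound,
    StoreUnavailable { status_code: u16 },
}

impl fmt::Display for Error {
    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::NotFound => write!(formatter, "record not found"),
            Error::StoreUnavailable { status_code } => {
                write!(formatter, "store unavailable (status {})", status_code)
            }
        }
    }
}

impl std::error::Error for Error {}

/// Translation happens once, at the adapter boundary.
impl From<StoreSdkError> for Error {
    fn from(error: StoreSdkError) -> Self {
        match error.status_code {
            404 => Error::NotFound,
            status_code => Error::StoreUnavailable { status_code },
        }
    }
}

/// Fake SDK call so the sketch runs without a network.
fn sdk_fetch(key: &str) -> Result<u64, StoreSdkError> {
    match key {
        "present" => Ok(7),
        "missing" => Err(StoreSdkError { status_code: 404 }),
        _ => Err(StoreSdkError { status_code: 503 }),
    }
}

/// Adapter function: returns the crate's `Error`, never the SDK's.
fn adapter_fetch(key: &str) -> Result<u64, Error> {
    assert!(!key.is_empty());
    match sdk_fetch(key) {
        Ok(value) => Ok(value),
        Err(sdk_error) => Err(Error::from(sdk_error)),
    }
}
```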

Naming

snake_case for functions, variables, modules, files.
CamelCase for types and traits (Rust convention; not Tiger Style verbatim because Rust convention is load-bearing in tooling).
No abbreviations in identifier names. Exceptions: standard short names accepted by the ecosystem (ctx, cfg, id, i/j/k as loop counters).
Acronyms in proper case in CamelCase types: HttpClient, not HTTPClient; OidcProvider, not OIDCProvider. (Matches Rust API guidelines, which align with Tiger Style here.)
Units last in identifiers, sorted by descending significance: latency_ms_max, not max_latency_ms. bytes_max, rows_count, seconds_elapsed. This makes related variables sort and align.
Same-length names for related variables where reasonable: source / target, not src / dst. Aligned source helps the eye spot asymmetry.
Helpers prefix with parent name: read_sector_callback, provision_instance_step_two. Shows call history.
Callbacks go last in parameter lists.
Order matters. A file reads top-down: main first; structs before their methods; fields before nested types before methods inside a struct module.

Testing

cargo nextest run is the sanctioned way to run tests. Plain cargo test still works locally, but nextest is faster and is what CI runs.
Test pyramid.
- Unit: in-module tests; pure-function logic; fast (sub-second per test).
- Integration: in tests/ directory of each crate; exercise core + in-memory adapters (from crates/testing). Should still run in seconds.
- End-to-end: in project-server/tests/ or a dedicated e2e crate; spin up the full server with in-memory adapters (or LocalStack for AWS-shaped tests). Marked #[ignore] and run explicitly.
Test exhaustively — positive AND negative space. Every test that confirms a thing works must be paired with a test that confirms the adjacent things fail correctly. "Validates a good manifest" is incomplete without "rejects a bad manifest" with cases for every well-defined failure mode.
Test data crossing the validity boundary. The interesting bugs live exactly there. "1 below the limit", "at the limit", "1 above the limit" — every limit gets these three cases.
Coverage is a floor, not a target. Behaviour coverage via integration tests is what we actually care about. That said, CI enforces a 50% line-coverage floor (cargo llvm-cov on the slow tier) to catch obvious regressions; the floor is a safety net, not a goal to optimise toward. Gaming the floor with trivial tests is a code-review red flag.
No flaky tests. A flaky test is a bug to fix immediately, not a known-issue to retry around.
Property tests (proptest) for state-machine logic (lifecycle transitions, ID parsing, manifest validation). Property tests express invariants; assertions in production code express the same invariants — they are two enforcement paths for the same property (per the "pair assertions" rule).
Test data is built with small builders, not megabyte JSON fixtures.
Determinism. Tests use the fakes for the Clock and IdGenerator ports. No Instant::now() or randomness in test bodies.
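The determinism and boundary-testing rules combine naturally. The sketch below assumes a `Clock` port with a single `now_unix_seconds` method and a `FakeClock` that tests advance by hand; the real port's surface is not shown in this document, so both are hypothetical. It also assumes a token is still live when its age equals the TTL exactly.

```rust
use std::cell::Cell;

/// Hypothetical shape of the Clock port.
trait Clock {
    fn now_unix_seconds(&self) -> u64;
}

/// Test fake: time moves only when the test says so.
struct FakeClock {
    unix_seconds: Cell<u64>,
}

impl FakeClock {
    fn new(start_unix_seconds: u64) -> Self {
        FakeClock { unix_seconds: Cell::new(start_unix_seconds) }
    }

    fn advance(&self, seconds: u64) {
        self.unix_seconds.set(self.unix_seconds.get() + seconds);
    }
}

impl Clock for FakeClock {
    fn now_unix_seconds(&self) -> u64 {
        self.unix_seconds.get()
    }
}

/// Value from the limits table.
const BOOTSTRAP_TOKEN_TTL_SECONDS: u64 = 300;

/// Assumption: a token is live while its age is at most the TTL.
fn token_is_live(clock: &dyn Clock, issued_at_unix_seconds: u64) -> bool {
    let now_unix_seconds = clock.now_unix_seconds();
    // Precondition: tokens are never issued in the future.
    assert!(now_unix_seconds >= issued_at_unix_seconds);
    let age_seconds = now_unix_seconds - issued_at_unix_seconds;
    age_seconds <= BOOTSTRAP_TOKEN_TTL_SECONDS
}
```

A test then advances the fake to one second below the limit, to the limit, and one second above it, and checks `token_is_live` at each step.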

Documentation

Public items in library crates carry doc comments.
Cores expose a top-level lib.rs doc that explains what the crate is, the ports it depends on, and the surface it offers.
No bare // TODO without an owner and a tracking link (issue or PR ref).

TypeScript conventions

Tiger Style applies here too. Adapted for the language:

strict: true, noUncheckedIndexedAccess: true, exactOptionalPropertyTypes: true.
oxlint clean before commit; ESLint is not in use.
oxfmt is the formatter.
No any. unknown + narrowing, or a typed parser. Casts are bugs unless justified in a comment.
Domain types are imported from ts/shared-types, never hand-redefined in app code.
Validate at boundaries. Inbound JSON (HTTP responses, WS frames, query params) is parsed through a schema validator (Valibot or Zod) into the typed shape. No JSON.parse(...) as Foo.
No silent fallthrough. switch on a discriminated union ends with assertNever(x) so the compiler enforces exhaustiveness.
invariant(cond, msg) helper for runtime assertions; aim for the same density as Rust (~2 per non-trivial function).
Hard limits on function size and line length match the Rust side: 70 lines, 100 columns. oxfmt enforces line length; function size is a code-review gate.
No abbreviations in identifiers; same naming-by-units rule as Rust where relevant (latencyMsMax, not maxLatencyMs).
vitest for unit and integration tests; Playwright is the deferred choice for browser end-to-end.
Same positive/negative-space testing rule. Every "happy path" test has a paired "rejects a bad input" test.

Svelte conventions

Svelte 5 with runes ($state, $derived, $effect).
A SvelteKit route is a thin wrapper; data fetching uses +page.server.ts load functions.
API calls go through a generated client; no inline fetch('/v1/...').
Components are kept small and accessible; complex behaviour moves to a *.svelte.ts helper module.
shadcn-svelte components are vendored (the shadcn pattern); ad-hoc styling on top of Tailwind utility classes.

Repository hygiene

.private/ is for local, untracked operator data — never commit.
docs/ is the canonical home for specs and decisions.
Operator infra config (e.g. project-infra.toml) lives per-environment under .private/ (untracked) or wherever the operator deploys the project CLI; never commit environment-specific infra config.
Pre-push runs: cargo fmt --check, cargo clippy -- -D warnings, cargo nextest run, pnpm -r exec oxlint, pnpm -r exec vitest run. Use lefthook or equivalent.
CI runs the same plus cargo nextest run --run-ignored ignored-only (the slow e2e tier) on PRs to main.

Guidelines for AI agents working on this codebase

These are not different rules; they are emphasis on places agents tend to slip.

Tiger Style applies to you too. Read the linked document. Defensive validation and explicit limits are not optional, even on a "small" change.
Add assertions as you go. Every function you touch should leave with at least two assertions in it (preconditions on entry, postconditions on exit, or invariants in the middle). Asserting true doesn't count.
No silent error swallowing. Every Result is handled. Every match on an enum is exhaustive. No _ = thing() in production code.
Stay inside the architecture. Adding I/O directly to a core crate is the most common slip. If a new I/O is needed, define a port in adapters/ports/ first, then implement it as an adapter, then call into it from the core.
Do not add backwards-compat shims. If a type changes, change every caller. The codebase is small; there is no published API.
Do not invent fields not in the canonical types document. Update the schema and regenerate first.
Tests run before claiming complete. "Compiles" is not "works". Run cargo nextest run and the relevant TS tests, and report the actual output.
Test positive and negative space together. A new feature ships with tests for what it accepts and what it rejects.
Limits are explicit. Adding a new loop, queue, retry, cache, or buffer means adding a named constant for its bound, in the same change.
Manifest changes are versioned changes. Editing a Manifest's shape is a schema migration; treat it accordingly (write the migration in the same change).
Prefer small, frequent commits over rolling-up a huge change. jj makes small commits cheap.
No comments that paraphrase the code. Comments explain why, in full sentences. Comments are rare and valuable.
No README files / extra docs unless explicitly requested. The spec lives in docs/.
Ports and types are the contract. When in doubt about a piece of behaviour, pick the option that keeps the port surface small.
When you change a port, you change every adapter that implements it. Don't leave half-migrated adapters.
Do not run destructive jj or git operations without explicit confirmation. This includes jj abandon, force-push, branch deletion, git reset --hard, git clean -fd — even if it looks like the obvious cleanup.
Do not skip pre-commit / pre-push hooks (--no-verify, --no-gpg-sign, etc.). If a hook fails, fix the underlying issue.
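The "stay inside the architecture" guideline follows a fixed order: port first, then adapter, then the core call. A minimal sketch of that order is below; `LabelStore`, `InMemoryLabelStore`, and `label_or_default` are hypothetical names, not existing ports in this repository.

```rust
use std::collections::BTreeMap;

/// Step 1, the port: the only I/O surface the core is allowed to see.
trait LabelStore {
    fn label_get(&self, key: &str) -> Option<String>;
}

/// Step 2, an adapter: here an in-memory one, as used in tests.
struct InMemoryLabelStore {
    labels: BTreeMap<String, String>,
}

impl LabelStore for InMemoryLabelStore {
    fn label_get(&self, key: &str) -> Option<String> {
        self.labels.get(key).cloned()
    }
}

/// Step 3, the core: pure logic that reaches I/O only through the port.
fn label_or_default(store: &dyn LabelStore, key: &str, default: &str) -> String {
    // Preconditions: adapters validate input before it reaches the core.
    assert!(!key.is_empty());
    assert!(!default.is_empty());
    match store.label_get(key) {
        Some(value) => value,
        None => default.to_string(),
    }
}
```

Swapping the in-memory adapter for a real one changes no line of the core function.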

Definition of done

A change is "done" when:

The behaviour is exercised by a test (unit, integration, or e2e as appropriate).
The change includes negative-space tests for every new validation path.
Every new or touched function has at least two meaningful assertions (preconditions, postconditions, invariants).
Every new bound (loop iteration count, queue depth, retry count, payload size) is a named const in the relevant crate.
cargo fmt, cargo clippy -- -D warnings, and cargo nextest run all pass locally.
The relevant TS lint/format/test commands pass locally.
If domain types changed, canonical-types.schema.json is updated and the TS shared types are regenerated.
The commit description states the why.
The PR description lists what changed at the architecture level (which ports, which adapters, which UI surfaces).

Assumptions and open questions

Assumptions

The team is comfortable with jj; if not, falling back to git is acceptable but the rest of the conventions still apply.
LocalStack is sufficient for "AWS-shaped" e2e tests; full AWS integration tests are operator-run, not CI-run.
nextest is universally available on developer machines.

Decisions

Heavy lint/test gate. Pre-push. Pre-commit is friendlier for typo-sized commits; pre-push is faster for iterative jj workflows and keeps commits cheap.
cargo deny / cargo audit in CI. Yes from day 1. Cheap insurance; opting in later is strictly harder.
Generated code lifecycle. Checked in. Generated Rust and TS types live alongside hand-written code and are grep-able; a CI job regenerates and fails the build if the checked-in output drifts.
Tooling language. Rust via xtask. Ad-hoc scripts are Rust subcommands of the xtask crate, not shell. Matches the "no shell in CI" decision in 07-architecture-principles.md.
Assertion density measurement. PR-comment metric, not a CI gate at MVP. The xtask audit reports per-function and per-crate averages in PRs; enforcement stays a code-review norm. Promote to a gate once there's baseline data.
Commit style. Conventional Commits (type(scope): subject) enforced by a commitlint step in the pre-push hook and re-run in CI. See the Version Control section.
Coverage gate. 50% line-coverage floor via cargo llvm-cov on the slow CI tier. Floor, not target — see Testing above.
Proxy hot-path allocation rule. Minimise allocation on the proxy hot path; use pre-sized pools where reasonable. We do not push further at MVP (no ban on Vec::with_capacity-less allocations on hot paths) — revisit once we have real throughput data.

Open questions

(None at this stage.)
