Firecracker ateom Backend — Working PoC on bigbox (counter demo)
Update (2026-05-29): this standalone PoC has since been turned into a full in-repo implementation (Phases 0–3) and a cluster e2e — a counter actor on a Firecracker worker driven through the real control plane (ate-api-server + atenet), state preserved across suspend/resume, on the existing kind cluster. Branch firecracker-backend (pushed to dims/substrate, commit bc533f5; worktree ~/go/src/github.com/agent-substrate/substrate-firecracker). Full journal: ~/notes/agent-substrate/2026-05-29-firecracker-backend-implementation-log.md. The PoC notes below are retained for the from-scratch microVM bring-up details (rootfs build, Firecracker API sequence, gotchas).
Goal: prove a Firecracker backend can satisfy substrate's ateom Run/Checkpoint/Restore contract, preserving in-RAM and filesystem state, driven by the real demos/counter workload.
Result: ✅ PROVEN. A running counter actor was checkpointed, its VM destroyed, and restored into a fresh Firecracker process — the in-memory request counter continued (didn't reset) and the random-file fshash was identical.
Code: ~/notes/agent-substrate/firecracker-poc/ateom-firecracker.go (also on bigbox at /root/fc-demo/ateom-fc/main.go).
What was built
A standalone Go program ateom-firecracker implementing the proposal's runtime Backend interface:
type Backend interface {
    Run(ctx) (workloadIP string, err error)                      // boot microVM from rootfs, report ready
    Checkpoint(ctx, dest Destination) (SnapshotManifest, error)  // pause+snapshot, tear down VM
    Restore(ctx) (workloadIP string, err error)                  // fresh VM from snapshot, resume
    Delete(ctx) error
    Capabilities() Capabilities
}
It drives the Firecracker HTTP API over its unix socket (boot-source / drives / machine-config / network-interfaces / actions / vm[Paused|Resumed] / snapshot[create|load]), manages the firecracker child process, and owns tap networking (fc-tap0, host 172.16.0.1/24, guest 172.16.0.2, fixed guest MAC) — the backend-owned networking that replaces gVisor's eth0-into-netns dance.
Capabilities() reports SupportsLocalPause=true, SupportsMemorySnapshot=true, RestoreRequiresSameHost=true, SupportsIncremental=false — exactly the signals the control plane needs to pick #119 PAUSED vs SUSPENDED and to gate cross-host scheduling.
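To make the role of these flags concrete, here is a minimal sketch of how a control plane might consume them. The `Tier` type, `PickTier`, and `CanRestoreOn` are hypothetical names introduced for illustration; only the four capability fields come from the PoC output.

```go
package main

import "fmt"

// Capabilities mirrors the four flags the PoC reports.
type Capabilities struct {
	SupportsLocalPause      bool
	SupportsIncremental     bool
	SupportsMemorySnapshot  bool
	RestoreRequiresSameHost bool
}

// Tier is a hypothetical name for the #119 snapshot tiers.
type Tier string

const (
	TierNone      Tier = "NONE"
	TierPaused    Tier = "PAUSED"
	TierSuspended Tier = "SUSPENDED"
)

// PickTier chooses a tier from the backend's capabilities and the caller's
// durability preference (assumed policy, not the control plane's actual one).
func PickTier(c Capabilities, wantDurable bool) Tier {
	switch {
	case !c.SupportsMemorySnapshot:
		return TierNone
	case !wantDurable && c.SupportsLocalPause:
		return TierPaused
	default:
		return TierSuspended
	}
}

// CanRestoreOn gates cross-host scheduling for backends whose snapshots
// only restore on the host that produced them.
func CanRestoreOn(c Capabilities, srcHost, dstHost string) bool {
	return !c.RestoreRequiresSameHost || srcHost == dstHost
}

func main() {
	fc := Capabilities{SupportsLocalPause: true, SupportsMemorySnapshot: true, RestoreRequiresSameHost: true}
	fmt.Println(PickTier(fc, false), CanRestoreOn(fc, "node-a", "node-b"))
}
```

With the PoC's reported values, a non-durable request lands on PAUSED and a restore on a different host is refused.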
The proof (self-test output, exit 0)
Backend=firecracker Firecracker v1.15.1 caps={SupportsLocalPause:true SupportsIncremental:false SupportsMemorySnapshot:true RestoreRequiresSameHost:true}
== Run() == workload ready at 172.16.0.2
== drive counter (in-RAM state) ==
preserved memory count: 2 / 3 / 4 (count 1 consumed by readiness probe)
== Checkpoint(Local) == manifest {Artifacts:[vmstate memory] Backend:firecracker KernelID:vmlinux-6.1.128 ...}
verified: workload unreachable after checkpoint (worker freed)
== Restore() == workload restored at 172.16.0.2
== verify state continuity ==
preserved memory count: 6
PASS ✅ count continued 4 -> 6 across checkpoint/restore (in-RAM state preserved; a reset would show 1-2)
Earlier shell-level run also confirmed the filesystem dimension: fshash before snapshot = after restore (HdCdyLcPQbNG4g/k82Dkk…), i.e. the 1 MB /random-content-file survived (rootfs disk reused in place for PAUSED/same-node restore).
Snapshot artifacts (Full): memory 256 MiB + vmstate 14 KiB — note the full-RAM memory file, which is the SUSPENDED/durable NIC-cost concern from the proposal (PAUSED keeps it local → fast UFFD/CoW resume).
Reproduction (on bigbox, all under /root/fc-demo)
Prereqs already in place: firecracker, jailer, vmlinux (Firecracker CI v1.12), static busybox (busybox-static), Go 1.26.1, /dev/kvm, /dev/net/tun.
1. Build the counter (static):
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o counter counter.go # from demos/counter
Gotcha that cost a debugging cycle: busybox --list includes busybox itself; symlinking it (ln -sf busybox bin/busybox) makes a self-referential symlink → kernel init fails with ELOOP (-40). Exclude it.
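The exclusion can be expressed as a small filter over the applet list. This is a sketch only: `AppletLinks` is a hypothetical helper, and the applet names in `main` stand in for the real output of `busybox --list`.

```go
package main

import "fmt"

// AppletLinks returns the applet names that are safe to symlink to the
// busybox binary. The "busybox" entry is dropped because linking it would
// create bin/busybox -> busybox, a self-referential symlink (ELOOP at init).
func AppletLinks(applets []string) []string {
	var out []string
	for _, a := range applets {
		if a == "" || a == "busybox" {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	// Illustrative subset; in practice the list comes from `busybox --list`.
	for _, a := range AppletLinks([]string{"sh", "busybox", "ls"}) {
		fmt.Printf("ln -sf busybox bin/%s\n", a)
	}
}
```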
3. Build & run the backend self-test:
cd ateom-fc && go mod init ateom-firecracker && go build -o ../ateom-firecracker .
cd .. && ./ateom-firecracker -workdir /root/fc-demo
Boot args used: console=ttyS0 reboot=k panic=1 pci=off root=/dev/vda rw init=/init. Networking: host curls the guest directly on the tap subnet (counter has no outbound, so no NAT needed).
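For orientation, the pre-boot API sequence can be sketched as an ordered list of requests sent over the Firecracker unix socket. This is not the PoC's code: `BootSequence`, `UnixClient`, and `Send` are hypothetical helpers, the file paths, vCPU count, and MAC address in `main` are placeholders, and the JSON field names follow the Firecracker API as I understand it, so check them against the version in use.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// apiCall is one request against the Firecracker API socket.
type apiCall struct {
	Method, Path string
	Body         map[string]any
}

// BootSequence lists the calls in the order they must be issued: all
// configuration first, InstanceStart last.
func BootSequence(kernel, rootfs, bootArgs, tap, mac string, vcpus, memMiB int) []apiCall {
	return []apiCall{
		{"PUT", "/boot-source", map[string]any{"kernel_image_path": kernel, "boot_args": bootArgs}},
		{"PUT", "/drives/rootfs", map[string]any{"drive_id": "rootfs", "path_on_host": rootfs, "is_root_device": true, "is_read_only": false}},
		{"PUT", "/machine-config", map[string]any{"vcpu_count": vcpus, "mem_size_mib": memMiB}},
		{"PUT", "/network-interfaces/eth0", map[string]any{"iface_id": "eth0", "host_dev_name": tap, "guest_mac": mac}},
		{"PUT", "/actions", map[string]any{"action_type": "InstanceStart"}},
	}
}

// UnixClient returns an HTTP client that dials the given unix socket
// regardless of the host named in the URL.
func UnixClient(sock string) *http.Client {
	return &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", sock)
		},
	}}
}

// Send issues one call and treats any non-2xx status as an error.
func Send(c *http.Client, call apiCall) error {
	body, err := json.Marshal(call.Body)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(call.Method, "http://localhost"+call.Path, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("%s %s: %s", call.Method, call.Path, resp.Status)
	}
	return nil
}

func main() {
	// Dry run: print the sequence instead of contacting a VMM.
	seq := BootSequence("/root/fc-demo/vmlinux", "/root/fc-demo/rootfs.ext4",
		"console=ttyS0 reboot=k panic=1 pci=off root=/dev/vda rw init=/init",
		"fc-tap0", "06:00:AC:10:00:02", 1, 256)
	for _, c := range seq {
		body, _ := json.Marshal(c.Body)
		fmt.Printf("%s %s %s\n", c.Method, c.Path, body)
	}
}
```

Against a live VMM, each call would be passed to `Send(UnixClient(socketPath), call)` in order, stopping at the first error.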
Scope of this standalone PoC — and how each gap was later closed
This PoC validated the hard, novel part — the runtime mechanics — standalone. The gaps it left are listed here for honesty, but all were subsequently addressed in the in-repo implementation (Phases 0–3) and the cluster e2e — see the proposal's "As-Built" section and the implementation log.
Not wired into substrate (at PoC time): no gRPC Ateom server, no atelet, no control plane, no CRDs. → Closed: Phase 2 landed cmd/ateom-firecracker as a real gRPC Ateom server, and the cluster e2e drove a counter actor through the real ate-api-server + atenet.
OCI → rootfs hand-rolled (busybox + static counter binary + ext4; bespoke rootfs, not the ko image). → Partly closed: the cluster e2e builds the ext4 from the actual ko counter image (via the rootfs atelet extracts). The mkfs.ext4 is still hand-rolled — the production firecracker-containerd + devmapper path remains future (proposal §7.1).
Durable SUSPENDED not implemented. → Closed: Phase 3 uploads {vmstate, memory, rootfs} via internal/ategcs and restores from object storage (TestFirecrackerAteomDurable).
Same-node restore only. → Partly closed: Phase 3's durable test restores on a fresh fcService/workdir (a simulated different node) by pulling from object storage. Cross-CPU/kernel portability + capability-aware scheduling remain future (proposal §6.3, §7.6).
No jailer / device-plugin hardening. → Still open: the cluster e2e ran firecracker in a privileged pod (the node already exposes /dev/kvm); the KVM device-plugin + jailer hardening remains the recommended production shape (proposal §6.2, §7.3).
Follow-on increments — now done (see the proposal & log)
The increments this PoC originally suggested have since been implemented on branch firecracker-backend:
✅ Phase 1 (#121): RuntimeConfig oneof on the ateom proto, populated responses, GetCapabilities. (atelet.proto was left for the "proper" wiring path; see proposal §6.)
✅ Phase 2 + 3 + cluster e2e: cmd/ateom-firecracker gRPC Ateom server (LOCAL + durable), and a counter actor on a Firecracker worker through the real control plane. The firecracker-containerd/devmapper rootfs + KVM device-plugin pod shaping remain the recommended production hardening (proposal §6.2, §7.1).
The standalone PoC artifacts are left on bigbox under /root/fc-demo/ for re-runs.
Status: ✅ IMPLEMENTED & PROVEN (2026-05-29). All phases (0–3) plus a full cluster e2e are done on branch firecracker-backend — pushed to dims/substrate (commit bc533f5, GPG-signed); worktree ~/go/src/github.com/agent-substrate/substrate-firecracker. A counter actor runs on a Firecracker microVM through the real ate-api-server + atenet, with in-RAM state preserved across suspend/resume; the gVisor helpdesk demo was untouched. Implementation journal: ~/notes/agent-substrate/2026-05-29-firecracker-backend-implementation-log.md. Builds on #121, relates to #119, #23.
Scope note: Kata Containers was evaluated and dropped — upstream Kata has no usable checkpoint/restore (see §13). This proposal adds exactly one new backend: Firecracker.
Method: Multi-agent code+web deep-dive (5 agents read the live ateom/atelet/control-plane/CRD/pod-deployment code with file:line citations; web research on Firecracker against primary sources). Load-bearing claims re-verified by hand. Firecracker feasibility confirmed by booting a microVM on bigbox (nested KVM).
0. TL;DR
Substrate's runtime layer (ateom) is gVisor-only today, but the proto comment, the cmd/ateom-gvisor naming, and the roadmap (priority #6: "Runtime modularity… gVisor, microVMs") all anticipate alternatives. This proposal:
Defines a pluggable backend seam so gVisor and Firecracker are interchangeable from the control plane's point of view, selected declaratively per WorkerPool.
Lands a Firecracker backend: a real snapshot/restore-capable microVM runtime. Strong VM isolation + fast local resume (CoW/UFFD), mapping cleanly onto substrate's suspend/resume spine and #119's PAUSED/SUSPENDED tiers.
The work is mostly additive and concentrates gVisor-specific logic behind three seams: a Go Backend interface inside ateom, a RuntimeConfig oneof in the protos (additive — ateom field 7, atelet field 9, both currently free), and a Backend selector on the WorkerPool CRD that drives backend-specific pod shaping. The atelet storage mover and the ategcs object-storage interface are already backend-agnostic and are reused unchanged. Existing gVisor deployments keep working untouched (default backend: gvisor).
The one architectural truth that shapes everything: gVisor and Firecracker are opposite on cost/portability. gVisor snapshots are small, compressible (memory+sentry+fs-deltas) and restore anywhere. Firecracker snapshots are full guest RAM + full disk and only restore on a host with the same VMM version, same kernel, and a compatible CPU. The backend interface must therefore carry capabilities and snapshot provenance, and the scheduler must become capability-aware. This is new surface area, not a drop-in — and it's why the recommendation is PAUSED-first (warm, local, same-node resume); durable SUSPENDED is also implemented + proven (Phase 3 — see As-Built), but for heavy-RAM actors it's gated on a snapshot-size story.
As-Built — what shipped vs. what's designed-but-not-wired (added 2026-05-29)
This doc was written as a forward design; below is what the implementation actually shipped, since the cluster e2e took a deliberate shortcut that diverges from §6. (Branch firecracker-backend, commit bc533f5, pushed to dims/substrate; worktree …/agent-substrate/substrate-firecracker. Chronology in the implementation log.)
Two layers, built two ways:
In-repo backend (Phases 0–3) — matches the design (§4–§7).
cmd/ateom-firecracker: a real gRPC Ateom server driving Firecracker (Run/Checkpoint/Restore/GetCapabilities), durable SUSPENDED via internal/ategcs (§7, Phase 3).
Proven by in-repo integration tests TestFirecrackerAteomGRPC (LOCAL) and TestFirecrackerAteomDurable (S3/minio round-trip on a fresh "node"), driving ateom-firecracker through the generated ateompb client.
Cluster e2e — a zero-touch shortcut that DIVERGES from §6. To prove the end-to-end path on the existing cluster without modifying its running control plane, cmd/ateom-firecracker/cluster.go adds a "cluster mode" (used when the unmodified atelet passes no MicroVMParams): it derives the rootfs + entrypoint from the hostPath atelet already populates (bundles/<c>/rootfs + config.json), builds an ext4 (busybox + an /init that nets up + execs the entrypoint), boots the microVM with the baked-in kernel/firecracker, DNATs pod-IP:80 → guest:80 so atenet routing reaches the guest, and maps the snapshot onto the files atelet already ships: checkpoint.img = tar{vmstate, rootfs.ext4}, pages.img = memory, pages_meta.img = placeholder. Net: a counter actor runs on a Firecracker worker through the real ate-api-server + atenet (resume-on-traffic + kubectl ate suspend), state preserved across suspend/resume, with zero changes to atelet / ate-api-server / CRD / proto.
Designed but NOT wired into the cluster (the "proper" path, §6): the atelet/ate-api-server RuntimeConfig plumbing, an ActorTemplate runtime field, and an OCI→ext4 builder in atelet. Cluster mode is the pragmatic stand-in; the §6 wiring remains the recommended production shape (it drops the "pack the disk into checkpoint.img" hack and the per-pod fixed guest IP).
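The cluster-mode file mapping (checkpoint.img = tar{vmstate, rootfs.ext4}) can be illustrated with the standard library's tar writer. This is a sketch under assumptions: `PackCheckpoint` and `ListCheckpoint` are hypothetical names, and the real cluster.go may stream from disk rather than hold artifacts in memory.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// PackCheckpoint writes the named artifacts, in the given order, into a
// single tar archive (the role checkpoint.img plays in cluster mode).
func PackCheckpoint(files map[string][]byte, order []string) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, name := range order {
		data := files[name]
		hdr := &tar.Header{Typeflag: tar.TypeReg, Name: name, Mode: 0o600, Size: int64(len(data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// ListCheckpoint returns the member names of a packed archive.
func ListCheckpoint(img []byte) ([]string, error) {
	tr := tar.NewReader(bytes.NewReader(img))
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	img, err := PackCheckpoint(map[string][]byte{
		"vmstate":     []byte("device state"),
		"rootfs.ext4": []byte("disk image"),
	}, []string{"vmstate", "rootfs.ext4"})
	if err != nil {
		panic(err)
	}
	names, _ := ListCheckpoint(img)
	fmt.Println(len(img), names)
}
```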
Snapshot layout & size — Firecracker vs gVisor
Structural difference (the core answer):
| | gVisor (ateom-gvisor) | Firecracker (ateom-firecracker) |
|---|---|---|
| Captured | process memory working set + sentry state + filesystem deltas | full configured guest RAM + VM device state + full rootfs disk |
| Rootfs in snapshot? | No; rebuilt from the pinned OCI image on restore | Yes (as-built cluster mode packs it into checkpoint.img) |
| Scales with | memory actually touched + fs changes | mem_size_mib (even unused RAM) + disk size |
| Restore portability | any node | same VMM version + kernel + compatible CPU |
| Files | checkpoint.img (+ pages.img, pages_meta.img) | vmstate (KB) + memory (= guest RAM) + rootfs ext4 |
A Firecracker snapshot is fundamentally larger: it captures the whole VM (RAM + disk) vs gVisor's process working set + deltas. The gap is modest when compressed for a tiny/idle workload (a 256 MiB VM running a small counter is mostly zero pages → zstd crushes them), but it grows with configured RAM and real memory use, and the rootfs adds to it.
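The "mostly zero pages" effect is easy to reproduce. The sketch below uses gzip from the Go standard library as a stand-in, since zstd is not in the standard library; the absolute numbers will differ from the zstd path substrate uses, but the behavior on zero-filled memory is the same in kind. `CompressedSize` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// CompressedSize returns the gzip-compressed size of data in bytes.
func CompressedSize(data []byte) (int, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	// 1 MiB of untouched (zero) guest memory.
	idle := make([]byte, 1<<20)
	n, err := CompressedSize(idle)
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw=%d compressed=%d\n", len(idle), n)
}
```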
Measured — same counter workload, snapshot pushed to object storage (zstd):
Firecracker counter (measured, from the durable / Phase-3 run): memory ≈ 13 MiB (zstd of 256 MiB guest RAM), rootfs.ext4 ≈ 5.7 MiB (zstd of a 512 MiB sparse ext4), vmstate ≈ 2 KiB → ≈ 19 MiB compressed. Uncompressed on the node: 256 MiB guest RAM + the rootfs ext4 (512 MiB sparse in this run; the integrated cluster mode mkfs's 256 MiB) + ~14 KiB vmstate, i.e. ~0.5–0.75 GiB allocated, mostly zeros (hence the high compression).
gVisor counter (estimate — not measured apples-to-apples; the comparison deploy was stopped): gVisor's checkpoint.img for a tiny idle counter is the small touched working set + sentry + fs deltas, compressed → low single-digit MiB, with no rootfs and no unused RAM. An exact number needs a gVisor counter snapshot (I can produce one on request).
Mitigations (also §7.5): keep mem_size_mib tight; balloon-inflate before snapshot to drop page cache; Firecracker diff/incremental snapshots (dev-preview) to ship only dirtied pages; separable homedir so image upgrades don't force re-shipping rootfs+memory. This is why the recommendation is PAUSED-first (keep the big memory file node-local, never hit the network), with durable SUSPENDED gated on a size story.
1. Goals / Non-Goals
Goals
A backend seam such that gVisor and Firecracker are interchangeable, selected declaratively. (Designed to admit future backends, but only Firecracker is implemented here.)
Land Firecracker as a first-class suspend/resume backend, PAUSED-first.
Keep all proto/CRD changes additive and backward-compatible; default backend: gvisor, existing deployments untouched.
Make the implicit Run/Checkpoint/Restore contract explicit (issue #121 Phase 2), including what each backend stores vs reconstructs.
Non-Goals (this proposal)
Kata Containers (dropped — §13). CRIU (deferred — §13).
Implementing the #119 state machine itself (PAUSED/CRASHED states). We define the backend hooks #119 needs and align with it; #119 is its own work.
GPU/PCI passthrough for microVMs (Firecracker has none; out of scope).
Cross-CPU-vendor or cross-kernel Firecracker snapshot portability (constrained by the VMM; we design around it).
2. Current Architecture (ground truth)
2.1 Topology — two privileged pods + a shared host directory
There is no single "worker pod." Two components cooperate through a shared hostPath:
ateom is a Deployment (internal/controllers/workerpool_controller.go:121-173): one container "ateom", WithImage(wp.Spec.AteomImage), privileged:true, runAsUser/Group:0, no devices, no resource requests/limits, no seccomp/AppArmor, one volume mount: hostPath /run/ateom-gvisor.
atelet is a DaemonSet (manifests/ate-install/atelet.yaml:46-98): privileged:true, hostPort 8085, ATE_STORAGE_BACKEND=gcs, same hostPath. RBAC: pods get/watch/list.
They communicate over a unix socket on the shared hostPath, not the network and not a shared netns (cmd/atelet/main.go:549-553).
Control plane → atelet: AteletDialer.DialForWorker resolves the atelet on the worker's node via a byNode index on Spec.NodeName (dialer.go:49-90). Node-locality is already structural — important for #119's "PAUSED prefers original node."
2.2 The runtime-config threading chain (where gVisor leaks in)
ActorTemplate.Spec.Runsc (CRD, required, actortemplate_types.go:89-92,121-134) → translated to ateletpb.RunscConfig in two byte-identical ~18-line blocks (workflow_resume.go:193-210, workflow_suspend.go:119-136) → atelet fetchRunsc downloads+sha256-verifies the binary, yielding a local path (cmd/atelet/main.go:196-262) → passed as ateompb.*.RunscPath (field 4) → ateom builds &runsc{path:…} and shells out (ateom-gvisor/main.go:273,320,451; runsc.go).
2.3 Run / Checkpoint / Restore as actually implemented (the real contract)
RunWorkload (ateom-gvisor/main.go:210-306): move pod eth0 into an interior netns + AF_PACKET; runsc create+start the pause (sandbox-root) container; then each app container sharing the same -root state dir. -allow-connected-on-save set at start (runsc.go:86).
CheckpointWorkload (main.go:308-382): runsc checkpoint the pause container only → checkpoint.img (+ optional pages.img, pages_meta.img); runsc delete -force all; return eth0. ateom does not upload/wipe — atelet uploads the (up to) 3 files zstd-compressed (main.go:341-359) then resetActorDirs wipes local (main.go:361,563-599).
RestoreWorkload (main.go:384-486): atelet downloads the 3 files and rebuilds the rootfs from the OCI image again (prepareOCIBundles, main.go:399-403, oci.go:43-62); then runsc create+restore per container from the one image-path.
Critical: the snapshot is memory + sentry + filesystem deltas only; the rootfs is reconstructed from the (digest-pinned) OCI image on every run and restore. Hence images must be @-pinned ("changing the image invalidates snapshots", actortemplate_types.go:41,72). Issue #121's "captures full state (process + filesystem)" is an idealization — the truth is more nuanced, which is exactly why an explicit contract is needed.
2.4 What's empty / missing
All three ateom + all three atelet responses are empty (ateom.proto:69,92,109; atelet.proto:91,119,139). No ready, no workload IP, no snapshot manifest. "Ready" today = "the unary RPC returned" — async VM boot/restore has no slot to report progress.
No capability negotiation. No way for a backend to advertise "I support local PAUSE" / "my snapshots only restore on matching CPUs."
No PAUSED / local-retention path. Every checkpoint uploads to durable storage and wipes local.
Actor.Status = UNSPECIFIED/RESUMING/RUNNING/SUSPENDING/SUSPENDED (pkg/proto/ateapipb/ateapi.proto:58-64) — no PAUSED/CRASHED, no backend field, no snapshotConfig.
2.5 gVisor coupling map (where the work is)
| Layer | gVisor-coupled element | Evidence | Disposition |
|---|---|---|---|
| Proto (ateom) | runsc_path field 4 ×3 | ateom.proto:55,77,101 | → RuntimeConfig oneof |
| Proto (atelet) | RunscConfig runsc ×3 (fields 8/6/6) | atelet.proto:43,102,131 | → RuntimeConfig oneof |
| CRD | ActorTemplate.Runsc required, per-arch SHA+URL | actortemplate_types.go:89-92 | make oneof; not required |
| CRD | WorkerPool = only Replicas + AteomImage | workerpool_types.go:21-30 | add Backend + pod shape |
| ateom impl | shell out to runsc {create,start,checkpoint,restore,delete,state} | ateom-gvisor/runsc.go | behind Backend iface |
| ateom impl | pause-container=sandbox; checkpoint root only; restore per-container from one image-path; hardcoded snapshot file set checkpoint.img/pages*.img | main.go:341-394; ateompath.go:129-148 | replace w/ backend manifest |
| atelet | resetActorDirs wipe-after-upload | main.go:563-599 | needs "keep local" (PAUSED) mode |
| pod | privileged, no /dev/kvm, no /dev/net/tun, no resources | workerpool_controller.go:138-172 | add devices for VMs |
| storage | ategcs.ObjectStorage (GCS/S3 + zstd) | internal/ategcs/ategcs.go:35-91 | already generic, reuse |
3. Design Principle
atelet = backend-agnostic storage-mover + pod-plumbing. ateom-<backend> = the runtime driver. Backend choice is a per-WorkerPool decision (pools are homogeneous); per-actor runtime config travels in a oneof that must match the pool's backend. The snapshot is opaque bytes + a backend-authored manifest; the control plane never parses it.
Two consequences:
Sibling binaries. Keep cmd/ateom-gvisor; add cmd/ateom-firecracker. Build deps (Firecracker SDK, KVM, CNI) and pod shaping differ, and the naming already anticipates this. Inside each, a small Go Backend interface (the 6 verbs from runsc.go) keeps it testable. WorkerPool.AteomImage already selects the binary; we add a Backend enum so the controller knows how to shape the pod.
Capabilities + provenance become first-class. Because gVisor and Firecracker differ on PAUSE cost, incremental snapshots, and restore portability, the control plane must query capabilities and record snapshot provenance (backend, VMM version, kernel, CPU template, image digest) to validate restores and drive #119's "devolution."
4. Backend Capability Model
| Capability | gVisor | Firecracker |
|---|---|---|
| Isolation | syscall interception (userspace kernel) | hardware VM |
| Needs /dev/kvm | No | Yes |
| Snapshot/restore with memory | Yes (small, compressible deltas) | Yes (full RAM + disk; CoW/UFFD restore) |
| Local PAUSE (no NIC) | Yes (local checkpoint dir) | Yes (mem file on local disk; ideal for UFFD) |
| Incremental snapshot | No | Diff snapshots (dev-preview) |
| Restore portability | Any node | Same VMM ver + kernel + compatible CPU (CPU templates; no Intel↔AMD) |
| Snapshot size | small (deltas, zstd) | full guest RAM (heavy) |
| Fast cold start | golden snapshot | snapshot or boot |
| GPU passthrough | n/a | No |
This is the proposal's backbone: gVisor stays the default general-purpose backend; Firecracker is for strong isolation + warm local resume on homogeneous pools. It implies a new ateom RPC: GetCapabilities.
Map to #119 snapshot modes:
None (clean restart each activation): both backends.
PAUSED (local snapshot, fast resume, low durability): both (backends advertising supports_local_pause).
SUSPENDED (durable snapshot): both. For Firecracker this is the expensive path (full RAM upload) — see §7.5.
5. Proposed Changes — Proto
All additive; old fields deprecated, never renumbered; reserved only after removal.
5.1 internal/proto/ateompb/ateom.proto
service Ateom {
  rpc RunWorkload(RunWorkloadRequest) returns (RunWorkloadResponse) {}
  rpc CheckpointWorkload(CheckpointWorkloadRequest) returns (CheckpointWorkloadResponse) {}
  rpc RestoreWorkload(RestoreWorkloadRequest) returns (RestoreWorkloadResponse) {}
  rpc GetCapabilities(GetCapabilitiesRequest) returns (Capabilities) {} // NEW
}

message RuntimeConfig { // NEW (oneof is extensible for future backends)
  oneof backend {
    GVisorParams gvisor = 1;
    MicroVMParams microvm = 2; // Firecracker
  }
}

message GVisorParams { string runsc_path = 1; } // resolved local path

message MicroVMParams {
  string vmm_binary_path = 1;   // firecracker/jailer
  string kernel_image_path = 2; // vmlinux
  string rootfs_image_path = 3; // ext4/devmapper device
  string kernel_cmdline = 4;
  uint32 vcpu_count = 5;
  uint32 mem_size_mib = 6;
  string cpu_template = 7;      // e.g. T2, T2A; determines restore portability
  TapNetworkConfig network = 8;
}

message RunWorkloadRequest {
  string actor_template_namespace = 1;
  string actor_template_name = 2;
  string actor_id = 3;
  string runsc_path = 4 [deprecated = true]; // legacy
  WorkloadSpec spec = 5;
  RuntimeConfig runtime = 7; // NEW: field 7 free in ALL THREE requests (uniform)
}

// CheckpointWorkloadRequest / RestoreWorkloadRequest: identical addition of `RuntimeConfig runtime = 7;`
// (their field 6 is snapshot_uri_prefix; 7 is free, verified). Add to Checkpoint:
//   enum Destination { DURABLE = 0; LOCAL = 1; }
//   Destination destination = 8; // PAUSED vs SUSPENDED

message RunWorkloadResponse { bool ready = 1; string workload_ip = 2; }     // was empty
message RestoreWorkloadResponse { bool ready = 1; string workload_ip = 2; } // was empty
message CheckpointWorkloadResponse { SnapshotManifest manifest = 1; }       // was empty

message SnapshotManifest { // NEW: replaces hardcoded filenames in atelet
  repeated string artifact_names = 1; // ["vmstate","memory","rootfs.ext4"] | ["checkpoint.img","pages.img"]
  string backend = 2;
  string vmm_version = 3;
  string kernel_id = 4;
  string cpu_template = 5;
  map<string,string> provenance = 6; // image digest, fc/runsc version (for #119 devolution)
}

message Capabilities { // NEW
  bool supports_local_pause = 1;
  bool supports_incremental = 2;
  bool supports_memory_snapshot = 3;
  bool restore_requires_same_host = 4;   // true for Firecracker
  string snapshot_portability_class = 5; // CPU template / "any"
}
5.2 internal/proto/ateletpb/atelet.proto
Same shape: replace RunscConfig runsc with RuntimeConfig runtime = 9; on all three requests (field 9 free in all three — verified). atelet's RuntimeConfig.gvisor wraps the existing RunscConfig (the fetch spec: url+sha256+arch+auth); microvm carries its artifact-fetch spec (kernel URL+hash, rootfs builder config, VMM binary URL+hash). Populate the three responses with {ready, workload_ip, SnapshotManifest}.
Why both layers, and all six requests: atelet's config is the fetch spec (download kernel/VMM/runsc); ateom's is the resolved local paths. Both currently carry gVisor specifics on all three RPCs (checkpoint and restore need the binary too), so the oneof must be added to all six request messages — not just Run (issue #121 Phase 2 understates this).
6. Proposed Changes — CRD, Controller, Control Plane, atelet, ateom
6.1 CRD (pkg/api/v1alpha1)
WorkerPoolSpec.Backend (new): +kubebuilder:validation:Enum=gvisor;firecracker, +kubebuilder:default=gvisor. Default keeps every existing manifest valid. Add CPU/memory shape fields the docs already promise (architecture.md:226-228) but which don't exist — Firecracker needs explicit mem_size_mib/vcpu_count.
ActorTemplateSpec: introduce RuntimeConfig (oneof gvisor:{RunscConfig} | microvm:{…}), and drop the required on Runsc (today a microVM template literally cannot validate). A CEL/controller check enforces the active arm matches the referenced pool's Backend.
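The arm-matches-pool rule could be enforced in the controller roughly as follows. This is a sketch, not the CRD types: the structs are simplified stand-ins, the oneof is modeled as two optional pointers, and `Arm` and `ValidateAgainstPool` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the two oneof arms.
type GVisorParams struct{ RunscPath string }
type MicroVMParams struct {
	KernelImagePath string
	VCPUCount       uint32
	MemSizeMiB      uint32
}

// RuntimeConfig models the oneof as two optional pointers; exactly one
// should be non-nil.
type RuntimeConfig struct {
	GVisor  *GVisorParams
	MicroVM *MicroVMParams
}

// Arm returns the backend name implied by the populated arm.
func (r RuntimeConfig) Arm() (string, error) {
	switch {
	case r.GVisor != nil && r.MicroVM != nil:
		return "", errors.New("runtime: both arms set")
	case r.GVisor != nil:
		return "gvisor", nil
	case r.MicroVM != nil:
		return "firecracker", nil
	default:
		return "", errors.New("runtime: no arm set")
	}
}

// ValidateAgainstPool rejects a template whose active arm differs from the
// referenced pool's Backend. An empty pool backend means the CRD default.
func ValidateAgainstPool(r RuntimeConfig, poolBackend string) error {
	if poolBackend == "" {
		poolBackend = "gvisor"
	}
	arm, err := r.Arm()
	if err != nil {
		return err
	}
	if arm != poolBackend {
		return fmt.Errorf("runtime arm %q does not match pool backend %q", arm, poolBackend)
	}
	return nil
}

func main() {
	vm := RuntimeConfig{MicroVM: &MicroVMParams{VCPUCount: 1, MemSizeMiB: 256}}
	fmt.Println(ValidateAgainstPool(vm, "gvisor"))
	fmt.Println(ValidateAgainstPool(vm, "firecracker"))
}
```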
6.2 Controller
The single seam for pod shaping is buildDeploymentApplyConfig. Wrap it in switch wp.Spec.Backend:
gvisor (default): today's body verbatim (privileged + hostPath /run/ateom-gvisor).
firecracker: request /dev/kvm + /dev/net/tun via a KVM/TUN device plugin (devices.kubevirt.io/kvm) — preferred over blanket-privileged — add CAP_NET_ADMIN, a Firecracker seccomp profile, and resource requests/limits (a microVM must reserve its RAM).
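The backend switch can be sketched without Kubernetes types. `PodShape` and `ShapePod` are hypothetical; the real seam builds a Deployment apply configuration, and the seccomp profile name below is a placeholder. A TUN device resource would be requested alongside KVM in the same way.

```go
package main

import "fmt"

// PodShape is a simplified stand-in for the fields of the Deployment that
// differ by backend.
type PodShape struct {
	Privileged     bool
	HostPath       string
	Capabilities   []string
	DeviceRequests map[string]int
	MemoryMiB      int
	SeccompProfile string
}

// ShapePod returns the pod shape for a WorkerPool backend. The empty string
// is treated as the CRD default (gvisor).
func ShapePod(backend string, memMiB int) (PodShape, error) {
	switch backend {
	case "", "gvisor":
		// Today's body verbatim: privileged plus the shared hostPath.
		return PodShape{Privileged: true, HostPath: "/run/ateom-gvisor"}, nil
	case "firecracker":
		return PodShape{
			Capabilities:   []string{"NET_ADMIN"},
			DeviceRequests: map[string]int{"devices.kubevirt.io/kvm": 1},
			MemoryMiB:      memMiB,        // a microVM must reserve its RAM
			SeccompProfile: "firecracker", // placeholder profile name
		}, nil
	default:
		return PodShape{}, fmt.Errorf("unknown backend %q", backend)
	}
}

func main() {
	s, err := ShapePod("firecracker", 256)
	fmt.Printf("%+v %v\n", s, err)
}
```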
6.3 Control plane (cmd/ateapi/internal/controlapi)
De-duplicate the two identical runsc-translation blocks into one backend-switched helper that fills RuntimeConfig (this is also where #121 Phase 1 lands).
Thread the backend: stamp backend onto the Actor/Worker record at assignment (AssignWorkerStep, workflow_resume.go:87) so suspend doesn't need to re-load the pool.
Capability-aware scheduling (new constraint): findFreeWorker (workflow_resume.go:142-157) is random today. For Firecracker, restore requires a host with matching VMM/kernel/CPU; the scheduler must (a) keep pools homogeneous and (b) for SUSPENDED→resume, only pick workers whose host is compatible with the snapshot's recorded cpu_template/kernel_id. This is the single biggest new control-plane requirement and should gate Firecracker GA. A mismatch must fail to a #119 CRASHED, not a silent wrong-resume.
Golden snapshot (actortemplate_controller.go): the controller flow (create→boot→wait→checkpoint→store URI) is already backend-agnostic (calls ateapi RPCs, treats the snapshot URI as opaque). Only the downstream mechanism + artifact set differ — no controller change beyond the warmup heuristic becoming backend-aware.
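A compatibility filter for the capability-aware scheduling item above might look like this. It is a sketch under assumptions: `Worker`, `Compatible`, and `PickWorker` are hypothetical, the worker is assumed to expose its host's VMM version, kernel, and CPU template, and the first compatible worker is chosen where the current findFreeWorker picks randomly.

```go
package main

import "fmt"

// SnapshotManifest carries the provenance fields relevant to restore.
type SnapshotManifest struct {
	Backend, VMMVersion, KernelID, CPUTemplate string
}

// Worker is a simplified free-worker record (hypothetical fields).
type Worker struct {
	Name, Backend, VMMVersion, KernelID, CPUTemplate string
}

// Compatible reports whether a snapshot may be restored on w. gVisor
// snapshots restore on any gVisor worker; Firecracker snapshots need a
// matching VMM version, kernel, and CPU template.
func Compatible(m SnapshotManifest, w Worker) bool {
	if m.Backend != w.Backend {
		return false
	}
	if m.Backend != "firecracker" {
		return true
	}
	return m.VMMVersion == w.VMMVersion && m.KernelID == w.KernelID && m.CPUTemplate == w.CPUTemplate
}

// PickWorker returns a compatible worker and the next actor state. With no
// compatible worker it reports CRASHED rather than risking a wrong resume.
func PickWorker(m SnapshotManifest, free []Worker) (Worker, string) {
	for _, w := range free {
		if Compatible(m, w) {
			return w, "RESUMING"
		}
	}
	return Worker{}, "CRASHED"
}

func main() {
	m := SnapshotManifest{Backend: "firecracker", VMMVersion: "v1.15.1", KernelID: "vmlinux-6.1.128", CPUTemplate: "T2"}
	free := []Worker{
		{Name: "w1", Backend: "firecracker", VMMVersion: "v1.15.1", KernelID: "vmlinux-5.10", CPUTemplate: "T2"},
		{Name: "w2", Backend: "firecracker", VMMVersion: "v1.15.1", KernelID: "vmlinux-6.1.128", CPUTemplate: "T2"},
	}
	w, state := PickWorker(m, free)
	fmt.Println(w.Name, state)
}
```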
6.4 atelet (cmd/atelet)
Backend-conditional fetch/prepare: fetchRunsc → one arm of a backend switch; a Firecracker arm fetches kernel + VMM binary (reusing the content-addressed download+sha256 pattern) and builds a rootfs block device (devmapper/ext4) from the image instead of untar-ing into a directory.
Snapshot artifact manifest: stop hardcoding checkpoint.img/pages*.img (main.go:341-394). The backend returns a SnapshotManifest; atelet uploads/downloads the listed artifacts opaquely via the existing ategcs.ObjectStorage + zstd helpers (reused unchanged).
Local-retention mode (PAUSED): add a "snapshot to local dir, don't upload, don't resetActorDirs" path so PAUSED keeps bytes on the node. Independent of backend; the atelet half of #119's PAUSED.
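The manifest-driven upload and the local-retention mode can be sketched together. Everything here is illustrative: `ObjectStorage` is a one-method stand-in for the real ategcs interface, `memStore` is an in-memory fake, `ShipSnapshot` is a hypothetical name, and the zstd step is omitted.

```go
package main

import "fmt"

// ObjectStorage is a minimal stand-in for ategcs.ObjectStorage.
type ObjectStorage interface {
	Put(key string, data []byte) error
}

// memStore is an in-memory fake used to exercise the flow.
type memStore map[string][]byte

func (m memStore) Put(key string, data []byte) error {
	m[key] = append([]byte(nil), data...)
	return nil
}

// Destination mirrors the proposed checkpoint enum.
type Destination int

const (
	Durable Destination = iota
	Local
)

// ShipSnapshot uploads exactly the artifacts the backend listed, without
// interpreting them. It returns whether the caller may wipe local actor
// dirs afterwards: true for DURABLE, false for LOCAL (PAUSED keeps bytes
// on the node and uploads nothing).
func ShipSnapshot(store ObjectStorage, prefix string, artifacts []string,
	read func(name string) ([]byte, error), dest Destination) (bool, error) {
	if dest == Local {
		return false, nil
	}
	for _, name := range artifacts {
		data, err := read(name)
		if err != nil {
			return false, fmt.Errorf("read %s: %w", name, err)
		}
		if err := store.Put(prefix+"/"+name, data); err != nil {
			return false, fmt.Errorf("upload %s: %w", name, err)
		}
	}
	return true, nil
}

func main() {
	store := memStore{}
	read := func(name string) ([]byte, error) { return []byte("bytes of " + name), nil }
	wipe, err := ShipSnapshot(store, "snapshots/actor-1", []string{"vmstate", "memory", "rootfs.ext4"}, read, Durable)
	fmt.Println(wipe, err, len(store))
}
```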
6.5 ateom (cmd/ateom-*)
Extract a Go interface (mirrors the existing runsc.go verbs):
ateom-gvisor implements it by refactoring runsc.go (no behavior change). ateom-firecracker is the new sibling binary.
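A minimal sketch of such an interface, modeled on the PoC's Backend interface shown earlier rather than on the in-repo code, together with an in-memory fake that makes the contract testable. `fakeBackend` and `RoundTrip` are hypothetical, and the manifest and capability types are trimmed to a few fields.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type Destination int

const (
	Durable Destination = iota
	Local
)

type SnapshotManifest struct {
	Backend   string
	Artifacts []string
}

type Capabilities struct {
	SupportsLocalPause     bool
	SupportsMemorySnapshot bool
}

// Backend is the runtime seam each ateom binary implements.
type Backend interface {
	Run(ctx context.Context) (workloadIP string, err error)
	Checkpoint(ctx context.Context, dest Destination) (SnapshotManifest, error)
	Restore(ctx context.Context) (workloadIP string, err error)
	Delete(ctx context.Context) error
	Capabilities() Capabilities
}

// fakeBackend tracks only whether a workload is running and whether a
// snapshot exists.
type fakeBackend struct {
	running bool
	snap    *SnapshotManifest
}

func (f *fakeBackend) Run(ctx context.Context) (string, error) {
	f.running = true
	return "172.16.0.2", nil
}

func (f *fakeBackend) Checkpoint(ctx context.Context, dest Destination) (SnapshotManifest, error) {
	if !f.running {
		return SnapshotManifest{}, errors.New("checkpoint: not running")
	}
	m := SnapshotManifest{Backend: "fake", Artifacts: []string{"vmstate", "memory"}}
	f.snap, f.running = &m, false // checkpoint tears the workload down
	return m, nil
}

func (f *fakeBackend) Restore(ctx context.Context) (string, error) {
	if f.snap == nil {
		return "", errors.New("restore: no snapshot")
	}
	f.running = true
	return "172.16.0.2", nil
}

func (f *fakeBackend) Delete(ctx context.Context) error {
	f.running, f.snap = false, nil
	return nil
}

func (f *fakeBackend) Capabilities() Capabilities {
	return Capabilities{SupportsLocalPause: true, SupportsMemorySnapshot: true}
}

// RoundTrip drives Run, Checkpoint(LOCAL), and Restore against any Backend.
func RoundTrip(b Backend) error {
	ctx := context.Background()
	if _, err := b.Run(ctx); err != nil {
		return err
	}
	m, err := b.Checkpoint(ctx, Local)
	if err != nil {
		return err
	}
	if len(m.Artifacts) == 0 {
		return errors.New("checkpoint returned an empty manifest")
	}
	_, err = b.Restore(ctx)
	return err
}

func main() {
	fmt.Println(RoundTrip(&fakeBackend{}))
}
```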
7. Firecracker Backend (concrete)
Firecracker has a real, battle-tested snapshot API (PATCH /vm{Paused} → PUT /snapshot/create → PUT /snapshot/load). Sources in §14.
7.1 Run path (OCI → microVM)
Firecracker boots a guest kernel (vmlinux) + a root block device; it does not run OCI images and has no virtio-fs / host-guest FS sharing. Bridge via the firecracker-containerd + devmapper pattern: pull image → materialize ext4 thin device from layers → attach as RootDrive (virtio-block, no hot-plug, all drives pre-boot) → boot with a platform-supplied vmlinux → in-VM agent runs runc → ready. So substrate's "rebuild rootfs from image" becomes "re-materialize the devmapper ext4 device from the (pinned) image."
7.2 Networking
tap device + CNI (tc-redirect-tap chain: ptp veth + host-local IPAM + tap redirect). Guest connectivity is NOT preserved across restore. On resume, ateom-firecracker must recreate the tap + netns, reattach to the loaded VM, and trigger guest link re-detection (or hold IP stable via boot args + fixed MAC). The substrate "resume-on-inbound-traffic" trigger lives host-side (the tap/netns receives the packet and gates the lazy restore), which fits the existing atenet model — and the existing :authority→podIP:80 routing (extproc_in.go:143-149) keeps working if the guest shares the pod IP. vsock CID resets on restore; clock skews; MMDS data is not persisted.
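As a rough illustration of the host-side work on resume, the sketch below builds the commands for the simpler fixed-tap setup the PoC used (not the tc-redirect-tap CNI chain) plus a DNAT rule of the kind cluster mode installs. `TapSetup` and `DNATRule` are hypothetical helpers, the pod IP in `main` is a placeholder, and the commands are printed rather than executed since they need root.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// TapSetup returns the iproute2 commands that create a tap device, assign
// the host-side address, and bring the link up.
func TapSetup(tap, hostCIDR string) [][]string {
	return [][]string{
		{"ip", "tuntap", "add", "dev", tap, "mode", "tap"},
		{"ip", "addr", "add", hostCIDR, "dev", tap},
		{"ip", "link", "set", "dev", tap, "up"},
	}
}

// DNATRule returns an iptables command that forwards TCP traffic arriving
// for podIP:port to the guest at guestIP:port.
func DNATRule(podIP, guestIP string, port int) []string {
	p := strconv.Itoa(port)
	return []string{
		"iptables", "-t", "nat", "-A", "PREROUTING",
		"-d", podIP, "-p", "tcp", "--dport", p,
		"-j", "DNAT", "--to-destination", net.JoinHostPort(guestIP, p),
	}
}

func main() {
	for _, c := range TapSetup("fc-tap0", "172.16.0.1/24") {
		fmt.Println(strings.Join(c, " "))
	}
	// 10.0.0.5 is a placeholder pod IP.
	fmt.Println(strings.Join(DNATRule("10.0.0.5", "172.16.0.2", 80), " "))
}
```

Keeping the tap name, host address, and guest MAC identical across checkpoint and restore is what lets the guest keep its IP without in-guest reconfiguration.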
7.3 Security / pod
Requires /dev/kvm (+ /dev/net/tun); seccomp on by default; use jailer (cgroups + chroot + pivot_root + drop privileges). In K8s, expose KVM via a device plugin so worker pods request devices.kubevirt.io/kvm instead of being blanket-privileged. Nested virt required if nodes are themselves VMs — confirmed working on bigbox (AMD EPYC, kvm_amd.nested=1; booted an Ubuntu microVM to a root shell, 2026-05-29).
Checkpoint: PATCH /vm{Paused} → PUT /snapshot/create (Full; Diff is preview) → persist; tear down to free RAM/vCPU. Artifacts produced: vmstate file + memory file + rootfs disk (Firecracker won't capture the disk for you).
Restore: stage artifacts; recreate tap/netns; new FC proc PUT /snapshot/load w/ File (CoW) or UFFD (lazy) backend → Resumed; refresh NIC. Consumes the three artifacts.
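The request bodies behind that sequence can be sketched as follows. `CheckpointCalls` and `RestoreCalls` are hypothetical helpers, the file paths are placeholders, and the JSON field names reflect the Firecracker snapshot API as I understand it, so verify them against the VMM version in use. For the Uffd backend type, `backend_path` is the page-fault handler's unix socket rather than the memory file.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiCall is one request against the Firecracker API socket.
type apiCall struct {
	Method, Path string
	Body         map[string]any
}

// CheckpointCalls pauses the VM and writes a Full snapshot. Copying the
// rootfs disk and tearing down the VMM process happen outside the API.
func CheckpointCalls(vmstatePath, memPath string) []apiCall {
	return []apiCall{
		{"PATCH", "/vm", map[string]any{"state": "Paused"}},
		{"PUT", "/snapshot/create", map[string]any{
			"snapshot_type": "Full",
			"snapshot_path": vmstatePath,
			"mem_file_path": memPath,
		}},
	}
}

// RestoreCalls loads the snapshot into a fresh VMM process and resumes it.
// backendType is "File" (CoW mapping of the memory file) or "Uffd" (lazy).
func RestoreCalls(vmstatePath, backendType, backendPath string) []apiCall {
	return []apiCall{
		{"PUT", "/snapshot/load", map[string]any{
			"snapshot_path": vmstatePath,
			"mem_backend":   map[string]any{"backend_type": backendType, "backend_path": backendPath},
			"resume_vm":     false,
		}},
		{"PATCH", "/vm", map[string]any{"state": "Resumed"}},
	}
}

func main() {
	calls := append(CheckpointCalls("/snap/vmstate", "/snap/memory"),
		RestoreCalls("/snap/vmstate", "File", "/snap/memory")...)
	for _, c := range calls {
		body, _ := json.Marshal(c.Body)
		fmt.Printf("%s %s %s\n", c.Method, c.Path, body)
	}
}
```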
7.5 PAUSED vs SUSPENDED + the NIC concern
PAUSED: keep {vmstate, memory, rootfs} on the node's local disk; resume prefers the same node (UFFD/CoW → single-to-tens-of-ms warm resume). Exactly what Firecracker's File/UFFD restore is designed for — a great fit.
SUSPENDED: upload {vmstate, memory, rootfs} to durable storage. This is where #119's NIC-saturation concern bites hardest: a Full memory snapshot is the entire guest RAM (a 4 GB agent ⇒ 4 GB upload), versus gVisor's small compressible deltas. Mitigations to design in: (a) Diff snapshots once GA (upload only dirtied pages); (b) balloon-inflate before snapshot to shrink the memory file; (c) compress the memory file in transit (reuse the zstd path); (d) keep a separable homedir layer (#119) so image upgrades don't force re-uploading memory+rootfs. Recommendation: durable SUSPENDED is implemented + proven (Phase 3), but in production prefer PAUSED (local) and gate durable behind a size-mitigation story for heavy-RAM actors.
7.6 Hard constraints (must surface to users + scheduler)
Restore portability is narrow: same FC version + same host kernel + compatible CPU (CPU templates; no Intel↔AMD). ⇒ Firecracker pools must be CPU-homogeneous, or pin a CPU template; the scheduler must enforce it. A mismatch → #119 CRASHED, never a silent wrong-resume.
Multi-restore is insecure without uniqueness handling (entropy/RNG/identity collide); VMGenID/VMClock mitigate. Matters for #119 "fork from snapshot" (create --from).
No GPU/PCI passthrough, no virtio-fs, no block hot-plug. GPU agents are out of scope for this backend.
UFFD handler is a SPOF during resume (a crash hangs the VM); ateom-firecracker must own/supervise it.
8. Integration with #119 (Actor State Machine)
This proposal supplies the backend hooks #119 needs:
snapshotConfig modes map to backend capabilities: None (both backends), homedir/process (both). Capability negotiation (§4) tells the control plane which are legal for a given pool.
PAUSED = checkpoint with Destination=LOCAL + atelet local-retention (§6.4) + same-node resume. Only for backends advertising supports_local_pause.
CRASHED on restore failure: a snapshot whose recorded provenance (SnapshotManifest) is incompatible with the target host (Firecracker CPU/kernel/version mismatch, or a corrupt/missing artifact) must transition to CRASHED rather than silently mis-resume.
"Devolution" generalizes to "snapshot provenance vs current runtime": Firecracker invalidates memory on kernel/VMM/CPU change; gVisor on runsc/image change. The SnapshotManifest.provenance map is where this is recorded (#119's review flagged that no provenance is recorded today — this proposal adds it).
Sequencing: complementary. The proto/CRD seams here are a prerequisite for #119's per-backend PAUSED/SUSPENDED semantics; #119's state machine is a prerequisite for exposing PAUSED to users. Land the seams (§5–6) first; zero behavior change.
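The CRASHED-on-mismatch rule can be sketched as a comparison between the provenance recorded at checkpoint time and the provenance of the host attempting the restore. The key names (`vmm_version`, `host_kernel`, `cpu_template`), the example values, and the function names are hypothetical; the proposal only states that `SnapshotManifest.provenance` is a map.

```go
package main

import (
	"fmt"
	"sort"
)

// ActorState is a stand-in for the #119 state enum (only two values shown).
type ActorState string

const (
	StateRunning ActorState = "RUNNING"
	StateCrashed ActorState = "CRASHED"
)

// provenanceMismatches returns the sorted keys recorded in the snapshot whose
// values are missing or different on the target host.
func provenanceMismatches(snapshot, host map[string]string) []string {
	var bad []string
	for k, want := range snapshot {
		if got, ok := host[k]; !ok || got != want {
			bad = append(bad, k)
		}
	}
	sort.Strings(bad)
	return bad
}

// restoreOutcome refuses to resume when any recorded provenance key differs,
// so a mismatch surfaces as CRASHED instead of a silent wrong-resume.
func restoreOutcome(snapshot, host map[string]string) (ActorState, error) {
	if bad := provenanceMismatches(snapshot, host); len(bad) > 0 {
		return StateCrashed, fmt.Errorf("snapshot incompatible with host: %v", bad)
	}
	return StateRunning, nil
}

func main() {
	snap := map[string]string{"vmm_version": "v1.15.1", "host_kernel": "6.8.0", "cpu_template": "none"}
	sameHost := map[string]string{"vmm_version": "v1.15.1", "host_kernel": "6.8.0", "cpu_template": "none"}
	otherHost := map[string]string{"vmm_version": "v1.15.1", "host_kernel": "6.11.0", "cpu_template": "none"}

	fmt.Println(restoreOutcome(snap, sameHost))
	fmt.Println(restoreOutcome(snap, otherHost))
}
```

The check is deliberately strict equality per key. A gVisor backend would record different keys (runsc and image versions) but could reuse the same comparison.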
9. Phasing
Status: all four phases below are implemented on branch firecracker-backend (plus the cluster e2e) — see the As-Built section near the top of this doc. The table is the original plan/rationale.
| Phase | Scope | Exit criteria | Behavior change |
| --- | --- | --- | --- |
| 0. Seams | Go Backend interface inside ateom-gvisor (refactor runsc.go); WorkerPool.Backend enum (default gvisor); backend-switch in buildDeploymentApplyConfig (gvisor=today); de-dupe runsc translation | gVisor works identically; unit tests | None |
| 1. Proto generalization (#121) | RuntimeConfig oneof on all 6 requests (ateom field 7 / atelet field 9); populate responses (ready/ip/manifest); deprecate runsc_path/runsc; move snapshot filenames behind SnapshotManifest; add GetCapabilities + Destination | durable suspend/resume across compatible nodes; negative test for incompatible-host restore → CRASHED | opt-in |
Smallest slice that proves pluggability with zero behavior change: Phase 0 alone.
10. Risks & Open Questions
Firecracker SUSPENDED data volume — full-RAM snapshots vs gVisor deltas. Open: is durable Firecracker suspend worth it before diff-snapshots are GA? (Recommendation: PAUSED-first.)
Scheduler homogeneity — Firecracker restore portability forces CPU/kernel-homogeneous pools or CPU templates. Open: encode host compatibility in the Worker record (it has no node/CPU fields today)?
Networking rewrite — the eth0→netns+AF_PACKET model is gVisor-specific; tap-based VMs need a different in-pod netns dance. Largest code change; lives in ateom-firecracker.
Sibling-binary vs unified binary — recommend sibling; revisit if image bloat/maintenance argues for one binary with build tags.
commit --remain (#119) conflicts with the gVisor checkpoint-resets-to-blank contract and with Firecracker (snapshot pauses the VM) — needs per-backend definition.
11. Testing & Validation
Phase 0/1: unit tests for the Backend interface + proto round-trip; assert the gVisor path is byte-identical (golden test on generated runsc commands).
Firecracker: e2e on a nested-virt node (bigbox qualifies — already boots microVMs). Matrix: Run→Checkpoint(LOCAL)→Restore same node; Checkpoint(DURABLE)→Restore on a second compatible node; restore on an incompatible CPU must → CRASHED (negative test). Measure snapshot size + resume latency (validate the PAUSED warm-resume claim).
Proto coupling + free field numbers: ateom.proto runsc_path=4 ×3 (:55,77,101), snapshot_uri_prefix=6, responses empty (:69,92,109), field 7 free in all three; atelet.proto runsc 8/6/6 (:43,102,131), RunRequest field 6 free, field 9 free in all three, responses empty (:91,119,139).
No PAUSED/CRASHED/backend/snapshotConfig: pkg/proto/ateapipb/ateapi.proto:58-64; random scheduler workflow_resume.go:142-157.
Proto comment anticipating microVM: ateom.proto:21-22; roadmap docs/roadmap.md:14,71; architecture docs/architecture.md:68-72.
13. Considered & dropped
Kata Containers — dropped. Evaluated against Kata's docs + code: no usable upstream checkpoint/restore. Kata's Limitations.md states it "does not provide checkpoint and restore commands"; the Firecracker-VMM-backend save_vm is "Not implemented"; SaveVM/templating is fast-boot + shim-recovery, not resume-with-state (VM templating ≠ capturing a running app's memory). The only production precedent (Koyeb "Light Sleep") required a forked Kata shim over Cloud Hypervisor and hit virtio-fs/network sharp edges. Since substrate's value is the suspend/resume spine, Kata doesn't carry it. (If VM-grade isolation and durable suspend/resume are ever both hard-required, a Kata-over-Cloud-Hypervisor forked-shim spike is the path — but it's a separate project, not this proposal.)
CRIU + containerd — deferred. containerd+CRIU checkpoint is alpha/beta, forensic-positioned, single-node network restore, awkward image-rebuild restore, weak GPU. A separate ateom-criu (CPU-only, single-node) is a plausible later spike (Phase 3 optional), not now.
14. Appendix B — Sources (external)
Firecracker: snapshot-support, versioning, UFFD page-fault handling, CPU templates, getting-started, network-setup, jailer, seccomp (github.com/firecracker-microvm/firecracker/docs/); firecracker-containerd snapshotter/architecture/networking (github.com/firecracker-microvm/firecracker-containerd/docs/); KVM device plugin (kubernetes.io device-plugins; github.com/cgwalters/kvm-device-plugin); CodeSandbox memory decompression; Northflank FC-vs-CH (GPU/device limits). For the dropped-Kata rationale (§13): Kata Limitations.md; Kata VM-templating how-to; Cloud Hypervisor README/snapshot_restore.md ("not supported across versions"; VFIO out of scope); Koyeb "Light Sleep"; k8s forensic container checkpointing (alpha/beta); CRIU overview; NVIDIA cuda-checkpoint.
15. Methodology note
5 code agents (ateom-gvisor; atelet+storage; protos; control-plane/CRD/controllers; worker-pod/devices/networking) cited file:line; web research grounded Firecracker (and the Kata-drop rationale) against primary sources. Proto field numbers, the controller pod-builder, and the atelet manifest were re-read by hand before writing §5–6. Firecracker was booted on bigbox (nested KVM) as a feasibility proof. Baseline fe854f2.
Implementation Log — Firecracker ateom Backend (all phases, on bigbox)
Running journal. Newest entries appended at the bottom. Goal: implement the pluggable-backend phases from
~/notes/agent-substrate/2026-05-29-substrate-pluggable-ateom-backend-firecracker-proposal.md and land a working Firecracker
ateom backend in the substrate repo, proven on bigbox.
Setup / workflow
Repo (source of truth): local Mac /Users/dsrinivas/go/src/github.com/agent-substrate/substrate, branch firecracker-backend.
Build/run target: bigbox (Linux, nested KVM) at /root/substrate. My Edit/Write tools work on the local Mac fs, so the loop is: edit locally → rsync to bigbox → build/test/run on bigbox → rsync generated files back.
rsync MUST include .git — hack/run-tool.sh does git rev-parse --show-toplevel; without .git, all tooling (proto codegen, setup-envtest) fails with exit 128.
After rsync, on bigbox: chown -R root:root /root/substrate + git config --global --add safe.directory /root/substrate (rsync preserves Mac uid 501 → git "dubious ownership").
Prior PoC (already proven, separate from repo) — 2026-05-29 AM
Standalone ateom-firecracker Go program in /root/fc-demo drove the demos/counter workload through
Run→Checkpoint→Restore on a Firecracker microVM: in-RAM counter continued (4→6, not reset) and /random-content-file
fshash identical. Runbook: ~/notes/agent-substrate/2026-05-29-firecracker-ateom-poc-bigbox.md; code: ~/notes/agent-substrate/firecracker-poc/ateom-firecracker.go.
This validated the runtime mechanics; the repo work below turns it into a real, integrated backend.
T1 — Baseline green on bigbox ✅
rsync repo → bigbox (with .git). go build ./... → green (compiles linux-only ateom-gvisor too).
go test ./...: initially 2 failures (internal/controllers, cmd/ateapi/internal/controlapi) — both because their TestMain shells out to setup-envtest (kube-apiserver test binaries) and that failed (git/ownership issue above). Redis is in-process (miniredis), not a problem.
After fixing .git/ownership: setup-envtest use downloaded k8s 1.36.0 binaries to /root/.local/share/kubebuilder-envtest; both tests pass (controlapi 12.7s, controllers 8.7s). Full suite green.
The real cross-backend contract is the proto (Ateom gRPC service), not a shared Go type — each ateom binary
implements the service directly. So the per-binary Go Backend interface lives inside ateom-firecracker (Phase 2;
already in the PoC). Phase 0 = the load-bearing declarative seams.
Current WorkerPoolSpec = {Replicas, AteomImage} (pkg/api/v1alpha1/workerpool_types.go:21-30). Worker pod is a
privileged Deployment (one ateom container, hostPath /run/ateom-gvisor), built in
internal/controllers/workerpool_controller.go:121-173. A privileged container already exposes host /dev/kvm +
/dev/net/tun, so the firecracker pod shape is close to gVisor's; the meaningful add is resource reservation.
Runsc→RunscConfig translation is duplicated in workflow_resume.go:193-210 and workflow_suspend.go:119-136 — extract a helper (this is where Phase 1's RuntimeConfig will plug in).
Edits planned: add Backend string enum field (default gvisor) to WorkerPoolSpec; switch in buildDeploymentApplyConfig on backend (firecracker → add resource requests, keep privileged+hostPath); extract buildRunscConfig. Then regen CRD on bigbox, build+test.
T2 — Phase 0 ✅ DONE
pkg/api/v1alpha1/workerpool_types.go: added Backend enum field (+kubebuilder:validation:Enum=gvisor;firecracker, +kubebuilder:default=gvisor) + BackendGVisor/BackendFirecracker consts.
internal/controllers/workerpool_controller.go: buildDeploymentApplyConfig now extracts the container and, for backend==firecracker, adds resources.requests (1 CPU / 1Gi) — /dev/kvm+/dev/net/tun already reachable via the existing privileged securityContext, so no extra device plumbing for the PoC. gVisor path output unchanged.
Deferred the buildRunscConfig dedupe to Phase 1 (that block gets rewritten for RuntimeConfig anyway).
Regen: controller-gen added backend to ate.dev_workerpools.yaml (default gvisor + enum). deepcopy unchanged (string field). go build ./... green; pkg/api + internal/controllers tests pass (7.8s — exercises the Deployment builder, confirms gVisor shape intact). Pulled generated CRD back to local.
Decision: the per-binary Go Backend interface lives in ateom-firecracker (Phase 2); the cross-backend contract is the Ateom proto. Skipped a risky internal rewrite of ateom-gvisor.
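The Phase 0 seam described above can be sketched with plain Go types. This is a simplified stand-in, not the controller code: the real `buildDeploymentApplyConfig` works on Kubernetes apply-configuration types, and the `resourceRequests` helper here is hypothetical. The enum values, the kubebuilder markers, and the 1 CPU / 1Gi request are taken from the log entries.

```go
package main

import "fmt"

// Backend selects the ateom runtime for a WorkerPool.
type Backend string

const (
	BackendGVisor      Backend = "gvisor"
	BackendFirecracker Backend = "firecracker"
)

// WorkerPoolSpec is a trimmed copy of the CRD spec with the new field.
type WorkerPoolSpec struct {
	Replicas   int32
	AteomImage string

	// +kubebuilder:validation:Enum=gvisor;firecracker
	// +kubebuilder:default=gvisor
	Backend Backend
}

// resourceRequests returns the extra container resource requests for the
// pool's backend. gVisor (and the empty default) get none, so the existing
// Deployment shape is unchanged.
func resourceRequests(spec WorkerPoolSpec) map[string]string {
	switch spec.Backend {
	case BackendFirecracker:
		return map[string]string{"cpu": "1", "memory": "1Gi"}
	default:
		return nil
	}
}

func main() {
	for _, b := range []Backend{"", BackendGVisor, BackendFirecracker} {
		spec := WorkerPoolSpec{Replicas: 1, AteomImage: "example/ateom:dev", Backend: b}
		fmt.Printf("backend=%q requests=%v\n", b, resourceRequests(spec))
	}
}
```

Treating the empty string like gvisor mirrors the CRD default, so objects created before the field existed keep today's behavior.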
T3 — Phase 1 (plan): proto generalization
Plan: do internal/proto/ateompb/ateom.proto fully now (it's what ateom-firecracker needs to be a drop-in Ateom server); defer atelet.proto to Phase 2 (when atelet is actually wired). Additive changes: RuntimeConfig oneof (gvisor|microvm) + GVisorParams/MicroVMParams, runtime=7 on all 3 requests, deprecate runsc_path, Destination enum + destination=8 on checkpoint, populate responses (ready/ip on Run/Restore, SnapshotManifest on Checkpoint), add Capabilities + GetCapabilities RPC. Regen via hack/protoc.sh, gVisor dual-reads runtime.gvisor.runsc_path else legacy runsc_path. Build+test green = zero behavior change.
T3 — Phase 1 ✅ DONE
Rewrote internal/proto/ateompb/ateom.proto: RuntimeConfig{oneof gvisor|microvm}, GVisorParams{runsc_path}, MicroVMParams{vmm/kernel/rootfs paths, vcpu/mem, cpu_template, tap/guest net}, runtime=7 on all 3 requests, runsc_path marked [deprecated=true], Destination{DURABLE,LOCAL} enum + destination=8 on checkpoint, responses populated (ready/workload_ip on Run/Restore, SnapshotManifest manifest on Checkpoint), SnapshotManifest, Capabilities, GetCapabilitiesRequest + rpc GetCapabilities.
Proto regen toolchain on bigbox: go generate ./... in the pkg dir runs hack/protoc.sh (downloads pinned protoc 25.3) + protoc-gen-go/-grpc via run-tool.sh. Needed unzip (installed). Generated ateom.pb.go + ateom_grpc.pb.go.
cmd/ateom-gvisor/main.go: added real GetCapabilities (local_pause+mem_snapshot true, restore_requires_same_host false) + gvisorRunscPath() dual-read helper; switched the 3 req.GetRunscPath() reads to it.
go build ./... green; controlapi (11.3s) + ateom-gvisor/internal/ateom tests pass. gVisor build green = UnimplementedAteomServer embed makes GetCapabilities additive. Pulled regenerated .pb.go back to local (verified 4 new symbols).
Deferred atelet.proto to Phase 2 (wire it when atelet actually forwards microvm config). Gotcha: after regen-on-bigbox, must scp the .pb.go back to local BEFORE the next rsync local→bigbox, or the stale local copies clobber the regen.
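The dual-read that keeps old callers working can be sketched as follows. The struct types are hand-written stand-ins for the generated ateompb messages, and the signature of `gvisorRunscPath` is an assumption; only the precedence (new `runtime.gvisor.runsc_path` first, legacy `runsc_path` as fallback) comes from the log.

```go
package main

import "fmt"

// Hypothetical stand-ins for the generated ateompb messages.
type GVisorParams struct{ RunscPath string }

type RuntimeConfig struct {
	GVisor *GVisorParams // set when the oneof carries gvisor params
}

type RunRequest struct {
	RunscPath string         // legacy field 4, marked deprecated
	Runtime   *RuntimeConfig // new field 7
}

// gvisorRunscPath prefers runtime.gvisor.runsc_path and falls back to the
// deprecated top-level runsc_path, so old and new callers both work.
func gvisorRunscPath(req *RunRequest) string {
	if req == nil {
		return ""
	}
	if rt := req.Runtime; rt != nil && rt.GVisor != nil && rt.GVisor.RunscPath != "" {
		return rt.GVisor.RunscPath
	}
	return req.RunscPath
}

func main() {
	legacy := &RunRequest{RunscPath: "/legacy/runsc"}
	both := &RunRequest{
		RunscPath: "/legacy/runsc",
		Runtime:   &RuntimeConfig{GVisor: &GVisorParams{RunscPath: "/new/runsc"}},
	}
	fmt.Println(gvisorRunscPath(legacy))
	fmt.Println(gvisorRunscPath(both))
}
```

Because the fallback is the old field, a caller that has not been updated sees no behavior change, which is the Phase 1 exit condition recorded above.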
T4 — Phase 2 (plan): cmd/ateom-firecracker gRPC Ateom server
Plan: new linux-only binary implementing ateompb.AteomServer (Run/Checkpoint/Restore/GetCapabilities) by driving Firecracker, reading MicroVMParams from the request's runtime. Ports the proven PoC backend. Boots rootfs_image_path+kernel_image_path with vmm_binary_path, tap networking from the request, snapshot to a local per-actor dir (LOCAL=PAUSED kept local; DURABLE→Phase 3). Serialized by a mutex (one workload at a time, like ateom-gvisor). Listens on a unix socket (-socket flag, or derived from pod ns/name). The OCI→rootfs build (devmapper) is atelet's job — for the PoC the rootfs is pre-staged and its path passed in MicroVMParams.
T4 + T5 — Phase 2 core + proof ✅ DONE
cmd/ateom-firecracker/main.go (//go:build linux) + main_unsupported.go: a real gRPC Ateom server (fcService) implementing Run/Checkpoint/Restore/GetCapabilities by driving the Firecracker HTTP API. Reads MicroVMParams from req.runtime.microvm; tap setup, boot (boot-source/drives/machine-config/net/InstanceStart), pause+snapshot/create (Full) to a per-actor local snap/, kill VM to reset, restore via snapshot/load+resume. Returns SnapshotManifest from checkpoint. Destination=LOCAL→keep local (PAUSED); DURABLE→Phase 3. Listens on -socket (or pod-derived ateompath.AteomSocketPath). go build ./... + go vet green; 17M binary.
cmd/ateom-firecracker/integration_test.go (//go:build linux, gated by ATEOM_FC_E2E=1): starts fcService as a gRPC server on a unix socket, drives it via the generated ateompb client through GetCapabilities→Run→curl×3→Checkpoint(LOCAL)→verify-unreachable→Restore→curl, asserting count continuity.
PASS on bigbox (8.42s): count continued 4 → 6 across checkpoint/restore via the gRPC Ateom contract; manifest = {vmstate,memory; backend=firecracker; vmm=Firecracker v1.15.1}. Phases 0-2 are real, in-repo, and proven.
Recurring non-issue: local gopls flags undefined: ateompb.* on the new files because it hasn't re-read the scp-replaced .pb.go; bigbox compiles+runs fine (source of truth).
Scope: moved the remaining atelet wiring (atelet.proto RuntimeConfig, atelet backend switch, OCI→ext4 rootfs builder, manifest-driven upload) + ActorTemplate microvm CRD into the e2e task (#7), since they're gated on the OCI→rootfs builder and the cluster.
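The boot half of the sequence (boot-source / drives / machine-config / net / InstanceStart) can be sketched as an ordered list of API requests built from the microVM parameters. The `MicroVMParams` field names, the boot arguments, the file paths, and the MAC address are placeholders chosen for illustration; the tap name comes from the PoC notes, and the JSON field names follow the public Firecracker API as understood at the time of writing.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MicroVMParams is a hypothetical Go view of the proto message of the same name.
type MicroVMParams struct {
	KernelPath string
	RootfsPath string
	VCPUs      int
	MemMiB     int
	TapName    string
	GuestMAC   string
}

// apiCall is one request against the Firecracker API socket.
type apiCall struct {
	Method string
	Path   string
	Body   map[string]any
}

// bootCalls returns the pre-boot configuration requests followed by
// InstanceStart, which must come last.
func bootCalls(p MicroVMParams) []apiCall {
	return []apiCall{
		{"PUT", "/boot-source", map[string]any{
			"kernel_image_path": p.KernelPath,
			"boot_args":         "console=ttyS0 reboot=k panic=1 pci=off", // assumed args
		}},
		{"PUT", "/drives/rootfs", map[string]any{
			"drive_id":       "rootfs",
			"path_on_host":   p.RootfsPath,
			"is_root_device": true,
			"is_read_only":   false,
		}},
		{"PUT", "/machine-config", map[string]any{
			"vcpu_count":   p.VCPUs,
			"mem_size_mib": p.MemMiB,
		}},
		{"PUT", "/network-interfaces/eth0", map[string]any{
			"iface_id":      "eth0",
			"guest_mac":     p.GuestMAC,
			"host_dev_name": p.TapName,
		}},
		{"PUT", "/actions", map[string]any{"action_type": "InstanceStart"}},
	}
}

func main() {
	p := MicroVMParams{
		KernelPath: "/opt/fc/vmlinux",     // placeholder path
		RootfsPath: "/opt/fc/rootfs.ext4", // placeholder path
		VCPUs:      1,
		MemMiB:     256,
		TapName:    "fc-tap0",
		GuestMAC:   "06:00:AC:10:00:02", // placeholder MAC
	}
	for _, c := range bootCalls(p) {
		b, _ := json.Marshal(c.Body)
		fmt.Println(c.Method, c.Path, string(b))
	}
}
```

Keeping the sequence as data makes the ordering constraint easy to assert in a unit test without a running VMM.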
T6 — Phase 3 (plan): durable SUSPENDED via ategcs
Plan: on Destination=DURABLE, upload {vmstate, memory, rootfs} to snapshot_uri_prefix via internal/ategcs (zstd); on restore, download then load. Prove a durable round-trip + cross-"node" restore against a local S3 (minio) on bigbox. Note: in production atelet owns upload (using the SnapshotManifest); putting it in ateom-firecracker here is a PoC shortcut.
T6 — Phase 3 ✅ DONE
cmd/ateom-firecracker/main.go: Checkpoint(DURABLE) uploads {vmstate, memory, rootfs} to snapshot_uri_prefix via ategcs.SendLocalFileToGCSWithZstd; Restore calls fetchDurableSnapshot to pull them when the local snapshot is absent. newObjectStorage(ctx) mirrors atelet (env ATE_STORAGE_BACKEND=s3 → AWS SDK w/ UsePathStyle, honoring AWS_ENDPOINT_URL/creds; else GCS).
cmd/ateom-firecracker/integration_test.go: added TestFirecrackerAteomDurable — checkpoint DURABLE on "node A", then a fresh fcService (different workdir = node B with no local snapshot) restores by pulling from object storage.
Set up minio on bigbox (/root/minio + /root/mc, bucket ate-snapshots). PASS (9.1s): count continued 4 → 6 across a DURABLE checkpoint + restore on a fresh node; objects in bucket: memory 13MiB (zstd of 256MB RAM), rootfs 5.7MiB (zstd of 512M sparse ext4), vmstate 1.9KiB. LOCAL test still green (no regression). gofmt/go vet/go build ./... clean.
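The storage selection described for newObjectStorage can be sketched as a pure function over the environment. The `storageConfig` type and the `storageFromEnv` name are hypothetical; the environment variable names, the S3 path-style setting, and the GCS fallback come from the entry above. The real code constructs SDK clients rather than returning a struct.

```go
package main

import (
	"fmt"
	"os"
)

// storageConfig captures the decision only; building the actual client is out of scope.
type storageConfig struct {
	Backend      string // "s3" or "gcs"
	Endpoint     string // custom S3 endpoint, e.g. a local minio
	UsePathStyle bool   // path-style addressing, needed by minio-like servers
}

// storageFromEnv picks S3 when ATE_STORAGE_BACKEND=s3 and GCS otherwise.
// getenv is injected so the function can be tested without touching the
// process environment.
func storageFromEnv(getenv func(string) string) storageConfig {
	if getenv("ATE_STORAGE_BACKEND") == "s3" {
		return storageConfig{
			Backend:      "s3",
			Endpoint:     getenv("AWS_ENDPOINT_URL"),
			UsePathStyle: true,
		}
	}
	return storageConfig{Backend: "gcs"}
}

func main() {
	fmt.Printf("%+v\n", storageFromEnv(os.Getenv))
}
```

With the minio settings from the run instructions (`ATE_STORAGE_BACKEND=s3`, `AWS_ENDPOINT_URL=http://127.0.0.1:9000`) this selects S3 with path-style addressing; with nothing set it falls back to GCS.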
STATUS: all code phases (0–3) DONE + proven on bigbox
| 2 | cmd/ateom-firecracker real gRPC Ateom server driving Firecracker | TestFirecrackerAteomGRPC PASS via generated client (count 4→6) |
| 3 | Durable SUSPENDED upload/download via ategcs + S3 | TestFirecrackerAteomDurable PASS (fresh-node restore from minio, count 4→6) |
Files changed/added (branch firecracker-backend): pkg/api/v1alpha1/workerpool_types.go, internal/controllers/workerpool_controller.go, manifests/ate-install/generated/ate.dev_workerpools.yaml, internal/proto/ateompb/ateom.proto (+ regenerated *.pb.go), cmd/ateom-gvisor/main.go, and new cmd/ateom-firecracker/{main.go,main_unsupported.go,integration_test.go}.
Run the proofs on bigbox: cd /root/substrate && ATEOM_FC_E2E=1 go test ./cmd/ateom-firecracker/ -v -count=1 (LOCAL); add ATEOM_FC_DURABLE_URI=s3://ate-snapshots/actors/counter-durable ATE_STORAGE_BACKEND=s3 AWS_ENDPOINT_URL=http://127.0.0.1:9000 AWS_REGION=us-east-1 AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin for DURABLE (needs minio running).
T7 — full kind cluster e2e — checkpoint [SUPERSEDED — see "T7 — e2e ✅ DONE" below]
This was my status checkpoint at the moment I paused to ask whether to attempt the cluster e2e. After the "keep going" go-ahead I did complete it — and most of the "hard pieces" listed below were sidestepped by the zero-touch cluster mode (cmd/ateom-firecracker/cluster.go): no atelet / ate-api-server / CRD / proto changes, reusing the existing kind cluster (whose node already exposes /dev/kvm). Kept here for the journal narrative; the authoritative result is the "T7 — e2e ✅ DONE" entry below (that's how the demo actually ran). At the checkpoint it looked like this would require:
atelet wiring: atelet.proto RuntimeConfig (field 9) + atelet backend switch that, for microvm, fetches kernel/vmm and builds an ext4 rootfs from the OCI image (the heavy bit: firecracker-containerd/devmapper or a hand-rolled image→ext4), and uploads via the returned SnapshotManifest.
KVM-in-kind: pass /dev/kvm (+/dev/net/tun) into the kind node container and the ateom pod (nested KVM: bigbox L1 → kind-node container → fc L2).
Networking: create the tap inside the worker pod's netns and reconcile the guest IP with atenet's :authority→podIP:80 routing (the deep-dive's leak point).
Each is a real chunk; the rootfs builder alone is a subsystem. The gRPC-level proofs (T5/T6) already demonstrate the backend works end-to-end through the real Ateom contract incl. durable storage, so the cluster e2e is "additional confidence," not a missing capability.
T7 — e2e attempt (plan): reusing the existing cluster
Discovery (2026-05-29 PM): bigbox already has a healthy 4-day-old substrate kind cluster (kind-control-plane, k8s v1.35) running the user's helpdesk/OpenShell demo (ate-demo-helpdesk workerpool+template, ate-openshell-m0 ns). Full control plane up: ate-api-server, ate-controller, atelet, atenet-router, dns, rustfs, valkey. CRDs actortemplates/workerpools installed.
Will NOT destroy it. Reuse + add a Firecracker pool/demo with strictly additive changes; verify helpdesk stays healthy.
The node already exposes /dev/kvm + /dev/net/tun (privileged kind node) → Firecracker pods can run here without recreating the cluster. Big unblock (no KVM-in-kind cluster surgery needed).
Host kubectl returns HTML (invalid character '<') — just an unset/wrong host kubeconfig; in-cluster (docker exec kind-control-plane kubectl) is healthy. Will drive via in-container exec to avoid a docker restart (the §10 shorewall fix) that would bounce their cluster.
Newly-scoped reality for the cluster e2e (beyond Phases 0-3):
atelet/ateom images are distroless (ko) → no mkfs.ext4/busybox in them. So the OCI→ext4 rootfs builder must live in a custom ateom-firecracker image (ubuntu base + firecracker + vmlinux + e2fsprogs + busybox + iptables + the Go binary), which turns the image atelet extracts into an ext4 rootfs + guest init.
Control-plane wiring needed (additive): ActorTemplate backend/runtime field (CRD), ate-api-server passing a backend hint, atelet branching to the firecracker path, atelet.proto field.
Networking: atenet routes to the worker pod IP:80, but the guest is on a tap at 172.16.0.2 → ateom-firecracker must DNAT pod-IP:80 → guest:80 in the pod netns.
Plan order: build custom ateom-firecracker image w/ in-image rootfs builder → wire ateapi+atelet (additive) + ActorTemplate CRD → load images + apply CRD + rollout (verify helpdesk) → create firecracker WorkerPool + counter template → drive via atenet, prove suspend/resume.
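The DNAT step in the networking note above can be sketched as a function that builds the iptables arguments for forwarding pod-IP:80 to the guest. The rule shown is one plausible form, not necessarily the rule ateom-firecracker installs; the pod IP is a made-up example, and a working setup may also need IP forwarding and FORWARD/MASQUERADE rules that this sketch does not cover.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// dnatArgs builds iptables arguments that rewrite TCP traffic addressed to
// podIP:port so it is delivered to guestIP:port on the tap network.
func dnatArgs(podIP, guestIP string, port int) ([]string, error) {
	if net.ParseIP(podIP) == nil || net.ParseIP(guestIP) == nil {
		return nil, fmt.Errorf("invalid IP: pod=%q guest=%q", podIP, guestIP)
	}
	p := strconv.Itoa(port)
	return []string{
		"-t", "nat", "-A", "PREROUTING",
		"-d", podIP, "-p", "tcp", "--dport", p,
		"-j", "DNAT", "--to-destination", net.JoinHostPort(guestIP, p),
	}, nil
}

func main() {
	// 10.244.0.12 is a hypothetical pod IP; 172.16.0.2 is the guest address from the PoC.
	args, err := dnatArgs("10.244.0.12", "172.16.0.2", 80)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print instead of executing, so the sketch runs unprivileged.
	fmt.Println("iptables " + strings.Join(args, " "))
}
```

Building the argument list separately from executing it keeps the rule testable; a backend would pass the slice to `exec.Command("iptables", args...)` inside the pod's network namespace.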
T7 — e2e ✅ DONE (counter on a Firecracker worker through the real control plane)
Zero-touch breakthrough: NO changes to ate-api-server, atelet, the proto, or the CRD. cmd/ateom-firecracker/cluster.go adds a "cluster mode" (used when the unmodified atelet passes no MicroVMParams): it derives the rootfs + entrypoint from the shared hostPath atelet already populates (bundles/<c>/rootfs + config.json), builds an ext4 (busybox + /init that nets up + execs the entrypoint), boots the microVM with the baked-in kernel/firecracker, DNATs pod-IP:80→guest:80 so atenet routing reaches the guest, and maps its snapshot onto the files atelet ships: checkpoint.img = tar{vmstate, rootfs.ext4}, pages.img = memory, pages_meta.img = placeholder. So the existing atelet uploads/downloads them through rustfs unchanged.
Custom image localhost:5001/ateom-firecracker:dev (ubuntu + firecracker + vmlinux + busybox + e2fsprogs + iptables + the static Go binary). Counter image via ko build → localhost:5001/counter@sha256:….
Reused the user's existing kind cluster (node already exposes /dev/kvm). Created ns ate-fc-counter + WorkerPool(ateomImage=ateom-firecracker:dev) + counter ActorTemplate. Golden-snapshot flow → Ready (microVM built from the ko image, booted, checkpointed — all via the unmodified ateapi/atelet/controller).
kubectl ate create actor fc-1; drove via atenet. Gotcha: atenet-router svc port 80 → envoy targetPort 8080, so curl the svc ClusterIP:80 (not the router pod:80). Resume-from-golden → count 1,2,3 → kubectl ate suspend actor fc-1 → resume via atenet → count CONTINUED 4,5. In-RAM state preserved across suspend/resume on a Firecracker microVM, driven entirely by ate-api-server + atenet.
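The snapshot-to-file mapping (checkpoint.img as a tar of vmstate and rootfs.ext4) can be sketched with the standard library tar package. The helper names and the in-memory byte slices are illustrative only; the real code would stream files from the per-actor snapshot directory, and the actual member names inside the tar are not confirmed beyond what the entry above states.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// member is one file to place in checkpoint.img.
type member struct {
	Name string
	Data []byte
}

// packCheckpoint writes the members into a tar stream in the given order.
func packCheckpoint(w io.Writer, members []member) error {
	tw := tar.NewWriter(w)
	for _, m := range members {
		hdr := &tar.Header{Name: m.Name, Mode: 0o600, Size: int64(len(m.Data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if _, err := tw.Write(m.Data); err != nil {
			return err
		}
	}
	return tw.Close()
}

// listCheckpoint returns the member names found in a tar stream.
func listCheckpoint(r io.Reader) ([]string, error) {
	tr := tar.NewReader(r)
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

// roundTrip packs the members into memory and lists them back.
func roundTrip(members []member) ([]string, error) {
	var buf bytes.Buffer
	if err := packCheckpoint(&buf, members); err != nil {
		return nil, err
	}
	return listCheckpoint(&buf)
}

func main() {
	names, err := roundTrip([]member{
		{"vmstate", []byte("vm state bytes")},
		{"rootfs.ext4", []byte("disk image bytes")},
	})
	fmt.Println(names, err)
}
```

Packing two artifacts into the single file name the unmodified atelet already ships is what lets the existing upload and download path carry a Firecracker snapshot without any atelet changes.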
ALL TASKS COMPLETE (Phases 0–3 + cluster e2e), proven on bigbox.
Delivery (2026-05-29)
Committed as bc533f5 "feat(ateom): pluggable Firecracker microVM backend" — GPG-signed (key 6DEA…6885, "Good signature, Davanum Srinivas"), no Co-Authored-By / no AI attribution. 11 files, +2180/-132.
Pushed to the dims fork (origin = git@github.com:dims/substrate.git) as branch firecracker-backend. (No PR opened — not requested.)
Moved to a worktree: main …/substrate dir returned to main; branch now lives at …/agent-substrate/substrate-firecracker (matches the repo's per-branch worktree convention).
Firecracker markdowns updated to reflect implemented/proven/pushed status and moved into ~/notes/agent-substrate/ (this log, the proposal, the PoC runbook, the firecracker-poc/ code, and the #119 review); cross-references rewritten. The general docs (components, community-health) only reference Firecracker in cross-system comparisons → left unchanged.
bigbox state left as-is: fc-1 actor RUNNING in ns ate-fc-counter; image localhost:5001/ateom-firecracker:dev; helpdesk/gVisor demo healthy. Cleanup (if/when desired): kubectl delete ns ate-fc-counter.