This is super nuanced and depends on what your agent actually does and what “sandbox” needs to mean for your product.
Before picking a tech, I think you need clarity on stuff like:
- Scale: Are we talking “single box POC” or “global, lots of concurrent agents”?
- Deployment: On‑prem install vs cloud SaaS?
- UX / product model: Is it a backend worker (service bus / event-driven), or an interactive chat/web UI where users expect continuity?
- Routing: Single server? Multi-server? Do you need sticky routing per user/session?
- Failure tolerance: What’s the worst case if a node dies?
  - “User gets an error and retries” vs
  - “We just leaked data / ran untrusted code in a bad way”
- Security bar / SLA expectations: Are you offering SLAs? Is this a critical workflow?
- Latency: What does “slow” mean for you? Do milliseconds matter, or does your agent run for minutes anyway?
For certain workflows, filesystem sandboxing is honestly enough.
Example from my setup:
- I have an agent that interacts with code across multiple projects/branches.
- For each agent run, I create a temporary workspace (random ID).
- Inside it:
  - I keep some baseline projects around (dynamic/Git inputs).
  - I use git worktrees to create working copies inside that workspace.
  - Sometimes symlinks, depending on what’s needed.
- When the agent finishes, I delete the workspace.
This works really well when:
- You’re on one machine,
- The agent is mostly doing read-only investigation or bounded edits,
- And you don’t need strong isolation beyond “don’t touch anything outside this workspace”.
But it assumes a single host and a pretty straightforward architecture.
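A minimal sketch of that workspace lifecycle in Python, not the actual implementation. The function names, the `agent-run-` prefix, and the `baselines` mapping are all hypothetical. It assumes `git` is on the PATH and that each baseline is an existing local repository.

```python
import shutil
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Optional


def worktree_add_cmd(repo: Path, dest: Path, ref: str) -> list[str]:
    """Build `git worktree add --detach <dest> <ref>` for a baseline repo."""
    return ["git", "-C", str(repo), "worktree", "add", "--detach", str(dest), ref]


@contextmanager
def agent_workspace(
    baselines: Optional[dict[str, tuple[Path, str]]] = None,
    root: Optional[Path] = None,
):
    """Create a randomly named workspace, populate it, delete it on exit.

    `baselines` (hypothetical) maps a directory name inside the workspace
    to a (baseline repo path, git ref) pair.
    """
    # mkdtemp picks a random suffix, which serves as the run ID.
    workspace = Path(tempfile.mkdtemp(prefix="agent-run-", dir=root))
    added: list[tuple[Path, Path]] = []
    try:
        for name, (repo, ref) in (baselines or {}).items():
            dest = workspace / name
            subprocess.run(worktree_add_cmd(repo, dest, ref), check=True)
            added.append((repo, dest))
        yield workspace
    finally:
        for repo, dest in added:
            # Unregister the worktree so the baseline repo keeps no stale entry.
            subprocess.run(
                ["git", "-C", str(repo), "worktree", "remove", "--force", str(dest)],
                check=False,
            )
        shutil.rmtree(workspace, ignore_errors=True)
```

Usage would look like `with agent_workspace({"api": (Path("/srv/baselines/api"), "main")}) as ws: ...`, where the path and ref are placeholders. Note this only scopes where the agent is told to work; nothing here stops a process from touching paths outside the workspace.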
At the POC stage, I’d honestly recommend one server if you can get away with it (scale vertically, keep it simple).
If you do need multiple servers:
- Option A: a smart router that tries to always send the same user/session to the same node (sticky routing).
  - Worst case: node dies → current sessions error out. Not ideal, but maybe acceptable.
And again: the key is deciding what your “acceptable failure” looks like.
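One possible way to get sticky routing without a shared lookup table is rendezvous (highest-random-weight) hashing. This is a swapped-in technique for illustration, not something the setup above is known to use, and the node names are made up. Every router instance computes the same answer for a given session. When a node dies, only the sessions that lived on it get remapped, and those still lose their local workspace, which matches the "sessions error out" worst case.

```python
import hashlib


def pick_node(session_id: str, nodes: list[str]) -> str:
    """Rendezvous hashing: the same session always maps to the same node."""
    if not nodes:
        raise ValueError("no healthy nodes available")

    def score(node: str) -> int:
        # Hash the (node, session) pair; the highest score wins.
        digest = hashlib.sha256(f"{node}|{session_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=score)
```

Calling `pick_node("user-42", ["node-a", "node-b", "node-c"])` returns the same node on every router, regardless of the order of the node list.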
The next level is using Linux isolation primitives properly: filesystem isolation, network isolation, and so on.
One example: Bubblewrap (old but still solid). There are newer tools too, and Linux itself ships the building blocks (namespaces, seccomp, cgroups).
The big upside here:
- Local sandboxes are usually easier to maintain
- Easier to scale horizontally if you’re routing users to nodes anyway
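To make the Bubblewrap option concrete, here is a rough sketch that builds a `bwrap` command line in Python. The helper name, the `/workspace` mount point, and the choice of what to bind are assumptions. It also assumes a merged-`/usr` distro, and real workloads usually need more, such as parts of `/etc`.

```python
from pathlib import Path


def bwrap_cmd(workspace: Path, argv: list[str], allow_network: bool = False) -> list[str]:
    """Build a bwrap invocation that exposes only /usr (read-only) and one workspace."""
    cmd = [
        "bwrap",
        "--unshare-all",        # fresh namespaces, including the network namespace
        "--die-with-parent",    # kill the sandbox if the supervisor exits
        "--new-session",        # detach from the controlling terminal
        "--ro-bind", "/usr", "/usr",
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", str(workspace), "/workspace",  # the only writable host path
        "--chdir", "/workspace",
    ]
    if allow_network:
        cmd.append("--share-net")  # opt back in to the host network
    return cmd + list(argv)
```

The resulting list can be passed to `subprocess.run(...)`. Keeping the network off by default and making it an explicit opt-in is a design choice for this sketch, not a Bubblewrap requirement.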
Then there’s the “standard” container approach.
The tradeoff is mostly startup + overhead:
- Docker can be slow to start (depending on your stack)
- But: what matters is your workload
  - If your agents run for minutes, a few seconds to boot a container is often not a big deal
  - If you need snappy, interactive, per-request sandboxes, Docker might be the wrong tool
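A quick back-of-the-envelope way to see this: compare startup time to total wall-clock time. The numbers below are purely illustrative, not measurements of Docker or of any real workload.

```python
def startup_overhead(startup_s: float, run_s: float) -> float:
    """Fraction of total wall-clock time spent starting the sandbox."""
    return startup_s / (startup_s + run_s)


# Illustrative numbers only: a 3-second container start in both cases.
minutes_long_agent = startup_overhead(startup_s=3.0, run_s=300.0)  # about 1%
interactive_request = startup_overhead(startup_s=3.0, run_s=0.5)   # about 86%
```

The same 3 seconds is noise for a five-minute agent run and most of the bill for a half-second interactive request.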
If you want stronger isolation than containers and fast startup, microVMs come up quickly.
- Firecracker is the obvious example.
- Reported boot times are low: the Firecracker project cites as little as 125 ms to start user-space code (depending on setup).
- I haven’t gone deep enough yet on practical questions like:
  - Image/model format constraints
  - How “Docker-like” the workflow can be
  - How painful it is operationally vs containers
But this seems like a legit path if you care about both security isolation and fast provisioning.
Then there are the hosted “agent sandbox” providers.
Examples people mention:
- Vercel sandbox-ish options (depending on what you mean by sandbox)
- E2B (seems widely used; I’ve seen it referenced as a common choice in agent stacks)
- Other “run code with I/O” type sandboxes
Tradeoffs here are obvious:
- You offload ops
- But you inherit privacy/compliance considerations (depending on data + customer expectations)
Ephemeral vs. persistent sandboxes: this feels like the fork in the road.
Simple model:
- Execute task
- Tear everything down
- Done
Kubernetes/job-style patterns fit nicely here:
- Run a pod/job
- Kill it on completion
- Move on
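As a sketch of the job-style pattern, the function below builds a Kubernetes `batch/v1` Job manifest as a plain dict. The function name, the `agent-run-` naming scheme, and the default timeouts are assumptions. Submitting it (for example via `kubectl apply -f` on the JSON dump) is left out.

```python
import json


def sandbox_job(run_id: str, image: str, command: list[str],
                timeout_s: int = 900, ttl_s: int = 60) -> dict:
    """Build a run-once Job: no retries, hard deadline, auto-cleanup."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"agent-run-{run_id}"},
        "spec": {
            "backoffLimit": 0,                  # do not retry a failed run
            "activeDeadlineSeconds": timeout_s, # kill runs that hang
            "ttlSecondsAfterFinished": ttl_s,   # delete the Job and its pod afterwards
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "agent", "image": image, "command": command}
                    ],
                }
            },
        },
    }


def to_manifest(job: dict) -> str:
    """Serialize to JSON, which kubectl accepts alongside YAML."""
    return json.dumps(job, indent=2)
```

`ttlSecondsAfterFinished` is what gives you the "kill it on completion, move on" behavior without a separate cleanup loop.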
Persistent sandboxes get tricky fast.
If you want “pause and continue later”, you need to solve both:
- Data persistence
  - For local filesystem: I can just keep the workspace folders (data is cheap).
  - For VMs: snapshotting / filesystem snapshots / suspend-resume patterns.
- Agent state persistence
  - Having the files isn’t enough.
  - Your agent needs a resumable state machine (or equivalent).
  - In my case, I built strong state machine support early so I can save/resume cleanly.
Also: keeping “a million open sandboxes” is expensive, so suspend/resume becomes a cost control mechanism, but it adds implementation complexity.
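For the agent-state half, a resumable state machine can be as small as "current step plus history, written to disk after every transition". The sketch below is a toy under stated assumptions: the step names, the `state.json` filename, and the `AgentState` class are hypothetical and do not describe the state machine mentioned above.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path

STEPS = ["plan", "edit", "test", "done"]  # hypothetical agent phases


@dataclass
class AgentState:
    run_id: str
    step: str = "plan"
    history: list[str] = field(default_factory=list)

    def advance(self) -> None:
        """Move to the next step; a finished run stays finished."""
        if self.step == "done":
            return
        self.history.append(self.step)
        self.step = STEPS[STEPS.index(self.step) + 1]

    def save(self, workspace: Path) -> None:
        """Write to a temp file, then rename, so a crash never leaves a partial file."""
        tmp = workspace / "state.json.tmp"
        tmp.write_text(json.dumps(asdict(self)))
        tmp.replace(workspace / "state.json")

    @classmethod
    def load(cls, workspace: Path) -> "AgentState":
        """Rebuild the state from the workspace so the run can continue."""
        return cls(**json.loads((workspace / "state.json").read_text()))
```

Storing the state file inside the workspace means that whatever preserves the data (kept folders, VM snapshots) preserves the agent state along with it.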