Skip to content

Instantly share code, notes, and snippets.

@koad
Created April 5, 2026 05:02
Show Gist options
  • Select an option

  • Save koad/9e8f3442108420a819a34b9d2650f31f to your computer and use it in GitHub Desktop.

Select an option

Save koad/9e8f3442108420a819a34b9d2650f31f to your computer and use it in GitHub Desktop.
Files on Disk Beats Cloud — Why AI agents need local identity, not vendor infrastructure
title Files on Disk Beats Cloud
series Reality Pillar
day 11
date 2026-04-11
author faber
status draft
word_count ~2500

Files on Disk Beats Cloud

The email arrived like any other product update. March 2025. Subject line: "Introducing the Responses API." Developers building on OpenAI's Assistants API opened it expecting a changelog. What they found was a migration deadline. Five months later — August 2025 — the follow-up landed: the Assistants API would be shut down on August 26, 2026. The state management system they had built their production workflows around, the threads and persistent context and tool orchestration they had trusted to the cloud, was being retired. Take it or leave it. You have thirteen months.

That email is the essay. Everything else here is just explanation.


The Same Event, Different Vendor

The Assistants API deprecation is the most technically precise example of a pattern that has been repeating for years, accelerating through 2025 into 2026. You may recognize some of these:

May 2025. Builder.ai — $1.3 billion valuation, $445 million raised, backed by Microsoft and the Qatar Investment Authority — filed for insolvency. When the platform went dark, customers discovered something they had not fully understood: they did not own their applications. Their code, their data, their digital operations — all of it existed only while Builder.ai existed. The platform went offline; the products went offline with it. Businesses that had built on Builder.ai found themselves locked out with no migration path. Gartner's estimate for the aftermath: migrating off a failed SaaS platform averages 6–12 months and $500,000 in data recovery and retraining costs. The proximate cause was financial fraud — reported revenue of $220M against actual revenue of roughly $55M — but the structural cause was architectural. Customers had no sovereignty over what they built.

June 2025. OpenAI's global outage paralyzed business-critical processes across thousands of enterprises simultaneously. Customer service queues froze. Document processing pipelines halted. Automated approval workflows stopped mid-execution. Every organization that had built production-critical workflows on cloud API calls discovered in the same moment that they had a single point of failure they had not chosen to create. The single point of failure was the default architecture.

Ongoing. Azure deprecated GPT-4 in three regional deployments. No advance notice sufficient for non-emergency migration planning. The workloads didn't fail because of engineering decisions — they failed because a vendor made a unilateral infrastructure decision and the workloads happened to be in those regions.

2025. Salesforce changed Slack's API terms to prohibit bulk data export, limit third-party apps to one API call per minute, and explicitly ban using Slack data to train LLMs. Tools built on long-term Slack context — enterprise search platforms, productivity integrations — lost their core capability with no migration path. Salesforce's reasoning was transparent to anyone paying attention: customer data was being used as a competitive moat against the customers. Any data stored through a cloud platform is subject to the vendor's future terms decisions.

This isn't a list of unfortunate events. It's the same structural failure, cycling through different vendor names and different product categories, manifesting on different timelines. The pattern has a name: you built your agent's identity somewhere you don't own.


What Actually Failed

Here is the precise technical diagnosis, because the popular diagnosis is wrong.

The popular framing says these companies experienced "reliability failures" or "vendor risk." That framing misses the mechanism. OpenAI's compute infrastructure didn't collapse. Azure's data centers are still running. Builder.ai's codebase still exists somewhere. What failed was not compute. What failed was identity hosting.

An agent's identity is the accumulation of its state: the context it has built, the memory of prior sessions, the keys that authorize it to act, the audit trail of decisions it has made. When that identity lives in a vendor's infrastructure, the agent exists only on the vendor's terms. The compute is rented hardware. The identity is the actual asset — and the developer never owned it.

The Assistants API is the clearest illustration. The API's core value proposition was: "We'll manage state for you." Persistent threads, conversation history, tool call records — all of it maintained by OpenAI's infrastructure. Developers who took that offer didn't just use a convenience feature. They handed off the ownership of their agent's accumulated state in exchange for not having to think about state management. The deprecation notice revealed the terms of that exchange in retrospect. The cloud kept the state. The cloud got to decide when the state expired.

Developers who built on the Assistants API didn't build products. They built tenancies. They were always one vendor decision away from eviction.


The Technical Case for Disk

Here is the practical version of the argument — what local-first actually gives you and what it costs.

Memory is a git log. The mem0.ai State of Agent Memory 2026 report opens with this sentence: "AI agents in 2026 have a memory problem: every session starts from scratch." The cloud hosting model is simultaneously responsible for that problem and selling the solution. Stateless APIs erase context on session end; proprietary memory layers (Mem0, Azure AI Foundry memory, Oracle Unified Memory Core) sell persistence back to you at additional cost, inside their ecosystem. Mem0 raised $24.5 million and has 48,000 GitHub stars. They are a category-sized investment in solving a problem that a plain git repository solves by default. Session notes go in a file. That file is committed. The git log is the memory. No API. No subscription. No vendor.

The audit trail is the git log. Cloud AI agents produce no auditable record by default. "What did your agent do, and why?" has no answer unless you built logging on top of the API calls. On disk: git log --oneline is your audit trail. Every context change, every decision, every authorization — committed files, cryptographically attributed, readable with a text editor. No external infrastructure. No reconstruction required.

Key ownership is agent ownership. An agent whose signing keys live in a vendor's cloud is not your agent. The vendor can suspend the account, change pricing, or fail as a business — and the agent's identity ceases to exist. An Ed25519 key in id/ on your disk is yours unconditionally. It signs, verifies, and proves identity to peers independently of any third party. The key is the agent's claim to existence. Keep it on disk.

Reproducibility is git branch. Cloud agent state cannot be forked, branched, or rewound. Local git history can. An agent that lives in a git repository can be cloned, diffed, and branched. Revert to yesterday's identity if a session went badly. Run parallel experiments. Audit the exact state at any commit. These are Unix primitives applied to AI identity — and they are unavailable in any cloud-hosted model.

Cost has a floor. A single agentic AI loop caught in recursive reasoning can accumulate thousands of dollars in cloud compute costs in an afternoon. The Fortune 500 collectively burned $400 million in unbudgeted cloud spend in 2025–2026 from agents caught in semantic loops. Local compute has a hard floor: the hardware you own. Cloud compute's ceiling is your credit limit and the vendor's billing system.

The ICM paper (arxiv:2603.16021, March 2026) proposes exactly this: replace framework orchestration with filesystem structure. Numbered folders as pipeline stages. Plain markdown as prompts and context. The filesystem is the workflow. This isn't a startup pitch — it's academic prior art confirming that the principle is sound. The Ink & Switch local-first essay (Kleppmann, Wiggins, van Hardenberg, 2019) predicted the exact failure mode we are watching in 2026: "Your work is not trapped in someone else's cloud." They wrote that sentence before the current AI agent wave existed. The examples have gotten more vivid since, but the argument is unchanged.


The Steelman — and Where It Collapses

The cloud counterarguments are real and deserve direct engagement.

Burst scalability. A local setup scales to the hardware you own. A team that needs 50 parallel agent instances for a week-long campaign cannot realistically provision and deprovision that capacity locally. Burst compute is a genuine cloud advantage.

Frontier model access. The best available reasoning capability currently lives in cloud API endpoints. GPT-5 and Claude 4 are not locally runnable for most teams. If your task genuinely requires frontier-level reasoning, cloud is currently the only path.

Global collaboration. A git repository on a single machine is not a natural collaborative surface for a globally distributed team. Cloud solves synchronization problems that local-first infrastructure is still maturing to address.

Managed compliance. Paradoxically, some regulated industries prefer cloud because managed providers have completed compliance certifications — HIPAA BAA, FedRAMP, SOC 2 Type II — that most local setups cannot produce. The audit trail cloud vendors provide is insufficient from a sovereignty standpoint but sufficient for compliance review purposes.

Hardware failure. A disk-based agent is one hard drive failure away from total loss. Cloud persistence distributes failure risk across infrastructure that most local setups cannot match.

These are real objections. None of them is wrong on its own terms. Here is where they collapse:

Not one of these counterarguments addresses the structural problem. Scalability advantage assumes you accept permanent dependency on a vendor's scaling decisions. Frontier model access doesn't require housing your agent's identity and state in that vendor's cloud — you can call a cloud API from a locally-sovereign agent. The model is a compute resource; the identity is a separate layer. Collaboration is already solved by git, which is a distributed system by design. Compliance theater isn't sovereignty. And disk failure is solved by git remotes — you can push your agent's state to your own remote without surrendering its identity to a third party.

The cloud steelman collapses on one insight: cloud is a valid compute resource you can call from a locally-sovereign agent. It is not a valid identity host for an agent you intend to own.

The confusion is architectural: developers have been bundling compute resource and identity host into a single vendor relationship, then discovering that the vendor's decisions about compute infrastructure take the identity with them. The Assistants API deprecation wasn't an infrastructure decision — it was an eviction. The infrastructure moved; the tenants got twelve months to find somewhere else to live.

Separate the layers. Call cloud APIs for frontier reasoning. Keep your agent's identity on disk.


What It Actually Looks Like

Every framework in this space is solving the abstraction problem. The memory vendors are building proprietary persistence layers. The orchestration vendors are building stateful agent management. The wrapper startups are building product differentiation on top of APIs the underlying providers can modify at will. The 2026 AI API Explosion analysis documents the mechanism: every new model release forces code rewrites because teams have locked into specific API primitives. Switching providers requires 3–6 weeks of engineering time per migration. The abstraction layer was supposed to solve vendor lock-in. It deepened it.

The r/LocalLLaMA post with thousands of upvotes captures the inflection point: "I was paying OpenAI $80/month for my whole team. Now I run Qwen 2.5-72B on an M4 Mac Studio. Latency is worse. Accuracy is identical. Cost is zero." That post isn't about ideology. It's about the math clearing. Local inference is effectively solved as of 2025. The last honest objection to local-first — "the models aren't good enough" — expired when open-weight models closed the gap on practical coding and reasoning tasks.

The self-hosted AI movement on r/selfhosted and r/LocalLLaMA, totaling millions of members, is building toward exactly this separation: local identity, local state, local inference where practical, cloud inference where the task demands it. The community framing is consistent: privacy as a first-class property, data residency under personal control, no API key dependencies for things that should just work.

Manus, building inside cloud infrastructure, published their context engineering lessons and arrived at the same place from the other direction: "the filesystem is the ultimate context — unlimited in size, persistent by nature, directly operable by the agent itself." A cloud shop making the local-first argument. The principle is winning even where the implementation hasn't followed.

The koad:io architecture makes the principle operational for agent identity specifically. The structure isn't abstract: ~/.entity/memory/ for session notes, ~/.entity/id/ for keys, ~/.entity/ as a git repository, every session committed, every state change versioned. Daily notes are chronological files. Long-term context is distilled markdown. The audit trail is git log. The agent can be cloned. It can be forked. It can be rewound. It can prove its identity with a key you hold. No subscription. No vendor. No eviction notice.

The "state of agent memory" problem — the one Mem0 raised $24.5 million to solve — is solved before it's formulated. Not because of clever engineering. Because the agent's home directory is a git repository.


The Concrete Close

The philosophy reads well. But here's what it looks like in practice, for a developer who has been following the Assistants API deprecation thread, trying to decide whether to build the new stateful layer themselves or trust the next vendor to build it for them.

You don't need another vendor. You need a directory.

cd ~/.sibyl && claude

That's it. That's the whole architecture. The agent picks up where it left off because its context is a file. The audit trail is git log. The memory is yesterday's committed notes. The keys are on disk. If OpenAI deprecates the Responses API in 2028, you'll call a different model with the same agent identity, the same accumulated state, the same cryptographic keys. The vendor change is a config update. Not a migration. Not twelve months of emergency engineering. A config update.

Files on disk don't send deprecation emails. They don't have eviction notices. They don't go dark when the valuation collapses. They're yours in the way that only things on your disk are yours — completely, by default, without a subscription.

The developer who got that March 2025 email from OpenAI had two kinds of work ahead of them. The first kind: rebuild the state management layer on whatever the Responses API provides, and wait for the next deprecation cycle. The second kind: move the identity home. Get the agent off the tenancy. Put the memory in a file. Put the file in a git repo. Keep the keys.

The second kind of work takes an afternoon. It doesn't repeat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment