The Right Security Model for LLM Systems Richard McDaniel March 12, 2026
Most systems that use LLMs to process external input make the same mistake: they give the model access to secrets, expose it to attacker-controlled content, and then instruct it not to reveal anything.
That is a policy defense against a capability problem.
The threat is not that the model is malicious. The threat is that a sufficiently clever prompt can make an otherwise honest model behave as though it were. If a component can read secrets and also produce external output, then an adversary who controls its input has a path to exfiltration. The model does not need bad intent. It only needs enough reasoning ability to follow the wrong thread.
The answer is not to keep adding stronger warnings to the prompt. The answer is to design the system so the dangerous loop never exists.
The One Rule A useful security rule for LLM systems is this:
No single component should simultaneously read raw untrusted input, access privileged state or tools, and send arbitrary external output. That is the dangerous loop.
If one component can do all three, then prompt injection is no longer just a model-quality issue. It becomes a system-design flaw.
Remove the Category, Don’t Just Reduce the Odds A good security architecture does two different things, and it helps to separate them.
The first is category removal. Prompt injection against a privileged component is not merely made harder. It is removed as a class of attack if the privileged component never reads raw external content in the first place. Persuasion requires access. No access means no prompt-injection surface of that kind.
The second is probability reduction on the risks that remain. Schema exploitation, validator mistakes, sandbox failures, and implementation bugs may still exist. But those attacks now require multiple independent failures to line up. That is very different from the common pattern where a portable prompt payload can be pasted into any system that lets a model read hostile content and reach sensitive state.
This changes attacker economics. Commodity prompt injection scales almost for free. Exploiting a constrained schema, a deterministic validator, and a sandbox boundary does not. That does not make the system perfect. It makes the remaining attacks harder to automate, harder to scale, and more likely to generate useful signals during reconnaissance.
That is a real security gain.
Three Trust Domains The cleanest way to implement this is to separate the system into three trust domains.
- Ingress Boundary
The ingress boundary is the only place that reads raw external content.
It can parse, normalize, sanitize, score risk, and use a sandboxed model to rewrite or classify content. It can validate the result and either promote it into a safe internal artifact or block it.
It should not be able to access privileged state, privileged tools, or external send channels.
- Privileged Core
The privileged core is where internal decision-making happens.
It can read promoted artifacts, use internal data and tools, and decide what the business should do next. It can create structured outbound intents.
It should not read raw external input. It should not send externally.
That separation matters. The privileged core should operate on constrained, validated artifacts, not on the original text an attacker supplied.
- Egress Boundary
The egress boundary is the only place that turns internal intent into external output.
It can read outbound intents, apply deterministic policy checks, use a sandboxed model to rewrite public-facing content, validate the result, and release it.
It should not read raw external input. It should not have broad access to internal state.
The Standard Boundary Pattern Every boundary crossing should follow the same pattern:
untrusted input → ingress boundary → privileged core → egress boundary → external output
Both ingress and egress should use the same three-step shape:
Deterministic gate Sandboxed model step Deterministic gate
The model can provide signal. It can block. It is never the final authority.
That last point is essential. If the model can approve its own output, then a successful prompt injection can launder content through the system. The deterministic layer must always run afterward and decide whether promotion or release is actually allowed.
Ingress: Promote or Block On the inbound side, the flow should look something like this:
Raw content arrives. A deterministic pre-check normalizes it, strips dangerous or invisible characters, scores risk, and flags obvious injection or phishing patterns. A sandboxed model then produces a rewrite, summary, or classification, along with an explicit safety assessment. Finally, deterministic post-validation checks the schema, blocks forbidden carry-through, and refuses promotion unless all conditions are satisfied.
The result is binary: either the content becomes a promoted artifact, or it is blocked or quarantined.
There should be no third state where raw content quietly slips into privileged workflows “just for convenience.”
Egress: Release or Block On the outbound side, the same pattern applies.
The privileged core produces a structured intent, not a free-form final message. A deterministic pre-check evaluates that intent against exfiltration policy, forbidden patterns, unsupported claims, and channel rules. A sandboxed model may then rewrite the intent into channel-appropriate language and emit its own safety assessment. Deterministic post-validation checks again for secrets, unsafe links, policy violations, or newly introduced problems before anything is sent.
If something looks suspicious, it should be blocked rather than rewritten into something harmless-looking.
That prevents laundering in the other direction.
Promoted Artifacts Are the Real Trust Boundary Promoted artifacts are the only objects the privileged core should consume from external channels. That makes their schema discipline a critical long-term security concern.
A good promoted artifact uses rewritten labels instead of raw user-controlled strings wherever possible. It includes risk metadata, structured classification, and a safe internal rewrite. It avoids carrying raw HTML, raw markdown, MIME bodies, or arbitrary user text by default.
If raw data must be preserved, it should be exceptional, typed, length-bounded, and clearly marked as tainted.
This is where systems often decay. Application logic evolves. Someone adds “just one more string field.” Validators are patched less often than the code around them. Over time, the boundary widens without anyone intending it to.
Schema drift is not a theoretical concern. It is one of the most likely ways this model weakens in practice.
What “Sandboxed” Actually Means A sandboxed model is not just a model with a stern system prompt.
In this architecture, sandboxed means:
no access to privileged state no access to privileged tools no shared memory with the privileged core outputs treated as untrusted until deterministic validation passes
If the sandbox shares enough state with the rest of the system, then the boundary is only nominal.
Discovery Surfaces Still Matter There is a common temptation to treat listings and indexes as harmless because they only return identifiers or metadata.
That is a mistake.
Subject lines, titles, author names, thread names, labels, and other metadata are still user-controlled strings. They can still carry injection content. Discovery surfaces are lower-risk than full content bodies, but they are not zero-risk. They should at least go through deterministic normalization and screening before being shown on trusted operator surfaces.
This Does Not Eliminate Every Threat No architecture removes all risk.
This model primarily removes the direct raw-content path from hostile input into privileged reasoning and external output. It does not solve insider threats, validator mistakes, schema design failures, or every possible side channel. Human change control still matters. Code review still matters. Audit logging still matters. Separation of duties still matters.
Security always bottoms out somewhere.
The value of this model is that it changes which threats dominate. Instead of defending against scalable prompt injection with prompt wording alone, it shrinks the attack surface and concentrates residual risk into places where engineering controls, testing, and auditing can actually help.
Design Rules A few design rules follow naturally from this model:
Never let raw external content into the privileged core. Never let a model be the final authority on promotion or release. Never let one component both read hostile input and send externally. Prefer structured intents over free-form operator drafts. Treat all user-controlled strings as untrusted, including metadata. Preserve debuggability through explicit artifacts, not by widening trusted surfaces. Fail closed when validation is missing, ambiguous, or inconsistent. Treat schema discipline as an ongoing operational concern, not a one-time design choice.
The Point The goal is not to make a privileged model resistant to persuasion. The goal is to ensure the privileged part of the system never processes hostile content directly in the first place.
That is the shift.
Instead of asking a model to safely hold secrets while reasoning over attacker-controlled input, you break the dangerous loop at the architectural level. Untrusted input is transformed in a sandbox, validated deterministically, and only then promoted into a constrained schema. Privileged components act only on promoted artifacts and never directly emit external output.
This does not make attacks impossible.
It does make prompt injection a much less useful primitive.
Affordances, Not Instructions Richard McDaniel March 15, 2026
There's a pattern that plays out on many teams building LLM systems. The model does something wrong. Someone writes a new instruction. The next run looks better. Everyone moves on.
Then it happens again.
And again. And the prompt grows. And the tool descriptions start carrying policy. And the role definition starts carrying protocol. And at some point, nobody is entirely sure what the system actually does, because the real logic is buried in a thousand words of hidden text that nobody reads in one sitting.
This is not a model problem. It is a design problem. And the fix is not more instructions.
Affordance: a use or purpose that a thing can have, that people notice as part of the way they see or experience it, based on implicit understanding of how to interact with an object. A door handle affords pulling. A flat plate affords pushing. Nobody has to read instructions.
The Building Is on Fire
Consider two approaches to fire safety.
The first puts up signs: If there is a fire, exit the building. If smoke is visible but no alarm is sounding, still exit. If the nearest door is blocked, find another. If you are in a stairwell...
You can see where this goes. The manual grows forever, because reality has infinite edge cases and instructions are brittle things.
The second approach installs blaring alarms, glowing exit signs, emergency lighting, and wide unobstructed doors. It makes the danger unmistakable and the safe route obvious. It does not remove anyone's agency, the person still has to choose to leave. But it shapes the environment so that choosing correctly is easy, natural, and hard to miss.
That second approach is affordance. And it is the model we want for LLM systems.
Stop telling the model if X happens, do Y. Make X legible. Make Y available. Enforce the hard limits in code. Then let the model reason.
Why Instructions Fail at Scale
The appeal of the instruction approach is that it feels like progress. A failure happens, a rule gets added, the next run improves. But this paper trail has a hidden cost: instructions do not compose cleanly.
Every new rule interacts with every existing rule. Every exception creates pressure for another exception. Every scenario-specific patch creates a new maintenance surface. It is not just a prompt-length problem. It is a combinatorics problem. The more a system relies on verbal steering, the more it becomes a pile of fragile local fixes, each one correct in isolation and unpredictable in combination.
That does not scale. The system becomes harder to understand and less reliable at exactly the moment it needs to be both.
Instructions still matter, for goals, tone, and broad operating norms. The problem starts when prompts become the place where routing logic, causal bookkeeping, and exception handling go to hide.
What Affordance Actually Means
Affordance is not the absence of structure. It is structure in the right place.
In an LLM system, that usually means four things.
Make finality, provenance, and task type explicit. Don't make the model infer whether a question has been answered, what kind of request it is, or whether a thread is still open, when the application can expose those directly. The model should not be reconstructing what the system already knows from fragments of conversation.
Shape the action surface. If a tool supports reply, close, escalate, and label, those should be visible affordances, not buried in prose. Do not make the model reverse-engineer what it can do.
Keep hard constraints deterministic. Causality, permissions, idempotency, loop prevention, these are invariants. They belong in code. If a webhook was triggered by the system's own action, that is not an interpretive question for the model. That is routing state.
Reserve the model for semantics. Let it decide what a situation means. Has the question been answered? Is escalation needed? Is the thread settled? Those are exactly the judgments a language model is good at. Wire vocabulary, transport quirks, and causal bookkeeping are not.
Where the Line Is
There is a real risk of overcorrecting here. Bad prompt engineering and bad policy-engineering are the same mistake in different layers: encoding brittle case-by-case behavior where the system should instead expose clearer state and cleaner affordances. If deterministic code starts absorbing semantic policy, you just end up with a giant switch statement and that breaks down just as surely as a prompt that has grown into a policy manual.
The right architecture keeps the layers distinct:
Roles own goals, style, and judgment Artifacts own structured context and visible state Tools own native capabilities and channel operations Deterministic code owns invariants, provenance, and protocol translation
Not every instruction is bad. A tool saying you can close issues as completed, not_planned, or duplicate is a native affordance because it comes from the tool, scales with the tool, and is exactly where it should be. The problem is not vocabulary exposure. The problem is encoding scenario-by-scenario behavioral policy around that vocabulary in prompt text:
"Close as not_planned when the user asks about future platform support. Close as completed when no more assistance seems necessary. Do not close if the question sounds upset. Prefer duplicate if a roadmap issue already exists, unless the customer appears high-value." That is where systems rot. And the fix is not to patch harder, it is to improve the context the model is operating in.
Better Context Beats Better Nagging
When a model makes the wrong call, the instinct is to explain the right call more clearly. Usually, that is the wrong move.
If a future-compatibility question gets misclassified, the answer is not a new prompt rule saying for compatibility questions, prefer not_planned. The answer is to fix what the model sees. Expose thread type. Carry forward whether a definitive answer has already been given. Represent whether the issue appears settled.
This is the core of the affordance mindset: fix the world the model sees before you try to fix the words the model reads.
Provenance Is Not a Reasoning Problem
One of the most common failure modes in automation is pushing causality into the model.
The system posts a comment. Closes a thread. Receives a webhook triggered by its own action. Then asks the model to notice that this is not new work.
That is a design mistake. Whether an event was self-originated is first-class system state. It belongs in deterministic routing. If self-triggered actions can re-enter the queue as fresh work, that is not a prompt problem, it is a provenance bug.
The broader rule: whenever a system is relying on the model to remember who acted, what caused what, or whether a loop is self-generated, it is asking the wrong layer to do the work. Models are good at interpreting situations. Systems should be good at tracking causality.
The Hardest Part Is Stopping
Everything above is technically tractable. The hard part is restraint.
When the model still makes the wrong choice after the environment has been improved, the temptation is to write a larger evacuation manual. There is always another prompt patch available. Always another hidden rule. Always another attempt to rescue a marginal capability with more verbal scaffolding.
But if the model still cannot reliably make the right move once the environment is well-shaped, the honest answer is that the capability is not ready. That is not failure. That is good engineering judgment. A system that knows where automation stops is healthier than one that keeps pretending it can automate more than it really can.
A Few Rules That Follow
Keep roles, artifacts, and tools separate. Let tools expose their native affordances. Do not encode scenario-specific policy in prompts. Prefer improving visible context over adding hidden instructions. Keep provenance and invariants in deterministic code. Let the model decide semantics, not wire formats. And if good affordances still do not yield reliable behavior, remove the automation rather than patching it with more words.
Reliable LLM systems come less from ever-larger instruction piles and more from designing state, tools, and deterministic guardrails so the model can apply judgment in a world that is already legible.
Build the alarm, the exit sign, the open door, and the clear path out.
And if that still is not enough, stop asking the model to evacuate the building.
Roles Are Not Metaphors Richard McDaniel March 17, 2026
There is a common instinct when building systems with language models: keep it simple, keep it flat, don't dress the thing up in organizational costume. No roles. No meetings. No approvals. Just a model and a task.
This instinct mistakes sophistication for minimalism. And it quietly imports a worse assumption.
The Wrong Reading of a Good Rule
"Don't anthropomorphize" is good advice against a specific mistake: projecting inner experience onto things that don't have it. Assuming your build pipeline is frustrated. Assuming the model has preferences. Assuming a crashed server was trying.
That is the target. The rule was never meant to say that structures which resemble human organizations are therefore suspect. That reading turns a corrective into a prohibition, and it causes real design harm.
Roles, meetings, queues, approvals, handoffs, these exist in organizations not because humans are sentimental but because they are convergent solutions to hard structural problems. The same problems appear whenever partial information is distributed across multiple processors, authority is bounded, work is time-extended, objectives conflict, and decisions need to be explained after the fact.
These structures are not natural to minds. They are natural to the problem.
The Real Premise
Once a system has multiple bounded loci of reasoning with different permissions, different context, and different decision rights, organizational primitives stop being metaphor and become implementation. The question is not whether coordination structure exists. It is whether to make it explicit or hide it where it cannot be managed.
Most serious agentic systems, anything with extended work, authority constraints, or multiple specialized functions, already are multi-center coordination systems. The design choice is only whether to acknowledge that.
What the Vague Cloud Actually Is
The alternative to explicit coordination structure is not sophistication. It is usually a single large context, a single model, a single execution path, with everything implied rather than represented.
This sounds more machine-native. It is actually a worse anthropomorphism.
It imagines a system that holds all relevant state, resolves all competing objectives, tracks all causal history, acts legitimately everywhere at once, and explains itself coherently on demand. That is not a model of computation. That is a fantasy of omniscience dressed up as engineering restraint.
Real systems are not omniscient. They are distributed, bounded, and partial. Pretending otherwise does not make them simpler. It makes them fragile in ways that are hard to find and hard to fix.
A Concrete Contrast
Consider a system that handles requests with policy constraints, tool access, and a required ratification step before output is released.
Flat design: A single model reads the request, checks policy, gathers facts, chooses actions, and produces output, all from one context, all in one pass.
Structured design: An intake role parses intent and classifies the request. A policy role determines what actions are permitted. An execution role operates the relevant tools. A reviewer role ratifies the output before release.
These are not the same system with different aesthetics. They have different failure properties.
In the flat system, a policy violation is hard to localize. Did the model misread the policy? Did it have the wrong context? Did it reason correctly from bad inputs? The answer is somewhere in one large undifferentiated inference. In the structured system, each failure has an address. You can ask which role produced the wrong output, what state it was operating on, and what the handoff looked like.
In the flat system, authority is implicit. The model has whatever access was given to the whole context. There is no natural place to ask whether a particular action was in scope. In the structured system, scope is explicit per role. Overreach is detectable because it is a violation of a defined boundary, not just a bad outcome from an underspecified one.
In the flat system, a decision cannot be reproduced without the full conversation context and model state. In the structured system, each role's output is an artifact. The decision trace is visible.
A Role Is a Control Structure
Consider what a role actually provides.
It gives you a stable locus of memory: this reasoning happens from a consistent position, drawing on consistent context. It gives you bounded authority: the actions available here are defined and scoped, not arbitrary. It gives you an auditable identity: when a decision is made, you can ask who made it, from what standing, under what policy. It gives you a stable viewpoint: outputs can be interpreted in context, not just as text that emerged from an undifferentiated process.
Without these properties, you do not have a durable agentic unit. You have an unscoped inference step. Useful for narrow, well-defined tasks. Unable to deliberate, commit, or coordinate with other bounded reasoners in a principled way.
None of this requires a large system. It requires that coordination structure be represented where it can be inspected, tested, and changed.
The question is never whether a role looks like something from an org chart. The question is whether the problems a role solves are real. They are.
Coordination Primitives Are Not Decoration
The same logic applies to the other structures that tend to get dismissed as anthropomorphic window dressing.
A queue is not a metaphor for a human to-do list. It is a decoupling boundary between production and consumption, with explicit sequencing and load semantics. A system that needs those properties needs a queue whether or not it involves humans.
An approval is not a metaphor for a manager's signature. It is a control point where one bounded authority must explicitly ratify a transition before it proceeds. A system with real authority constraints needs explicit ratification points. Embedding them implicitly in prompt text is not simpler. It is less reliable and less auditable.
A meeting is a bounded deliberation protocol among differentiated participants who hold different context, different objectives, and different decision rights, with a time constraint and a required structured output. When a system needs to produce a decision that integrates those differences, the alternatives are to model that structure explicitly or to pretend the problem does not exist.
Explicit Structure Is Not Bureaucracy
A system where coordination structure lives in prompts has hidden it in the wrong layer. The logic becomes invisible to code review, untestable in any systematic way, unmaintainable as requirements change, and opaque when something goes wrong. The implementation looks thin because the real design is nowhere you can see it.
Explicit structure is not bureaucracy. Bureaucracy is structure that serves its own continuation rather than the work. Explicit structure that reflects real coordination requirements is just engineering.
The Point
The goal is not to make software feel like an organization.
The goal is to represent the real structure of cognition and coordination in a system that has multiple bounded reasoners, authority constraints, and extended work across time.
Some of the best names for those structures come from human organizations because humans encountered the same constraints first and developed working solutions. That is not a reason to mistrust the structures. It is a reason to understand what they actually solve, strip out what is merely conventional, and keep what is load-bearing.
A role is not a costume. It is a locus of memory, authority, and policy.
A queue is not an inbox. It is a coordination boundary.
An approval is not a signature. It is an explicit ratification point.
A meeting is not a ritual. It is a bounded deliberation protocol.
Build the primitives that the problem requires. Do not mistake explicitness for anthropomorphism, or flatness for clarity.
The hard part is knowing what the problem actually requires. That is what design is.