Created: November 13, 2025
Concise synthesis of current best practices for context engineering in AI agents, drawn from Latent Space (Jeff Huber/Chroma) and Lance Martin’s write-up.
# Context Engineering for AI Agents — Key Patterns and Best Practices (2025)

## Research Question

What is “context engineering” for AI agents, and what practical strategies, pitfalls, and best practices do current leading sources recommend?
## Summary of Findings
Context engineering is the art and science of filling an LLM’s context window with the right information at each step of an agent’s trajectory. Two recent, influential sources converge on a practical toolkit and operating principles:

- Core patterns (Lance Martin): group approaches into four buckets — write, select, compress, and isolate — applied across instructions, knowledge, and tools.
  - Write: persist information outside the context window (scratchpads; long‑term memories via reflection/periodic synthesis) so it can be reused across steps and sessions.
  - Select: pull relevant context back in when needed (few‑shot episodic examples, procedural instructions, semantic facts), typically via embeddings and/or knowledge graphs; apply retrieval to tool descriptions to reduce tool overload and improve tool choice.
  - Compress: manage long trajectories via summarization (recursive/hierarchical; auto‑compaction when the context is near full) and trimming/pruning heuristics or trained pruners.
  - Isolate: split context to reduce interference — multi‑agent separation of concerns; move state‑heavy artifacts to environments/sandboxes; use runtime state schemas to expose only what’s needed per turn.
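To make the first two buckets concrete, here is a minimal sketch of “write” (persist notes to a scratchpad outside the context window) and “compress” (trim message history to a token budget before each model call). All names are illustrative, not taken from any particular framework, and the token estimate is a crude stand-in for a real tokenizer.

```python
scratchpad: list[str] = []  # persisted outside the context window


def write_note(note: str) -> None:
    """'Write': save a fact/plan so later steps can re-select it."""
    scratchpad.append(note)


def approx_tokens(text: str) -> int:
    # Crude estimate (~4 chars per token); a real system would use
    # the model's own tokenizer.
    return max(1, len(text) // 4)


def compress_history(messages: list[str], budget: int) -> list[str]:
    """'Compress': keep the most recent messages that fit the token budget."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):  # walk newest-first
        cost = approx_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

A trained pruner or recursive summarizer would replace the recency heuristic here, but the shape of the loop — spend a budget, drop what doesn’t fit — stays the same.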
- Retrieval-centric engineering (Latent Space x Jeff Huber, Chroma): ship retrieval systems, not “RAG” slogans, with a concrete inner/outer loop and guardrails.
  - Five retrieval tips:
    1. Don’t ship “RAG”; ship retrieval primitives (dense, lexical, filters, re‑rank, assembly, eval loop).
    2. Win the first stage with hybrid recall; 200–300 candidates is fine.
    3. Always re‑rank before context assembly.
    4. Respect context rot: tight, structured contexts beat maximal windows.
    5. Build a small, high‑quality gold set; wire it into CI, dashboards, and iterative evals.
  - Reference pipeline:
    - Ingest: domain‑aware parsing/chunking; enrich metadata; embeddings (dense + optional sparse); write to the DB.
    - Query: first‑stage hybrid (vector + lexical/regex + filters) → candidate pool (~100–300) → LLM/cross‑encoder re‑rank to ~20–40 → careful context assembly (instructions first, dedupe, diversify, hard token cap).
    - Outer loop: cache/cost guardrails; generative benchmarking on a small gold set; error analysis; memory/compaction of interaction traces into retrievable facts.
  - Trends and tactics:
    - Context rot is real: longer contexts can degrade attention and reasoning; this motivates tighter assembly and better retrieval.
    - LLMs as re‑rankers are increasingly practical; expect brute‑force re‑ranking to grow as costs and latency drop.
    - Continuous retrieval and staying in embedding space are promising future directions.
    - Code retrieval: regex remains dominant at scale; embeddings add incremental lift; support fast re‑indexing (e.g., index forking) and hybrid search.
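The query stage above can be sketched end to end. In this toy version a word-overlap score stands in for both the first-stage retrievers and the re-ranker; a real system would use an embedding index plus BM25/regex for recall and a cross-encoder or LLM for re-ranking. All function names and the ~4-chars-per-token cap are illustrative assumptions.

```python
def overlap_score(query: str, doc: str) -> float:
    # Toy relevance signal: fraction of query words present in the doc.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(1, len(q))


def first_stage(query: str, corpus: list[str], pool: int = 300) -> list[str]:
    """Hybrid recall stand-in: cast a wide net (~100-300 candidates)."""
    return sorted(corpus, key=lambda d: overlap_score(query, d), reverse=True)[:pool]


def rerank(query: str, candidates: list[str], keep: int = 30) -> list[str]:
    """Re-rank before assembly (placeholder for a cross-encoder/LLM)."""
    return sorted(candidates, key=lambda d: overlap_score(query, d), reverse=True)[:keep]


def assemble(instructions: str, docs: list[str], token_cap: int = 100) -> str:
    """Assembly: instructions first, dedupe, hard token cap (~4 chars/token)."""
    parts, seen = [instructions], set()
    used = len(instructions) // 4
    for doc in docs:
        cost = max(1, len(doc) // 4)
        if doc in seen or used + cost > token_cap:
            continue  # skip duplicates; enforce the hard cap
        seen.add(doc)
        parts.append(doc)
        used += cost
    return "\n\n".join(parts)
```

The design choice worth copying is the funnel shape: broad, cheap recall first, an expensive re-ranker only over the small candidate pool, and a hard cap at assembly so context rot cannot creep in via “just one more chunk.”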
## Practical Takeaways for Agent Builders
- Treat context as a first‑class product surface:
  - Design explicit ingestion, selection, and assembly stages; define token budgets and hard caps per stage.
  - Maintain a small but high‑quality gold set of (query ↔ expected context) pairs; automate evals (generative benchmarking) and wire them into CI.
- Hybrid recall + mandatory re‑rank:
  - Use vector + lexical/regex + metadata filters for breadth; re‑rank to a compact, high‑precision context set before calling the model.
- Combat context rot:
  - Prefer structured, deduped, diversified, and tightly sized contexts over maximal windows; summarize or prune long histories.
- Engineer memory explicitly:
  - Persist important facts/plans outside the window (scratchpads/state); synthesize long‑term memories periodically; retrieve them selectively.
- Isolate to reduce interference:
  - Split tools/instructions across sub‑agents or environments; use runtime state schemas to control exposure per turn.
- For code agents:
  - Combine regex/file search with embeddings and re‑ranking; chunk along semantic boundaries (e.g., the AST); support rapid re‑indexing per commit/branch.
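Chunking along AST boundaries, as the code-agent takeaway suggests, is straightforward for Python sources using the standard-library `ast` module: split a file into one chunk per top-level function or class rather than by fixed character windows. This sketch handles only top-level definitions and ignores decorators and nested scopes.

```python
import ast


def ast_chunks(source: str) -> list[str]:
    """Return one source chunk per top-level function/class definition."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast uses 1-based line numbers; end_lineno is inclusive.
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks
```

Each chunk then stays a syntactically complete unit, which both embeds better and re-ranks better than a window that slices a function in half; re-running the splitter per commit/branch is cheap enough to support the fast re-indexing mentioned above.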
## Sources

- [“RAG is Dead, Context Engineering is King” — with Jeff Huber of Chroma (Latent Space)](https://www.latent.space/p/chroma) — Defines a concrete retrieval pipeline; five retrieval tips; context-rot implications; re‑ranking with LLMs; hybrid search; memory/compaction and outer‑loop evaluation; code-search specifics (regex + embeddings, fast index forking).
- [Context Engineering for Agents — Lance Martin](https://rlancemartin.github.io/2025/06/23/context_engineering/) — Frames context engineering across write/select/compress/isolate; details scratchpads, memory selection (embeddings/graphs), summarization/auto‑compaction and pruning, multi‑agent isolation, sandboxed code agents, and runtime state schemas.