RecursiveContextDecomposition.md

Recursive Context Decomposition Pattern Overview

Large language models have finite context windows. As context length approaches these limits, performance degrades significantly—a phenomenon researchers call "context rot." The model's attention becomes diluted, relevant information gets lost in the noise, and reasoning accuracy drops. Traditional solutions like summarization or truncation often result in the loss of critical information.

Recursive Context Decomposition is a pattern designed to overcome this limitation. It enables language models to process inputs that far exceed their native context window by treating the prompt itself as an external environment to be programmatically explored. Rather than attempting to fit all information into a single context, the agent loads the full input as an accessible data structure and writes code to systematically examine, filter, and recursively query subsets of that data.

The core mechanism involves three components working together: a code execution environment (typically a Python REPL) where the prompt is loaded as a string variable, a set of programmatic operations the agent can perform on this data (slicing, regex matching, keyword filtering), and a recursive self-invocation capability where the agent can spawn sub-queries on filtered subsets of the original input.

This pattern allows the model to maintain full access to the original data while focusing its limited attention on strategically selected portions. The agent operates in a loop: it examines the problem, writes code to filter or chunk the data based on domain knowledge, recursively calls itself (or a sub-model) on the refined subset, collects results, and synthesizes a final answer. Crucially, the model itself decides the chunking strategy—whether to use uniform splits, keyword-based filtering, or structured selection—based on its understanding of the task.

Practical Applications & Use Cases

Recursive Context Decomposition is essential for tasks where the input volume precludes single-pass processing but requires holistic reasoning.

1. Needle-in-a-Haystack Information Retrieval Finding specific facts buried in massive document collections where standard RAG might miss context.

Use Case: Legal discovery across thousands of contract pages.
Agent Flow: Load all contracts as searchable text variables → Write regex/keyword filters based on the query (e.g., "indemnification clauses mentioning third parties") → Recursively examine matching sections → Synthesize findings with citations.
Benefit: Handles document volumes that would overwhelm context windows while maintaining high precision.

2. Multi-Document Question Answering Answering questions that require synthesizing information across many sources.

Use Case: Research assistant analyzing 500+ academic papers.
Agent Flow: Load paper abstracts/sections → Programmatically filter by relevance signals → Recursively deep-dive into promising papers → Aggregate and reconcile findings.
Benefit: Scales to corpus sizes impossible for single-context approaches without losing the ability to cross-reference.

3. Codebase Navigation and Analysis Understanding and working with large software repositories.

Use Case: Debugging an issue across a million-line codebase.
Agent Flow: Load file tree and key files → Write code to trace imports and find relevant modules → Recursively examine dependency chains → Identify root cause.
Benefit: Enables whole-codebase reasoning without requiring manual file selection.

4. Long-Form Content Verification Fact-checking or consistency analysis across lengthy documents.

Use Case: Verifying claims in a 200-page report against source documents.
Agent Flow: Extract claims programmatically → For each claim, search source corpus → Recursively verify supporting evidence → Flag inconsistencies.
Benefit: Systematic verification at scales beyond human attention spans.

5. Comparative Analysis Tasks Problems requiring pairwise or multi-way comparisons across large datasets.

Use Case: Identifying duplicate or contradictory entries in a large database.
Agent Flow: Load dataset → Write comparison logic → Recursively subdivide comparison space → Aggregate matches.
Benefit: Handles quadratic complexity tasks that would explode single-context approaches.

Hands-On Code Example (LangGraph)

The implementation of Recursive Context Decomposition requires a stateful workflow to manage the decomposition strategy, the chunks of text, and the recursive accumulation of results. LangGraph is well-suited for this, allowing us to define a graph where the agent can analyze, decompose, and process data in a cycle.

The following code implements an agent that processes documents far exceeding context limits. It uses a "Strategy" node to decide how to break down the document, and a "Process" node to recursively query a sub-model.

import re
import operator
from typing import TypedDict, Annotated, List, Optional
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.graph import StateGraph, END

# --- Configuration ---
ROOT_MODEL = "gpt-4o"  # Primary reasoning model
SUB_MODEL = "gpt-4o-mini"  # Cheaper model for recursive sub-queries
MAX_CHUNK_SIZE = 6000  # Characters per chunk

# --- State Definition ---
class AgentState(TypedDict):
    """State maintained throughout the recursive decomposition process."""
    query: str
    document: str  # The full document loaded as explorable data
    chunks: List[str]  # Filtered/chunked subsets for processing
    sub_results: Annotated[List[str], operator.add]  # Accumulated findings
    strategy: Optional[str]  # The decomposition strategy chosen by the agent

# --- Initialize Models ---
root_llm = ChatOpenAI(model=ROOT_MODEL, temperature=0)
sub_llm = ChatOpenAI(model=SUB_MODEL, temperature=0)

# --- Node Functions ---

def analyze_and_strategize(state: AgentState) -> AgentState:
    """The root LLM decides on a decomposition strategy."""
    doc_length = len(state["document"])
    query = state["query"]
    
    if doc_length < MAX_CHUNK_SIZE:
        return {**state, "strategy": "direct", "chunks": [state["document"]]}
    
    strategy_prompt = f"""You are analyzing a document of {doc_length:,} characters.
    Query: "{query}"
    Devise a strategy to decompose it. Options:
    1. KEYWORD_FILTER: Extract sections matching specific keywords.
    2. UNIFORM_CHUNK: Split into equal-sized chunks.
    
    Respond with STRATEGY: [keyword_filter|uniform_chunk] and KEYWORDS if applicable."""
    
    response = root_llm.invoke([HumanMessage(content=strategy_prompt)])
    strategy_text = response.content
    
    # Simple parsing logic for demonstration
    if "keyword_filter" in strategy_text.lower():
        strategy = "keyword_filter"
        # In a real app, parse keywords here. We'll default to query terms.
        keywords = query.lower().split() 
    else:
        strategy = "uniform_chunk"
        keywords = []
        
    return {**state, "strategy": strategy}

def decompose_document(state: AgentState) -> AgentState:
    """Apply the chosen decomposition strategy."""
    document = state["document"]
    strategy = state["strategy"]
    chunks = []
    
    if strategy == "direct":
        return state
    
    if strategy == "keyword_filter":
        # Programmatic filtering
        paragraphs = re.split(r'\n\s*\n', document)
        # Simplified filter logic
        chunks = [p for p in paragraphs if any(w in p.lower() for w in state["query"].split())]
        
    if strategy == "uniform_chunk" or not chunks:
        # Fallback to uniform chunking
        chunks = [document[i:i + MAX_CHUNK_SIZE] for i in range(0, len(document), MAX_CHUNK_SIZE)]
    
    # Cap chunks for demo
    return {**state, "chunks": chunks[:5]}

def process_chunks(state: AgentState) -> AgentState:
    """Recursively query the sub-LLM on each chunk."""
    chunks = state["chunks"]
    query = state["query"]
    sub_results = []
    
    for i, chunk in enumerate(chunks):
        sub_prompt = f"""Analyze this text excerpt: "{chunk[:500]}..."
        To answer: "{query}"
        Extract relevant info or respond NO_INFO."""
        
        response = sub_llm.invoke([HumanMessage(content=sub_prompt)])
        if "NO_INFO" not in response.content:
            sub_results.append(f"[Chunk {i}]: {response.content}")
    
    return {**state, "sub_results": sub_results}

def synthesize_answer(state: AgentState) -> AgentState:
    """Synthesize all sub-query results."""
    results = "\n".join(state["sub_results"])
    if not results:
        return {**state, "final_answer": "No relevant information found."}
        
    response = root_llm.invoke([
        HumanMessage(content=f"Synthesize these findings into an answer for '{state['query']}':\n{results}")
    ])
    return {**state, "final_answer": response.content}

# --- Build the Graph ---
workflow = StateGraph(AgentState)
workflow.add_node("analyze", analyze_and_strategize)
workflow.add_node("decompose", decompose_document)
workflow.add_node("process", process_chunks)
workflow.add_node("synthesize", synthesize_answer)

workflow.set_entry_point("analyze")
workflow.add_edge("analyze", "decompose")
workflow.add_edge("decompose", "process")
workflow.add_edge("process", "synthesize")
workflow.add_edge("synthesize", END)

app = workflow.compile()

The code begins by defining an AgentState that holds the massive document and the query. The analyze_and_strategize node uses the Root LLM to look at the document's metadata (size) and the user's query to decide how to tackle the problem (e.g., keyword filtering vs. chunking). The decompose_document node executes this strategy using Python logic, effectively acting as a programmatic lens. The process_chunks node then iterates through the decomposed parts, using a smaller, faster model (sub_llm) to extract insights. Finally, synthesize_answer aggregates these partial insights into a coherent response.

Hands-On Code Example (Google ADK)

This pattern can also be implemented using Google's Agent Developer Kit (ADK) by utilizing SequentialAgent to orchestrate specialized agents for strategy, decomposition, and analysis.

from google.adk.agents import LlmAgent, SequentialAgent
from google.adk.tools import FunctionTool

# Tool to chunk documents
def chunk_text(text: str, strategy: str) -> list:
    # Simplified logic for demonstration
    if strategy == "split":
        return [text[i:i+5000] for i in range(0, len(text), 5000)]
    return [text]

chunk_tool = FunctionTool(chunk_text)

# 1. Strategy Agent
strategy_agent = LlmAgent(
    name="Strategist",
    model="gemini-2.0-flash",
    instruction="Analyze the document length and query. Decide if we need to 'split' the document or process 'direct'. Output the strategy.",
    output_key="strategy"
)

# 2. Decomposer Agent
decomposer_agent = LlmAgent(
    name="Decomposer",
    model="gemini-2.0-flash",
    instruction="Use the chunk_text tool based on the strategy provided in state['strategy']. Save chunks to state.",
    tools=[chunk_tool],
    output_key="chunks"
)

# 3. Analyst Agent (Process chunks)
analyzer_agent = LlmAgent(
    name="Analyzer",
    model="gemini-2.0-flash",
    instruction="Iterate through state['chunks']. For each chunk, extract information relevant to the user query. Compile findings.",
    output_key="findings"
)

# 4. Pipeline
recursive_pipeline = SequentialAgent(
    name="RecursiveContextPipeline",
    sub_agents=[strategy_agent, decomposer_agent, analyzer_agent]
)

In this ADK example, the SequentialAgent acts as the pipeline controller. The Strategist examines the problem, the Decomposer uses a tool to break the problem down, and the Analyzer processes the resulting data. This modular approach allows you to swap out the decomposition logic or the analysis model easily.

At a Glance

What: Large language models suffer from "context rot" when inputs grow too large. Traditional retrieval methods (RAG) can miss information that requires holistic understanding or complex filtering. Recursive Context Decomposition treats the large input not as a simple context string, but as a data environment to be explored programmatically.

Why: By loading the input as a variable and writing code to filter and chunk it, the agent preserves the integrity of the original data while focusing its attention only on relevant subsets. The recursive nature allows it to drill down hierarchically, managing cost and attention span. It shifts the paradigm from "reading everything" to "intelligently searching and reading what matters."

Rule of thumb: Use this pattern when the input text significantly exceeds the context window and the task requires reasoning over the entire corpus (e.g., "Summarize all financial risks across these 50 contracts") rather than simple fact retrieval. Avoid it for simple queries where standard RAG is faster and cheaper.

Visual Summary:

     ┌──────────────┐
     │  User Query  │
     └──────┬───────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────┐
│  CODE EXECUTION ENVIRONMENT                                     │
│  prompt = "..." # Full input loaded as string variable          │
└─────────────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      ROOT LLM (Reasoning)                       │
│  "Input is too large. I'll filter by keyword..."                │
└──────────────────────────┬──────────────────────────────────────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
        ┌─────────┐  ┌─────────┐  ┌─────────┐
        │ Subset 1│  │ Subset 2│  │ Subset 3│  (Programmatic
        └────┬────┘  └────┬────┘  └────┬────┘   Decomposition)
             │            │            │
             ▼            ▼            ▼
        ┌─────────┐  ┌─────────┐  ┌─────────┐
        │ Sub-LLM │  │ Sub-LLM │  │ Sub-LLM │  (Recursive
        └────┬────┘  └────┬────┘  └────┬────┘   Analysis)
             │            │            │
             └────────────┼────────────┘
                          │
                          ▼
                  ┌──────────────┐
                  │ Final Answer │
                  └──────────────┘

Fig 1: Recursive Context Decomposition Flow

Key Takeaways

Programmatic Exploration: The agent treats the prompt as an external database, writing code to filter and slice it based on the query.
Recursive Self-Invocation: By calling a sub-model (or itself) on smaller chunks, the agent bypasses hard context limits.
Adaptive Strategy: The agent decides how to decompose the text (regex, keyword, structure) rather than relying on a fixed chunking algorithm.
Cost Management: This pattern allows the use of cheaper models for sub-queries and powerful models for strategy and synthesis.

Conclusion

Recursive Context Decomposition represents a paradigm shift in handling scale. Rather than fighting context limits through compression, it embraces them by equipping the model with the tools to explore vast information spaces programmatically. The agent becomes an explorer—navigating data, recursively drilling into promising areas, and synthesizing findings. While it introduces latency trade-offs, it unlocks the ability to perform deep, comprehensive reasoning over inputs that were previously impossible to process.

References

Zhang, P., Kraska, T., & Khattab, O. (2025). Recursive Language Models. arXiv:2512.24601v1.
LangChain Documentation on Recursion: https://python.langchain.com/docs/
Google Agent Developer Kit (ADK) - Sequential Agents: https://google.github.io/adk-docs/

ehartford/RecursiveContextDecomposition.md

Select an option

No results found