Skip to content

Instantly share code, notes, and snippets.

@grittyninja
Created September 24, 2025 15:06
Show Gist options
  • Select an option

  • Save grittyninja/d8c73f4c55c89a6d49a0f41ac3bc40c3 to your computer and use it in GitHub Desktop.

Select an option

Save grittyninja/d8c73f4c55c89a6d49a0f41ac3bc40c3 to your computer and use it in GitHub Desktop.
# Lingxi Agent Design: Comprehensive Analysis & Implementation Guide
## Executive Summary
Lingxi achieved **74.6% success rate on SWE-Bench Verified** and **44.15% (132/300) on SWE-Bench Lite**, ranking **#3 among open-source models and #7 overall**. The framework demonstrates that carefully engineered multi-agent systems can surpass single-agent baselines by addressing **context dilution** - where lengthy chat histories distract models during delicate code-editing steps. This report provides a complete analysis of Lingxi's design patterns and actionable implementation strategies for other agentic code tools.
## Table of Contents
1. [Key Insights from Technical Report](#key-insights-from-technical-report)
2. [Agent Architecture Overview](#agent-architecture-overview)
3. [Required Agent Roles](#required-agent-roles)
4. [Semantic Search Architecture](#semantic-search-architecture)
5. [Workflow Orchestration](#workflow-orchestration)
6. [Critical Success Factors](#critical-success-factors)
7. [Implementation Blueprint](#implementation-blueprint)
8. [Migration Guide for Existing Tools](#migration-guide-for-existing-tools)
9. [Performance Analysis](#performance-analysis)
---
## Key Insights from Technical Report
### The Context Dilution Problem
According to Lingxi v1.0 Technical Report, single-agent pipelines fail because:
> "By the time the model reaches the code-editing step, earlier discussion tokens dominate the prompt, making it harder for the LLM to focus on the actual diff."
**Solution**: Task-scope multi-agent architecture where each agent receives **only the information they need**.
### Three Critical Optimizations That Matter
1. **Crystal-clear contracts** - Each agent gets a one-page interface spec listing mandatory inputs, expected outputs, and the **only** tools it may call
2. **Coordinator-led memory hygiene** - After every step, the coordinator trims conversation history to the triad (action, concise observation, reflection), **reducing average prompt length by 38%** without loss of signal
3. **Explicit awareness of other agents** - Every prompt reminds the agent of team composition and urges it to stay within scope
### Task Decomposition vs Role Playing
> "Unlike frameworks that imitate human roles (e.g., developer, manager), Lingxi's roles are derived from **task decomposition** rather than social division of labour."
This is crucial - since every agent uses the same LLM, there's no benefit from "specialization". The win comes from **narrowing each agent's task domain** so:
- Prompts are shorter
- Tasks are easier
- Evaluation rubrics are crisp
### Performance Breakdown (SWE-Bench Lite)
From the Sankey diagram in the report:
- **Decoder Success**: 249/300 (83% correctly localized bugs)
- **Mapper File-Level Success**: 233/249 (93.6% identified correct files)
- **Mapper Function-Level Success**: 156/233 (67% pinpointed exact functions)
- **Final Patch Success**: 128/156 (82% of targeted fixes worked)
**Bottleneck**: Function-level localization (67% success) is the main area for improvement.
---
## Agent Architecture Overview
### Core Design Philosophy
Lingxi implements a **Hierarchical Multi-Agent System** with three fundamental principles:
1. **Cognitive Separation**: Each agent handles a distinct cognitive phase (understand → plan → implement)
2. **Tool Specialization**: Agents receive only tools relevant to their responsibility
3. **State Persistence**: Sophisticated state management maintains context across transitions
### System Architecture
```ascii
┌─────────────────────────────────────────────────────────────┐
│ LINGXI ARCHITECTURE │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ GitHub API │ │ Human │ │
│ │ (Issue) │ │ Feedback │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ SUPERVISOR AGENT │ │
│ │ - Routes between agents based on progress │ │
│ │ - Manages iteration limits (3 attempts/agent) │ │
│ │ - Enforces termination conditions │ │
│ └─────────────┬────────────────────────────────────┘ │
│ │ │
│ ┌──────────┼──────────┬──────────────┐ │
│ ▼ ▼ ▼ ▼ │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌─────────┐ │
│ │Problem │ │Solution│ │Problem │ │Reviewer │ │
│ │Decoder │ │Mapper │ │Solver │ │ (MAM) │ │
│ └────────┘ └────────┘ └────────┘ └─────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌───────────────────────────────────────────────┐ │
│ │ TOOL ECOSYSTEM │ │
│ ├─────────────────────────────────────────────┤ │
│ │ • view_directory • search_relevant_files │ │
│ │ • view_file_content • str_replace_editor │ │
│ │ • run_shell_cmd • search_files_by_keywords│ │
│ └───────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────┐ │
│ │ VECTOR DATABASE (ChromaDB) │ │
│ │ - File-level embeddings │ │
│ │ - Function-level embeddings │ │
│ │ - Tree-sitter AST parsing │ │
│ └───────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
---
## Required Agent Roles
### Minimum Viable Agent Set (3 Agents)
For basic functionality, you need **exactly 3 specialized agents**:
#### 1. Problem Decoder (Understanding Phase)
```yaml
Role: Bug Localization & Issue Analysis
Responsibility:
- Parse issue requirements
- Identify affected files/functions
- Reproduce the bug
- Output structured problem statement
Tools:
- view_directory (read-only)
- search_relevant_files (vector search)
- view_file_content (read-only)
Key Success Factor: Semantic code search capability
```
#### 2. Solution Mapper (Planning Phase)
```yaml
Role: Solution Design & Change Planning
Responsibility:
- Create detailed fix strategy
- Map required changes per file
- Design test cases
- Output structured change plan
Tools:
- view_directory (read-only)
- search_relevant_files (vector search)
- view_file_content (read-only)
Key Success Factor: Minimal change principle
```
#### 3. Problem Solver (Implementation Phase)
```yaml
Role: Code Generation & Fix Implementation
Responsibility:
- Execute the change plan exactly
- Apply file modifications
- No additional analysis
- Output modified files
Tools:
- view_directory (navigation)
- search_relevant_files (locate targets)
- str_replace_editor (modification)
Key Success Factor: Strict implementation focus
```
### Extended Agent Set (5 Agents)
For production systems, add:
#### 4. Supervisor (Orchestration)
```yaml
Role: Workflow Coordination
Responsibility:
- Route between agents
- Track progress
- Enforce limits
- Handle failures
Tools: None (uses LLM with structured output)
```
#### 5. Reviewer (Validation)
```yaml
Role: Solution Verification
Responsibility:
- Run tests
- Validate fix
- Check for regressions
- Approve/reject solution
Tools:
- All read tools
- run_shell_cmd (test execution)
```
---
## Semantic Search Architecture
### System Overview
```ascii
┌────────────────────────────────────────────────────────────┐
│ SEMANTIC SEARCH PIPELINE │
├────────────────────────────────────────────────────────────┤
│ │
│ [GitHub Issue] ──► [Clone Repository] │
│ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ File Scanner │ │
│ │ (.py, .java) │ │
│ └────────┬───────┘ │
│ │ │
│ ┌────────────────┴────────────────┐ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Tree-Sitter │ │ Raw File │ │
│ │ AST Parser │ │ Content │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Function/ │ │ File-Level │ │
│ │ Method │ │ Documents │ │
│ │ Extraction │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ └────────────┬────────────────────┘ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Text Splitter │ │
│ │ (1000 chars/256 │ │
│ │ overlap) │ │
│ └────────┬─────────┘ │
│ ▼ │
│ ┌───────────────────┐ │
│ │ OpenAI Embedder │ │
│ │ text-embedding- │ │
│ │ 3-small │ │
│ └─────────┬─────────┘ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ ChromaDB │ │
│ │ ┌──────────────┐ │ │
│ │ │ File Index │ │ │
│ │ ├──────────────┤ │ │
│ │ │Function Index│ │ │
│ │ └──────────────┘ │ │
│ └──────────┬───────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────┐ │
│ │ Query Processing │ │
│ │ ┌───────────────┐ │ │
│ │ │Vector Search │ │ │
│ │ ├───────────────┤ │ │
│ │ │ Top-K (20) │ │ │
│ │ ├───────────────┤ │ │
│ │ │LLM Reranking │ │ │
│ │ └───────────────┘ │ │
│ └──────────────────────┘ │
└────────────────────────────────────────────────────────────┘
```
### Implementation Details
#### 1. Dual-Layer Indexing Strategy
```python
# File-Level Index (Broad Context)
file_documents = [
Document(
page_content=entire_file_content,
metadata={
"file_path": relative_path,
"type": "file"
}
)
]
# Function-Level Index (Precise Targeting)
function_documents = [
Document(
page_content=function_code,
metadata={
"file_path": relative_path,
"func_name": function_name,
"type": "func",
"start_line": start_line,
"end_line": end_line
}
)
]
```
#### 2. Tree-Sitter Integration
```python
def extract_functions(file_content, language):
parser = Parser()
parser.set_language(PY_LANGUAGE) # or JAVA_LANGUAGE
tree = parser.parse(bytes(file_content, "utf8"))
# Extract function nodes
functions = []
for node in traverse_tree(tree.root_node):
if node.type in ['function_definition', 'method']:
functions.append({
'name': get_node_text(node.child_by_field_name('name')),
'body': get_node_text(node),
'start': node.start_point[0],
'end': node.end_point[0]
})
return functions
```
#### 3. Search & Reranking Flow
```ascii
[User Query]
[Embedding Generation]
[Vector Similarity Search (Top 20)]
[LLM-Based Reranking]
├─► "How is this file relevant?"
├─► "What specific parts match?"
└─► "Confidence score: 0-10"
[Filtered Results (Threshold > 5)]
[Contextual Explanations]
```
---
## Workflow Orchestration
### State Management Architecture
```python
# Basic State (Supervisor Graph)
class CustomState(MessagesState):
last_agent: Optional[str] = None
next_agent: Optional[str] = None
summary: Optional[str] = None
human_in_the_loop: Optional[bool] = True
issue_description: Optional[str] = None
# Enhanced State (SWE-Bench Optimized)
class State(TypedDict):
messages: Annotated[list[AnyMessage], messages_reducer]
decoder_iterations: Optional[int] = 0
mapper_iterations: Optional[int] = 0
solver_iterations: Optional[int] = 0
problem_decoder_outputs: list[str]
solution_mapper_outputs: list[str]
problem_solver_outputs: list[str]
generated_patches: list[str]
```
### Workflow Patterns
#### Pattern 1: Linear Three-Phase Flow
```ascii
START ──► Problem Decoder ──► Solution Mapper ──► Problem Solver ──► END
│ │ │
└─────────────────────┴────────────────────┘
Supervisor Controls
(Max 3 iterations each)
```
#### Pattern 2: Hierarchical Review Flow
```ascii
┌─► Issue Resolver ──┐
START ──►│ ├──► Reviewer ──► END
└─── Multi-Agent ─────┘
Manager
```
### Failure Handling
```python
def supervisor_decision(state):
if state.decoder_iterations > 3:
return "FINISH: Decoder failed after 3 attempts"
if state.mapper_iterations > 3:
return "FINISH: Mapper failed after 3 attempts"
if state.solver_iterations > 3:
return "FINISH: Solver failed after 3 attempts"
# Continue to next agent
return next_agent
```
---
## Critical Success Factors
### 1. Tool Design Philosophy: "Minimal Tool Set, Maximal Information"
Per the technical report, Lingxi provides **accurate, sufficient information** for each tool call to reduce LLM burden:
| Tool | Purpose | Key Feature |
|------|---------|-------------|
| `view_directory` | Explore repository | **Adaptive depth**: prints deeper until file-count cap |
| `search_files_by_keywords` | Grep-like semantic search | Multi-keyword via **ripgrep**, returns line numbers |
| `view_file_content` | Inspect file | **Auto-truncates** long files, appends structure |
| `view_file_structure` | Summarize oversized files | **Invisible to LLM**, auto-injects when needed |
| `str_replace_editor` | Apply code edits | Inspired by Anthropic/Aider/OpenHand |
### 2. Memory Hygiene is Crucial
The coordinator's memory trimming (38% reduction) is **not optional**:
- Removes verbose function-call dumps
- Preserves only: action → observation → reflection
- Prevents context window exhaustion
### 3. Berkeley Study Pitfalls to Avoid
The report cites three reasons naive multi-agent systems fail:
1. **Vague task specifications** → Solution: One-page contracts per agent
2. **Unclear responsibility boundaries** → Solution: Explicit tool restrictions
3. **Chaotic memory/state management** → Solution: Coordinator-led hygiene
### 4. Stage-Wise Success Analysis
Looking at the "any generated" Sankey diagram:
- **289/300** issues had correct decoder analysis (96.3%)
- **244/289** had correct file-level mapping (84.4%)
- **173/244** had correct function-level targeting (70.9%)
- **142/173** generated working patches (82.1%)
This shows **file-to-function mapping** is the critical bottleneck.
---
## Implementation Blueprint
### Phase 1: Foundation (Week 1-2)
#### 1.1 Set Up Vector Database
```python
# requirements.txt
chromadb==0.4.24
openai==1.12.0
tree-sitter==0.21.0
tree-sitter-languages==1.10.2
# semantic_search.py
from chromadb import Client
from chromadb.config import Settings
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
class SemanticSearchEngine:
def __init__(self, project_path):
self.client = chromadb.Client(Settings(
persist_directory=f"{project_path}/.chroma"
))
self.embeddings = OpenAIEmbeddings(
model="text-embedding-3-small"
)
self.splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=256
)
def index_codebase(self):
# Implementation from context_tools.py
pass
```
#### 1.2 Create Base Agent Class
```python
# base_agent.py
from abc import ABC, abstractmethod
from typing import List, Dict, Any
class BaseAgent(ABC):
def __init__(self, llm, tools: List, system_prompt: str):
self.llm = llm
self.tools = tools
self.system_prompt = system_prompt
self.max_iterations = 3
self.current_iteration = 0
@abstractmethod
def process(self, state: Dict[str, Any]) -> Dict[str, Any]:
pass
def should_continue(self) -> bool:
return self.current_iteration < self.max_iterations
```
### Phase 2: Core Agents (Week 2-3)
#### 2.1 Implement Three Core Agents
```python
# problem_decoder.py
class ProblemDecoder(BaseAgent):
def __init__(self, llm, semantic_search):
tools = [
view_directory_tool,
search_relevant_files_tool,
view_file_content_tool
]
system_prompt = """You are a Problem Decoder.
Your role: Understand and localize the bug.
Output format:
1. Topic Question: What is the issue about?
2. Bug Location: Exact files and functions
3. Current Behavior: What's broken?
4. Expected Behavior: What should happen?
"""
super().__init__(llm, tools, system_prompt)
self.semantic_search = semantic_search
def process(self, state):
# 1. Search for relevant files
relevant_files = self.semantic_search.search(
state['issue_description']
)
# 2. Analyze bug location
bug_analysis = self.llm.invoke({
"system": self.system_prompt,
"human": f"Issue: {state['issue_description']}\n"
f"Files: {relevant_files}"
})
# 3. Update state
state['problem_decoder_output'] = bug_analysis
state['decoder_iterations'] += 1
return state
```
#### 2.2 Implement Supervisor
```python
# supervisor.py
from typing import Literal
class Router(TypedDict):
next_agent: Literal["decoder", "mapper", "solver", "END"]
reasoning: str
class Supervisor:
def __init__(self, llm):
self.llm = llm.with_structured_output(Router, strict=True)
def route(self, state):
# Check iteration limits
if state['decoder_iterations'] > 3:
return "END"
# Make routing decision
decision = self.llm.invoke({
"messages": state['messages'],
"context": "Route to appropriate agent based on progress"
})
return decision['next_agent']
```
### Phase 3: Integration (Week 3-4)
#### 3.1 Build LangGraph Workflow
```python
# workflow.py
from langgraph.graph import StateGraph, END
def create_issue_resolver_graph():
workflow = StateGraph(CustomState)
# Initialize agents
decoder = ProblemDecoder(llm, semantic_search)
mapper = SolutionMapper(llm)
solver = ProblemSolver(llm)
supervisor = Supervisor(llm)
# Add nodes
workflow.add_node("supervisor", supervisor.route)
workflow.add_node("decoder", decoder.process)
workflow.add_node("mapper", mapper.process)
workflow.add_node("solver", solver.process)
# Add edges
workflow.add_edge(START, "supervisor")
workflow.add_conditional_edges(
"supervisor",
lambda x: x['next_agent'],
{
"decoder": "decoder",
"mapper": "mapper",
"solver": "solver",
"END": END
}
)
# Add return paths to supervisor
workflow.add_edge("decoder", "supervisor")
workflow.add_edge("mapper", "supervisor")
workflow.add_edge("solver", "supervisor")
return workflow.compile()
```
#### 3.2 Add Human-in-the-Loop
```python
# human_feedback.py
def add_human_checkpoints(workflow):
for node in ["decoder", "mapper", "solver"]:
workflow.add_checkpoint(
after=node,
handler=human_feedback_handler
)
return workflow
def human_feedback_handler(state):
if state.get('human_in_the_loop', False):
feedback = get_user_feedback(state)
if feedback:
state['messages'].append(HumanMessage(feedback))
return state['last_agent'] # Retry same agent
return None # Continue workflow
```
### Phase 4: Optimization (Week 4-5)
#### 4.1 Add Caching
```python
# caching.py
def messages_reducer(left: list, right: list) -> list:
"""Custom reducer with incremental caching"""
result = add_messages(left, right)
# Remove old cache tags
for msg in result[:-1]:
if hasattr(msg.content, '__iter__'):
for block in msg.content:
if 'cache_control' in block:
del block['cache_control']
# Add cache tag to last message
if result:
last_msg = result[-1]
if isinstance(last_msg.content, list):
last_msg.content[-1]['cache_control'] = {
'type': 'ephemeral'
}
return result
```
#### 4.2 Add Performance Monitoring
```python
# monitoring.py
import time
from dataclasses import dataclass
@dataclass
class AgentMetrics:
agent_name: str
iterations: int
total_time: float
tokens_used: int
success: bool
class PerformanceMonitor:
def __init__(self):
self.metrics = []
def track_agent(self, agent_name):
def decorator(func):
def wrapper(*args, **kwargs):
start = time.time()
result = func(*args, **kwargs)
duration = time.time() - start
self.metrics.append(AgentMetrics(
agent_name=agent_name,
iterations=result.get(f'{agent_name}_iterations', 0),
total_time=duration,
tokens_used=count_tokens(result['messages']),
success=result.get('success', False)
))
return result
return wrapper
return decorator
```
---
## Migration Guide for Existing Tools
### For Roocode/Claude Code Users
#### Step 1: Add Semantic Search Layer
```python
# Add to your existing tool
class EnhancedClaudeCode:
def __init__(self, original_instance):
self.original = original_instance
self.semantic_search = SemanticSearchEngine()
def enhance_with_search(self, query):
# First use semantic search
relevant_context = self.semantic_search.search(query)
# Then pass to original tool with context
return self.original.process(
query + f"\nRelevant files:\n{relevant_context}"
)
```
#### Step 2: Implement Phase Separation
```python
# Split single process into phases
def enhanced_process(issue):
# Phase 1: Understanding
context = understand_issue(issue)
# Phase 2: Planning
plan = create_solution_plan(context)
# Phase 3: Implementation
solution = implement_plan(plan)
return solution
```
### For Gemini CLI Users
#### Step 1: Add Agent Routing
```python
# gemini_enhanced.py
class GeminiMultiAgent:
def __init__(self, gemini_instance):
self.gemini = gemini_instance
self.current_phase = "understand"
def route(self, response):
if "bug located" in response.lower():
self.current_phase = "plan"
elif "plan complete" in response.lower():
self.current_phase = "implement"
elif "implementation done" in response.lower():
self.current_phase = "complete"
def process(self, issue):
phases = {
"understand": self.understand_prompt,
"plan": self.plan_prompt,
"implement": self.implement_prompt
}
while self.current_phase != "complete":
prompt = phases[self.current_phase](issue)
response = self.gemini.generate(prompt)
self.route(response)
return response
```
### Universal Enhancement Checklist
- [ ] **Add Semantic Search**
- Implement ChromaDB vector database
- Add tree-sitter for AST parsing
- Create dual-layer indexing
- [ ] **Separate Cognitive Phases**
- Split into understand/plan/implement
- Create specialized prompts per phase
- Limit tools per phase
- [ ] **Implement State Management**
- Track iterations per phase
- Maintain conversation context
- Add failure thresholds
- [ ] **Add Supervisor Logic**
- Create routing decisions
- Implement retry limits
- Add graceful failure handling
- [ ] **Optimize Performance**
- Add incremental caching
- Implement parallel search
- Monitor token usage
---
## Performance Analysis
### Actual Performance Data (from Technical Report)
#### SWE-Bench Lite Results (v1.0)
| Stage | Success Rate | Count | Analysis |
|-------|--------------|-------|----------|
| **Problem Decoder** | 83% | 249/300 | Good bug localization |
| **Solution Mapper (File)** | 93.6% | 233/249 | Excellent file identification |
| **Solution Mapper (Function)** | 67% | 156/233 | **Main bottleneck** |
| **Problem Solver (Patch)** | 82% | 128/156 | Strong implementation |
| **Overall Success** | 44.15% | 132/300 | #3 OSS, #7 overall |
#### Comparative Performance
- **SWE-Bench Verified**: 74.6% success (higher quality, curated dataset)
- **SWE-Bench Lite**: 44.15% success (broader, more challenging)
- **Token Reduction**: 38% via memory hygiene
- **Iteration Limit**: 3 attempts per agent (prevents runaway)
### Expected Improvements for Your Tool
Based on Lingxi's architecture, implementing these patterns should yield:
| Metric | Baseline | With Semantic Search | + Phase Separation | + Memory Hygiene | Full Implementation |
|--------|----------|---------------------|-------------------|------------------|---------------------|
| Bug Localization | ~45% | ~65% | ~75% | ~80% | **~85%** |
| Success Rate | ~25% | ~35% | ~45% | ~50% | **~60-70%** |
| Token Usage | 100% | 95% | 85% | 65% | **~62%** |
| Time to Solution | 15 min | 12 min | 10 min | 8 min | **~7 min** |
### Critical Success Metrics
1. **Function-Level Accuracy**: Most important metric (Lingxi's 67% is the bottleneck)
2. **First-Pass Success**: Percentage resolved without iteration
3. **Token Efficiency**: Cost reduction via memory hygiene
4. **Failure Recovery**: Graceful handling when 3-attempt limit reached
---
## Conclusion
### Why Lingxi Succeeds (Technical Report Insights)
1. **Context Dilution Solved**: 38% prompt reduction via memory hygiene
2. **Task Decomposition > Role Playing**: Narrow task domains, not social roles
3. **Crystal-Clear Contracts**: One-page specs with tool restrictions
4. **Minimal Tools, Maximum Info**: Each tool provides complete context
5. **Hard Iteration Limits**: 3 attempts prevents runaway costs
### The Real Secret Sauce
From the technical report:
> "The key to outperforming [single-agent baselines] is **precise task scoping plus memory hygiene**. Removing large function-call dumps was especially impactful, reducing confusion and token cost."
### Implementation Roadmap for Maximum Impact
Based on actual performance data:
1. **Week 1: Memory Hygiene** (38% token reduction)
- Implement coordinator trimming
- Remove function-call dumps
- Preserve only action→observation→reflection
2. **Week 2: Semantic Search** (20% accuracy boost)
- ChromaDB with dual-layer indexing
- Tree-sitter AST parsing
- Ripgrep integration
3. **Week 3: Phase Separation** (15% success improvement)
- Problem Decoder (understand)
- Solution Mapper (plan)
- Problem Solver (implement)
4. **Week 4: Iteration Control** (prevent failures)
- 3-attempt limit per agent
- Graceful failure handling
- Progress tracking
5. **Week 5: Tool Optimization** (10% efficiency gain)
- Adaptive directory depth
- Auto-truncation for large files
- Invisible structure injection
### Expected Outcome
Following this blueprint, you should achieve:
- **60-70% SWE-Bench success** (from ~25-30% baseline)
- **62% token usage** (38% reduction)
- **7-minute average resolution** (from 15 minutes)
The architecture is **proven and portable** - Lingxi's #3 OSS ranking demonstrates these patterns work at scale.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment