
@devrim
Last active April 15, 2026 08:24

Building a Cursor-Quality AI Agent for 3D Scene Generation: Architecture Deep Dive

Executive Summary

Building an AI agent that can interpret natural language like "create a scene where I'm going to assemble legos" and produce a fully realized 3D environment with physics-ready objects, lighting, and scripted behaviors is one of the most demanding applications of agentic AI. It combines the hardest problems in the field: multi-step planning over ordered physical constraints, retrieval over structured asset catalogs with physics metadata, code generation in a domain-specific context (C# game scripts), and tight tool integration with a real-time engine.

This article examines how leading AI-powered creation tools—Cursor, Devin, Replit Agent, GitHub Copilot, Bolt.new, and Vercel v0—architect their backends, and distills the patterns that matter for a 3D scene generation agent embedded in a C++ game engine. We compare six major agent frameworks and approaches (LangChain/LangGraph, CrewAI, AutoGen/Semantic Kernel, DSPy, Haystack, and custom solutions), analyze reasoning architectures (ReAct, Plan-and-Execute, Tree of Thoughts), survey memory and RAG approaches for structured asset data, and conclude with a concrete recommended architecture for LuckyEngine—one that can run locally with 7–30B parameter models via Ollama while delivering the kind of reliable, context-aware experience that makes tools like Cursor feel magical.


1. What Top Companies Actually Use

The most important finding from this research: none of the leading AI creation tools use LangChain or LangGraph as their core agent framework in production. The trend is overwhelmingly toward custom agent loops with targeted use of framework components.

Cursor

Cursor is a full fork of VS Code—not a plugin—giving it complete control over the file system, terminal, and project context. Its secret weapon is not the underlying LLM but its codebase indexing pipeline:

  • Semantic chunking: Code is split into meaningful units (functions, classes, logical blocks) using AST-based parsing via tree-sitter, not naive line or character splitting.
  • Embedding: Chunks are embedded using either OpenAI's embedding models or custom code-tuned models (candidates include Voyage AI's voyage-code-2).
  • Vector storage: Embeddings are stored in Turbopuffer, a serverless vector search engine optimized for fast nearest-neighbor retrieval. Only metadata (obfuscated file paths, line ranges) is stored remotely; raw code never leaves the local machine.
  • Merkle tree sync: Every 10 minutes, a Merkle-tree hash comparison identifies which files have changed, and only those files are re-uploaded. Embedding caching keyed by chunk hash makes re-indexing fast.
  • Retrieval impact: Semantic search improves agent response accuracy by 12.5% on average and produces code changes more likely to be retained in codebases.

Cursor's architecture is essentially a custom RAG-augmented agent loop with deep IDE integration. No framework dependency.
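The hash-keyed change detection behind that sync can be sketched in a few lines. This is an illustrative flat-hash version (a real Merkle tree also hashes directories of hashes so whole subtrees can be skipped), not Cursor's actual code; all names here are invented:

```python
import hashlib

def chunk_hash(chunk: str) -> str:
    """Content hash, usable as an embedding-cache key."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()

def changed_files(local: dict[str, str], snapshot: dict[str, str]) -> list[str]:
    """Compare per-file content hashes against the last synced snapshot
    and return only files whose hashes differ (or are new)."""
    return [
        path for path, content in local.items()
        if snapshot.get(path) != chunk_hash(content)
    ]

# After an edit to b.py, only b.py is flagged for re-embedding.
snapshot = {"a.py": chunk_hash("def f(): pass"), "b.py": chunk_hash("x = 1")}
local = {"a.py": "def f(): pass", "b.py": "x = 2"}
```

Because unchanged chunks keep their hashes, their cached embeddings are reused and re-indexing cost is proportional to the edit, not the codebase.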

Devin (Cognition)

Devin operates as a compound AI system—not a single model but a swarm of specialized models:

  • The Planner: A high-reasoning model that decomposes tasks into step-by-step plans.
  • The Coder: A specialized model trained on code.
  • The Critic: An adversarial model that reviews code for security and logic errors.
  • The Browser: An agent that scrapes and synthesizes documentation.

Each operates within a sandboxed cloud environment with shell, editor, and browser. Devin 2.0 introduced multi-agent parallel execution and interactive planning with confidence-based clarification requests. The architecture is entirely custom.

Replit Agent

Replit Agent is the one major player that does use LangGraph for its multi-agent system, with LangSmith for observability. Key architectural choices:

  • Multi-agent with roles: Manager agent oversees workflow; editor agents handle specific coding tasks; verifier agent validates output.
  • Custom tool invocation: Rather than standard function calling APIs, Replit generates code in a restricted Python-based DSL to invoke 30+ integrated tools.
  • Memory compression: LLMs condense long conversation trajectories to retain only relevant information within context windows.
  • Claude 3.5 Sonnet as the primary model (as of their LangChain case study).
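The "generate code that calls tools" pattern—as opposed to emitting JSON tool calls—can be illustrated in miniature. This is not Replit's actual DSL, and stripping builtins is nowhere near real sandboxing (real systems use process isolation and timeouts); it is a toy showing the idea:

```python
def run_dsl(snippet: str, tools: dict) -> None:
    """Execute model-generated code with only whitelisted tool
    functions in scope (builtins removed)."""
    scope = {"__builtins__": {}, **tools}
    exec(snippet, scope)

log = []
tools = {
    "spawn_object": lambda name: log.append(("spawn", name)),
    "set_lighting": lambda kind: log.append(("light", kind)),
}

# What the model might emit instead of three separate JSON tool calls:
generated = "spawn_object('table')\nspawn_object('chair')\nset_lighting('overhead')"
run_dsl(generated, tools)
```

The appeal is that one generated snippet can chain many tool invocations with loops and variables, where standard function calling would cost one model round-trip per call.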

GitHub Copilot, Bolt.new, Vercel v0

  • GitHub Copilot: IDE-integrated pair-programming assistant built by GitHub/OpenAI. Custom infrastructure, not framework-based. Deep integration with VS Code's language server protocol for context.
  • Bolt.new: Uses Claude Sonnet on StackBlitz's WebContainers, generating full-stack apps (React frontend, Node.js backend, database) entirely in-browser. Custom orchestration.
  • Vercel v0: Generates React/Next.js/Tailwind code from natural language. Focused on frontend component generation with a custom pipeline.

Claude Code (Anthropic)

Claude Code's architecture offers perhaps the most instructive model for LuckyEngine:

  • Single-threaded master loop: A deliberately simple design where Claude evaluates the current state, emits tool calls, receives results, and repeats until done.
  • TodoWrite planning: Structured JSON task lists with IDs, status tracking, and priority levels that render as interactive checklists.
  • Reminder injection: After tool use, current TODO list states are injected as system messages to prevent the model from losing track during long conversations.
  • Sub-agent dispatch: Controlled parallelism through sub-agents with strict depth limitations to prevent recursive spawning.
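The reminder-injection pattern is simple to reproduce. A minimal sketch, with an invented `inject_todo_reminder` helper and message shape rather than Claude Code's actual implementation:

```python
import json

def inject_todo_reminder(messages: list[dict], todos: list[dict]) -> list[dict]:
    """After each tool result, append the current TODO list as a system
    message so the model re-reads its plan on every turn instead of
    drifting during long conversations."""
    reminder = {
        "role": "system",
        "content": "Current TODO state:\n" + json.dumps(todos, indent=2),
    }
    return messages + [reminder]

todos = [
    {"id": 1, "task": "spawn table", "status": "done"},
    {"id": 2, "task": "place plates", "status": "pending"},
]
history = [{"role": "user", "content": "set the dinner table"}]
history = inject_todo_reminder(history, todos)
```

The cost is a few hundred extra tokens per turn; the benefit is that the plan stays in the model's most recent context rather than buried hundreds of messages back.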

Industry Summary

| Tool | Framework | Agent Style | Key Innovation |
| --- | --- | --- | --- |
| Cursor | Custom | Single agent + RAG | Semantic codebase indexing, Turbopuffer |
| Devin | Custom | Multi-model compound system | Specialized model roles (planner/coder/critic) |
| Replit Agent | LangGraph + LangSmith | Multi-agent with roles | Custom DSL for tool invocation |
| GitHub Copilot | Custom | Single agent | Deep IDE/LSP integration |
| Bolt.new | Custom | Single agent | WebContainers browser-based execution |
| Claude Code | Custom (Agent SDK) | Single-threaded master loop | TodoWrite planning, reminder injection |

The pattern is clear: production-grade AI creation tools build custom agent loops. They may use framework components (Replit uses LangGraph for orchestration) but the core intelligence—planning, retrieval, tool dispatch—is bespoke.


2. Agent Frameworks Comparison

Despite the custom trend above, frameworks provide valuable building blocks. Here is how they compare for a tool-heavy, multi-step 3D scene planning use case.

LangChain / LangGraph

LangChain (97,000+ GitHub stars, 50,000+ production apps) is the fastest way to build a standard tool-calling agent. LangGraph, built on top of it, adds stateful graph-based execution for complex workflows.

  • LangGraph 1.0 (October 2025) provides durable state persistence, built-in human-in-the-loop patterns, and a commitment to no breaking changes until 2.0.
  • Running in production at LinkedIn, Uber, Replit, Elastic, Klarna, and ~400 other companies.
  • Start with LangChain's high-level APIs, drop down to LangGraph when you need more control.

For LuckyEngine: Good graph-based execution model for multi-step scene construction. However, adds dependency weight and abstraction overhead for a C++ engine that just needs a Python-side orchestrator.

CrewAI

CrewAI simplifies role-based agent collaboration (~35 lines for a minimal agent). Fastest time-to-value, ideal for rapid prototyping.

For LuckyEngine: The role-based model (planner agent, asset retriever agent, code generator agent) maps naturally to scene construction. But less control over execution flow than LangGraph.

AutoGen / Microsoft Agent Framework

AutoGen merged with Semantic Kernel into the unified Microsoft Agent Framework (GA Q1 2026). Multi-language support (C#, Python, Java) with deep Azure integration.

For LuckyEngine: The C# support is relevant since LuckyEngine uses C# scripting. But the Azure dependency conflicts with the local-first requirement.

DSPy

DSPy (Stanford) takes a fundamentally different approach: you declare input/output signatures, and DSPy compiles them into optimized prompts. Version 3.0 reports 10-40% quality improvement over manual prompting.

For LuckyEngine: Compelling for optimizing structured outputs (scene descriptions, asset queries). But less mature for complex agentic tool-calling loops.

Haystack

Haystack (deepset) is an open-source AI orchestration framework built RAG-first with 160+ document store integrations. Strong pipeline architecture with explicit data flow visibility.

For LuckyEngine: Excellent for the RAG/retrieval component of the asset catalog, but less suited as the primary agent orchestration layer.

Custom

Custom agent loops demand more upfront engineering but offer architectural clarity. Enterprises building mission-critical systems often go custom after outgrowing framework abstractions.

For LuckyEngine: Given the tight integration needed with a C++ engine, custom is likely the right choice for the core loop, potentially using framework components for specific subsystems (RAG, memory).

Framework Comparison Table

| Framework | Tool Calling | Multi-Step Planning | Local/Ollama Support | Learning Curve | Production Maturity |
| --- | --- | --- | --- | --- | --- |
| LangGraph | Excellent | Excellent (graph-based) | Good | Medium | High (1.0 stable) |
| CrewAI | Good | Good (role-based) | Good | Low | Medium |
| AutoGen/MS | Good | Good | Poor (Azure-centric) | Medium | Medium |
| DSPy | Emerging | Emerging | Good | High | Medium |
| Haystack | Good | Medium | Good | Medium | High |
| Custom | You build it | You build it | Full control | High | You own it |

3. Planning and Reasoning Architectures

Scene construction has a critical property: order matters. You cannot place a plate on a table that does not exist, or position furniture in a room without a floor. This makes planning architecture selection especially important.

ReAct (Reason + Act)

Operates in a continuous Thought → Action → Observation loop. The agent reasons about the current state, takes an action (tool call), observes the result, and repeats.

# ReAct-style scene construction (pseudocode)
while not scene_complete:
    thought = llm.reason(conversation_history + observations)
    action = llm.select_tool(thought)  # e.g., spawn_object, set_lighting
    observation = engine.execute(action)
    conversation_history.extend([thought, action, observation])

Strengths: Adapts dynamically; if a table spawn fails, the agent can reason about alternatives. Simple to implement. Weaknesses: Inefficient for long tasks (repeated prompting overhead). A flawed early step can cascade. With 7-30B models, the reasoning quality per step may degrade over long chains.

For scene construction: Acceptable for simple scenes (< 10 steps), but unreliable for complex environments like a full kitchen setup that may require 30-50 ordered operations.

Plan-and-Execute

Two-phase approach: first generate a complete plan, then execute each step. Optionally replan when steps fail.

# Plan-and-Execute for scene construction
plan = planner_llm.create_plan(user_request)
# plan = ["1. Create room (6m x 4m)", "2. Add floor (wood texture)",
#         "3. Place dining table (center)", "4. Add 4 chairs around table", ...]

for step in plan:
    result = executor_llm.execute_step(step, available_tools, scene_state)
    if result.failed:
        plan = planner_llm.replan(plan, step, result.error, scene_state)

Strengths: Enforces explicit long-term planning. Different models can handle planning (bigger/smarter) vs. execution (smaller/faster). More token-efficient than ReAct. Enables human review of the plan before execution. Weaknesses: Less agile with dynamic changes. Requires explicit replanning logic.

For scene construction: This is the strongest fit. Scene planning naturally decomposes into ordered phases (room structure → large furniture → small objects → lighting → physics setup). The plan can be shown to the user for approval before execution begins.

Tree of Thoughts

Explores multiple reasoning paths, scoring and pruning candidates at each step.

For scene construction: Overkill for most scene generation tasks. The computational cost (multiple candidate evaluations per step) is prohibitive for local models. Could be useful for a specific sub-problem like optimal furniture layout, but not as the primary architecture.
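If ToT-style search is ever worth it for the layout sub-problem, it amounts to a small beam search: grow candidate layouts one item at a time, score each, and prune to the best few. A toy sketch with an invented spacing-based scorer—a real scorer would check collisions, walkways, and style rules:

```python
def score_layout(layout: list[float]) -> float:
    """Toy scorer: spread furniture along a wall by maximizing the
    minimum pairwise gap between x-positions."""
    xs = sorted(layout)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    return min(gaps) if gaps else 0.0

def beam_search(positions: list[float], n_items: int, beam_width: int = 3):
    """Extend partial layouts one item per depth, keeping only the
    top-scoring candidates (the 'prune' step of ToT)."""
    beams = [[]]
    for _ in range(n_items):
        candidates = [b + [p] for b in beams for p in positions if p not in b]
        candidates.sort(key=score_layout, reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

# Place 3 items along a 6 m wall with five candidate slots.
best = beam_search(positions=[0.0, 1.5, 3.0, 4.5, 6.0], n_items=3)
```

Even this tiny example evaluates dozens of candidates; with an LLM as the scorer, each evaluation is a model call, which is why the approach is reserved for narrow sub-problems.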

ReWOO (Reasoning Without Observation)

Plans the entire tool-call sequence in one pass using variable placeholders, then executes all at once.

For scene construction: Attractive for token efficiency, but dangerous. Scene construction has dependencies (you need the table's position before placing objects on it), and without intermediate observation, the plan cannot adapt to runtime failures.
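The one-pass idea can be sketched with placeholder substitution. The plan shape, `execute` helper, and tool stub below are illustrative, not the ReWOO paper's interface:

```python
from string import Template

# The entire tool sequence is planned up front; $variables stand in
# for results not known until execution time.
plan = [
    ("table_pos", "spawn_object", Template("dining_table at origin")),
    ("plate_pos", "spawn_object", Template("plate on table at $table_pos")),
]

def execute(plan, tools):
    """Run every step without re-consulting the LLM; each result fills
    the placeholders of later steps. Note there is no observation loop:
    if a step fails, nothing downstream can adapt."""
    results = {}
    for var, tool, arg in plan:
        results[var] = tools[tool](arg.substitute(results))
    return results

tools = {"spawn_object": lambda spec: f"spawned({spec})"}
out = execute(plan, tools)
```

The dependency chain (`plate_pos` needs `table_pos`) works only while every upstream step succeeds—exactly the fragility noted above.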

Recommended: Hierarchical Plan-and-Execute

The optimal architecture for scene generation combines Plan-and-Execute at the macro level with ReAct at the micro level:

# Hierarchical architecture for LuckyEngine
class SceneAgent:
    def build_scene(self, user_request: str):
        # Phase 1: High-level planning (use best available model)
        scene_plan = self.planner.decompose(user_request)
        # Returns: [Phase("Room Setup", [...]), Phase("Furniture", [...]),
        #           Phase("Props", [...]), Phase("Lighting", [...]),
        #           Phase("Physics", [...]), Phase("Scripts", [...])]

        # Phase 2: Execute each phase with ReAct for adaptability
        for phase in scene_plan:
            for step in phase.steps:
                result = self.executor.react_loop(
                    step,
                    tools=self.get_tools_for_phase(phase),
                    scene_state=self.engine.get_state(),
                    max_iterations=5
                )
                if result.failed:
                    # Replan remaining steps in this phase
                    phase.steps = self.planner.replan_phase(
                        phase, step, result.error
                    )

This gives you the reliability of upfront planning with the adaptability of ReAct within each phase.


4. RAG for 3D Asset Knowledge

The agent needs to know what 3D assets are available, their physical properties, compatible combinations, and spatial constraints. This is fundamentally different from text-document RAG.

The Problem with Pure Vector Search

Standard RAG embeds text chunks and retrieves by semantic similarity. But 3D assets have structured, relational properties:

  • A "dinner plate" has mass (0.3kg), friction coefficient (0.4), dimensions (26cm diameter), and is a MuJoCo body with specific collision geometry.
  • It must be placed on a surface (table, counter), not floating in air.
  • It is semantically related to "silverware," "napkin," "glass"—an entire place setting.
  • It has physics constraints: it needs a table with adequate surface area, the right friction to not slide off.

Vector search alone will find semantically similar assets but miss these structural relationships.

Recommended: Hybrid Retrieval Architecture

class AssetKnowledgeBase:
    """Hybrid retrieval combining structured catalog, vector search,
    and scene graph knowledge."""

    def __init__(self):
        # Layer 1: Structured catalog (SQLite - runs locally, no server)
        self.catalog = SQLiteCatalog("assets.db")
        # Schema: assets(id, name, category, subcategory, mass_kg,
        #   friction, dimensions_json, joints_json, mujoco_xml_path,
        #   mesh_path, tags, compatible_surfaces, grip_type)

        # Layer 2: Vector embeddings for semantic search
        self.vector_store = LocalVectorStore("assets.index")
        # Embeds: name + description + tags + category hierarchy

        # Layer 3: Scene graph templates (common arrangements)
        self.scene_graphs = SceneGraphDB("scene_templates.json")
        # e.g., "dinner_table_setting": {table -> [plate, glass, fork,
        #   knife, spoon, napkin], spatial_rules: [...]}

    def retrieve(self, query: str, scene_context: dict) -> list[Asset]:
        # Step 1: Vector search for semantic relevance
        candidates = self.vector_store.search(query, top_k=20)

        # Step 2: Filter by physics compatibility
        if scene_context.get("target_surface"):
            candidates = self.catalog.filter_by_surface_compatibility(
                candidates, scene_context["target_surface"]
            )

        # Step 3: Expand with scene graph relationships
        related = self.scene_graphs.get_related_assets(candidates)
        candidates.extend(related)

        # Step 4: Rank by relevance + physics feasibility
        return self.ranker.rank(candidates, query, scene_context)

Scene Graph Templates

For the "set up a dinner table" use case, pre-built scene graph templates are invaluable:

{
  "dinner_table_for_4": {
    "anchor": "dining_table",
    "children": [
      {"asset": "chair", "count": 4, "placement": "around_table", "spacing": "equal"},
      {"asset": "plate", "count": 4, "placement": "on_table", "arrangement": "place_setting"},
      {"asset": "glass", "count": 4, "placement": "on_table", "offset": "top_right_of_plate"},
      {"asset": "fork", "count": 4, "placement": "on_table", "offset": "left_of_plate"},
      {"asset": "knife", "count": 4, "placement": "on_table", "offset": "right_of_plate"}
    ],
    "lighting": {"type": "overhead", "warmth": "warm", "intensity": 0.7}
  }
}
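A template like this expands mechanically into ordered engine operations, with the anchor spawned before its children so the supporting surface always exists first. A minimal sketch—the `expand` helper and op format are invented, and a real executor would also resolve placements into concrete coordinates:

```python
# Trimmed-down version of the dinner-table template above.
template = {
    "anchor": "dining_table",
    "children": [
        {"asset": "chair", "count": 4, "placement": "around_table"},
        {"asset": "plate", "count": 4, "placement": "on_table"},
    ],
}

def expand(template: dict) -> list[dict]:
    """Flatten a scene-graph template into an ordered spawn list:
    anchor first, then each child instance parented to the anchor."""
    ops = [{"op": "spawn", "asset": template["anchor"], "parent": None}]
    for child in template["children"]:
        for i in range(child["count"]):
            ops.append({
                "op": "spawn",
                "asset": f"{child['asset']}_{i}",
                "parent": template["anchor"],
                "placement": child["placement"],
            })
    return ops

ops = expand(template)  # 1 anchor + 8 children, anchor guaranteed first
```

The LLM's job shrinks from inventing a whole arrangement to selecting a template and filling its parameters, which is far more reliable for small models.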

Why Not a Full Knowledge Graph Database?

Systems like Neo4j are powerful but add operational complexity. For a local-first application with a bounded asset catalog (hundreds to low thousands of assets), SQLite + local vector index + JSON scene templates provides sufficient capability without requiring a graph database server.
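At this scale the catalog layer is plain SQL. A sketch of the surface-compatibility filter from Step 2 of `retrieve()`, using an in-memory SQLite database with a subset of the schema sketched above; the CSV-in-a-column encoding of `compatible_surfaces` is an assumption for illustration:

```python
import sqlite3

# In-memory stand-in for assets.db with a subset of the columns.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE assets (
    id TEXT PRIMARY KEY, name TEXT, category TEXT,
    mass_kg REAL, compatible_surfaces TEXT)""")
db.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?, ?)",
    [
        ("plate_01", "dinner plate", "tableware", 0.3, "table,counter"),
        ("rug_01", "area rug", "decor", 2.0, "floor"),
    ],
)

def filter_by_surface(surface: str) -> list[str]:
    """Keep only assets that can rest on the target surface.
    Wrapping both sides in commas makes the LIKE match exact tokens."""
    rows = db.execute(
        "SELECT id FROM assets WHERE ',' || compatible_surfaces || ',' "
        "LIKE '%,' || ? || ',%'",
        (surface,),
    ).fetchall()
    return [r[0] for r in rows]
```

With hundreds to low thousands of rows, queries like this return in microseconds with zero operational overhead—the core argument against a graph database server here.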


5. Memory and Context Management

The Three Memory Tiers

Tier 1: Working Memory (Within-Session)

class WorkingMemory:
    conversation: list[Message]      # Current chat history
    scene_state: SceneSnapshot       # Live engine state
    active_plan: TaskList             # Current plan steps with status
    tool_results: list[ToolResult]   # Recent tool call results

Tier 2: Project Memory (Cross-Session, Per-Project)

class ProjectMemory:
    scene_history: list[SceneVersion]     # Previous scene states (undo/redo)
    user_preferences: dict                # "prefers warm lighting", "uses metric units"
    custom_assets: list[AssetRef]         # User-imported or generated assets
    script_templates: list[ScriptRef]     # Previously generated C# scripts
    correction_log: list[Correction]      # Learned sizing/style prefs

Tier 3: Global Memory (Cross-Project)

class GlobalMemory:
    style_preferences: dict          # Lighting style, color palettes
    frequently_used_assets: list     # Assets the user reaches for most
    workflow_patterns: list          # Plan-review-execute vs just-do-it?
    domain_expertise: str            # "robotics" vs "architecture"

Memory Compression for Small Models

With 7-30B parameter models, context window management is critical. Compress long conversations into salient facts:

def compress_memory(conversation: list[Message], llm) -> str:
    if len(conversation) < 10:
        return format_full(conversation)

    summary = llm.summarize(
        conversation,
        instruction="Extract: (1) what the user wants built, "
                    "(2) specific decisions made, (3) current scene state, "
                    "(4) remaining tasks. Be concise."
    )
    return summary

6. Code Generation Patterns

What Makes Cursor/Copilot Code Generation Work

The gap between Cursor and a raw LLM comes down to:

  1. Codebase-aware context: The model sees relevant interfaces, base classes, and existing patterns—not just the current file.
  2. Iterative refinement with feedback loops: Generate code → compile → errors flow back → fix. The user sees the first working version, not the first attempt.
  3. Few-shot examples from the project: Including 2-3 existing scripts from the user's project as examples dramatically improves generation quality.

Sandboxing and Validation Pipeline

class CodeGenerationPipeline:
    def generate_and_validate(self, task: str, context: CodeContext) -> Script:
        code = self.llm.generate_code(
            task=task,
            existing_scripts=context.similar_scripts[:3],
            api_reference=context.relevant_apis,
            engine_types=context.available_components
        )

        for attempt in range(3):
            # Static validation
            syntax_errors = self.csharp_parser.check_syntax(code)
            if syntax_errors:
                code = self.llm.fix_errors(code, syntax_errors)
                continue  # re-validate the fixed code, don't regenerate

            # Compile check via engine
            compile_result = self.engine.try_compile(code)
            if compile_result.errors:
                code = self.llm.fix_errors(code, compile_result.errors)
                continue

            return Script(code=code, validated=True)

        raise CodeGenerationFailed("Could not generate valid code after 3 attempts")

Template-Based Generation

For common patterns, template-based generation with LLM-filled parameters is more reliable than free-form generation:

// Template: SpawnableObject.cs.template
public class {{ClassName}} : MonoBehaviour
{
    [Header("Physics Properties")]
    public float mass = {{mass}};
    public float friction = {{friction}};
    public bool useGravity = {{useGravity}};

    void Start()
    {
        {{#each setup_steps}}
        {{this}}
        {{/each}}
    }

    {{#if has_interaction}}
    public void OnInteract(Agent agent)
    {
        {{interaction_logic}}
    }
    {{/if}}
}
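The scalar slots in a template like the one above can be filled with a few lines of Python. The `{{#each}}`/`{{#if}}` blocks would need a real Mustache/Handlebars engine (e.g. a pystache-style library), which this sketch deliberately does not attempt; the snippet string below is illustrative:

```python
import re

def fill_template(template: str, params: dict) -> str:
    """Fill flat {{name}} slots with values from params.
    Loops and conditionals are out of scope for this sketch."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(params[m.group(1)]),
        template,
    )

snippet = (
    "public float mass = {{mass}}f;\n"
    "public bool useGravity = {{useGravity}};"
)
filled = fill_template(snippet, {"mass": 0.3, "useGravity": "true"})
```

The reliability win is that the LLM only has to emit a handful of typed parameters, and everything structural in the C# output is guaranteed correct by the template.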

7. The "Cursor-Quality" Gap

What specifically makes Cursor feel magical compared to a raw LLM?

The Five Pillars

  1. Automatic Context Retrieval — The user never manually specifies "look at file X." The agent just knows what's available. For LuckyEngine: the agent must automatically know available assets, scene state, APIs, and existing scripts.

  2. Incremental, Fast Indexing — Re-indexing after changes takes seconds. For LuckyEngine: scene state changes must be reflected in agent context immediately.

  3. Multi-File Awareness — Understanding cross-file dependencies. For LuckyEngine: the agent must understand the full scene graph as an interconnected system.

  4. Iterative Refinement with Verification — Generate, verify, fix, then present. The user sees the first working version. For LuckyEngine: compile scripts, verify object spawns, confirm physics—before reporting success.

  5. Persistent Learning — Project-level configuration teaches the model conventions over time. For LuckyEngine: the memory system learns user preferences across sessions.


8. Recommended Architecture for LuckyEngine

High-Level Architecture

+-------------------------------------------------------------------+
|                        LuckyEngine (C++)                          |
|  +-------------------------------------------------------------+ |
|  |  Engine API (exposed via gRPC/IPC to Python agent)           | |
|  |  - spawn_object(asset_id, position, rotation, scale)         | |
|  |  - set_physics(object_id, mass, friction, restitution)       | |
|  |  - create_joint(body_a, body_b, joint_type, params)          | |
|  |  - attach_script(object_id, script_code)                     | |
|  |  - set_lighting(type, position, intensity, color, warmth)    | |
|  |  - get_scene_state() -> SceneSnapshot                        | |
|  |  - compile_script(code) -> CompileResult                     | |
|  +-------------------------------------------------------------+ |
+-------------------------------------------------------------------+
         |  gRPC / IPC
         v
+-------------------------------------------------------------------+
|                    Agent Orchestrator (Python)                     |
|                                                                   |
|  +------------------+  +------------------+  +-----------------+  |
|  | Intent Parser    |  | Scene Planner    |  | Executor        |  |
|  | (understands     |  | (Plan-and-Execute|  | (ReAct loop per |  |
|  |  user requests)  |  |  with phases)    |  |  plan step)     |  |
|  +------------------+  +------------------+  +-----------------+  |
|           |                     |                     |           |
|  +------------------+  +------------------+  +-----------------+  |
|  | Asset Retriever  |  | Code Generator  |  | Memory Manager  |  |
|  | (hybrid RAG)     |  | (template-first)|  | (3-tier)        |  |
|  +------------------+  +------------------+  +-----------------+  |
|           |                     |                     |           |
|  +------------------+  +------------------+  +-----------------+  |
|  | SQLite + Vector  |  | Script Templates|  | Project JSON +  |  |
|  | Index (local)    |  | + C# Validator  |  | Global Config   |  |
|  +------------------+  +------------------+  +-----------------+  |
|                                                                   |
|  +-------------------------------------------------------------+ |
|  |  LLM Interface (Ollama / OpenAI-compatible API)              | |
|  |  Primary: Qwen3-30B (MoE, fits M4 Max / RTX 4090)           | |
|  |  Fallback: Qwen3.5 (9.7B, faster)                           | |
|  +-------------------------------------------------------------+ |
+-------------------------------------------------------------------+

Component Details

1. Agent Loop: Custom Single-Threaded Master Loop

Follow Claude Code's architecture. Do not use LangGraph or CrewAI for the core loop.

class LuckyAgent:
    def __init__(self, engine, llm, assets, memory):
        self.engine = engine
        self.llm = llm
        self.assets = assets
        self.memory = memory
        self.tools = self._register_tools()

    def handle_request(self, user_message: str):
        # 1. Load context
        context = self._build_context(user_message)

        # 2. Intent classification + planning
        plan = self._create_plan(user_message, context)

        # 3. Show plan to user for approval
        yield PlanPreview(plan)

        # 4. Execute plan phases
        results = []
        for phase in plan.phases:
            yield PhaseStart(phase)
            for step in phase.steps:
                result = self._execute_step(step, max_iterations=5)
                results.append(result)
                yield StepResult(step, result)
                if result.failed:
                    phase.steps = self._replan_phase(phase, step, result)

        # 5. Update memory
        self.memory.record_session(user_message, plan, results)

    def _execute_step(self, step, max_iterations=5):
        """ReAct loop for a single plan step."""
        for i in range(max_iterations):
            response = self.llm.chat(
                messages=self._format_step_prompt(step),
                tools=self.tools
            )
            if response.tool_calls:
                for call in response.tool_calls:
                    result = self._execute_tool(call)
                    step.observations.append(result)
                    if result.success and step.is_complete(result):
                        return StepResult(success=True)
            else:
                return StepResult(success=True, message=response.text)
        return StepResult(success=False, error="Max iterations reached")

2. Model Strategy: Tiered Approach

class ModelRouter:
    def __init__(self):
        self.planner = OllamaModel("qwen3:30b")       # Best reasoning
        self.executor = OllamaModel("qwen3.5:latest")  # Fast execution
        self.embedder = OllamaModel("nomic-embed-text") # Local embeddings

    def route(self, task_type: str) -> OllamaModel:
        if task_type in ("plan", "replan", "complex_code_gen"):
            return self.planner
        elif task_type == "embed":
            return self.embedder
        # Everything else (simple tool calls, parameter fills) goes to
        # the fast executor, which also serves as the default.
        return self.executor

3. Asset Knowledge: SQLite + FAISS + Scene Templates

No external vector database needed. FAISS handles similarity search over thousands of assets with sub-millisecond latency locally.
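What FAISS accelerates is nearest-neighbor ranking over embeddings; sketched here as brute-force cosine similarity in pure Python. At a few thousand assets even this naive version is fast—FAISS's flat indexes do the same exact ranking with SIMD, and its ANN indexes trade exactness for speed. The 3-d vectors below are toys, not real nomic-embed-text output:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(index: dict[str, list[float]], query: list[float], top_k: int = 2):
    """Exact top-k retrieval: score every asset, return the best IDs."""
    scored = sorted(index.items(), key=lambda kv: cosine(kv[1], query),
                    reverse=True)
    return [asset_id for asset_id, _ in scored[:top_k]]

index = {
    "plate_01": [0.9, 0.1, 0.0],
    "fork_01": [0.8, 0.2, 0.1],
    "lamp_01": [0.0, 0.1, 0.9],
}
hits = search(index, query=[1.0, 0.0, 0.0])
```

Swapping this for FAISS changes the storage and the speed, not the interface: embed, rank by similarity, return the top IDs.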

4. Tool Definitions

TOOLS = [
    Tool("spawn_object", "Place a 3D object in the scene",
         params={"asset_id": str, "position": Vec3, "rotation": Vec3, "scale": Vec3}),
    Tool("remove_object", "Remove an object from the scene",
         params={"object_id": str}),
    Tool("set_physics", "Configure physics properties",
         params={"object_id": str, "mass": float, "friction": float,
                 "restitution": float, "is_static": bool}),
    Tool("create_joint", "Create a physics joint between two bodies",
         params={"body_a": str, "body_b": str, "joint_type": str, "params": dict}),
    Tool("set_lighting", "Add or modify a light",
         params={"light_type": str, "position": Vec3, "intensity": float,
                 "color": Vec3, "warmth": float}),
    Tool("attach_script", "Attach a C# behavior script to an object",
         params={"object_id": str, "script_code": str}),
    Tool("search_assets", "Search asset catalog by description",
         params={"query": str, "category": str, "max_results": int}),
    Tool("get_scene_state", "Get current state of all objects",
         params={}),
    Tool("compile_check", "Check if C# code compiles",
         params={"code": str}),
]
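Ollama's OpenAI-compatible endpoint accepts tools in the OpenAI function-schema format, so a table like the one above needs a one-time conversion. A sketch, with an invented `Tool` namedtuple standing in for whatever class the table uses; engine types such as Vec3 fall back to a generic object schema here:

```python
from collections import namedtuple

Tool = namedtuple("Tool", ["name", "description", "params"])

# Map Python types to JSON Schema type names; anything unknown
# (e.g. Vec3) becomes a generic object.
TYPE_MAP = {str: "string", float: "number", int: "integer",
            bool: "boolean", dict: "object"}

def to_schema(tool: Tool) -> dict:
    """Convert a Tool definition into an OpenAI-style function schema."""
    props = {
        name: {"type": TYPE_MAP.get(ptype, "object")}
        for name, ptype in tool.params.items()
    }
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": {
                "type": "object",
                "properties": props,
                "required": list(tool.params),
            },
        },
    }

schema = to_schema(Tool("compile_check", "Check if C# code compiles",
                        {"code": str}))
```

The resulting list of schemas is what gets passed as the `tools` argument on each chat completion call.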

5. Context Engineering: The Secret Sauce

def build_system_prompt(scene_state, retrieved_assets, plan, user_prefs, phase):
    return f"""You are a 3D scene construction agent embedded in LuckyEngine.

CURRENT SCENE STATE:
{format_scene_state(scene_state)}

AVAILABLE ASSETS (most relevant to current task):
{format_asset_summary(retrieved_assets)}

CURRENT PHASE: {phase}
REMAINING PLAN STEPS:
{format_remaining_steps(plan)}

USER PREFERENCES:
{format_preferences(user_prefs)}

RULES:
- Always check scene state before placing objects to avoid collisions
- Place surfaces (floors, tables) before objects that rest on them
- Set physics properties immediately after spawning physics-enabled objects
- Use search_assets before spawning to find the best matching asset
- If a step fails, explain why and suggest an alternative
"""

What NOT to Build

  1. Do not build a multi-model compound system like Devin — With local models, you don't have the luxury of specialized planner/coder/critic models. Use one good model with different prompting strategies.
  2. Do not use a cloud vector database — FAISS + SQLite handles your scale locally with zero latency.
  3. Do not implement Tree of Thoughts — Computational cost prohibitive for local models.
  4. Do not add LangChain/LangGraph as a dependency — Your agent loop is 200 lines of Python. Keep it that way.
  5. Do not fine-tune models initially — 57% of production agent deployments don't fine-tune. Start with prompt engineering + RAG + structured tools.

Implementation Roadmap

| Phase | What | Why |
| --- | --- | --- |
| 1 | Custom agent loop + basic tools | Get core loop working with Ollama |
| 2 | Asset catalog (SQLite + FAISS) | Agent can search and retrieve assets |
| 3 | Plan-and-Execute planner | Multi-step scene construction |
| 4 | C# code generation pipeline | Template-first with validation |
| 5 | Project memory (JSON persistence) | Cross-session context |
| 6 | Scene graph templates | Common scene patterns (kitchen, workshop, etc.) |
| 7 | Context engineering refinement | Optimize prompts, add few-shot examples |

Conclusion

The path to a "Cursor-quality" AI agent for 3D scene generation is not through framework adoption—it is through disciplined context engineering. Cursor's 12.5% accuracy improvement from semantic indexing, Replit's multi-agent architecture, and Claude Code's elegant single-threaded master loop all point to the same lesson: the model is the commodity; the retrieval, planning, and tool integration are the product.

For LuckyEngine, the recommended architecture is deliberately simple: a custom Python agent loop with Plan-and-Execute reasoning, a hybrid asset retrieval system (SQLite + FAISS + scene graph templates), template-first code generation with iterative validation, and three-tier memory management. All of it runs locally on Qwen3-30B via Ollama.

The most important investment is not in the agent framework but in two areas: (1) building a rich, physics-aware asset catalog with scene graph templates that encode spatial common sense, and (2) engineering the context so that a 30B-parameter model has everything it needs—scene state, relevant assets, API reference, user preferences—in its prompt at the moment of each decision.

Build simple. Retrieve well. Plan explicitly. Execute with feedback. That is how you get Cursor-quality from a local model in a 3D engine.



Connecting LuckyEditor Agent to Local Ollama

Host machine: Devrim's Mac (M4 Max, 128GB)
Ollama URL: http://192.168.4.58:11434
Both machines must be on the same WiFi network.

Available Models

| Model | ID | Params | Quant |
| --- | --- | --- | --- |
| Qwen 3.5 | qwen3.5:latest | 9.7B | Q4_K_M |
| Qwen 3 30B (MoE) | qwen3:30b | 30.5B | Q4_K_M |

Steps

1. Verify you can reach Ollama from your machine

curl http://192.168.4.58:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5:latest","messages":[{"role":"user","content":"hello"}]}'

You should get a JSON response back. If not, check you're on the same WiFi.

2. Change the API URL in LuckyEditor/src/Agent/LLM/AgentClient.h

Line 21 — change the default ApiUrl:

// BEFORE
std::string ApiUrl = "https://openrouter.ai/api/v1/chat/completions";

// AFTER
std::string ApiUrl = "http://192.168.4.58:11434/v1/chat/completions";

3. Change the settings in LuckyEditor/Config/EditorApplicationSettings.yaml

Update the Agent section:

Agent:
  ApiKey: unused
  Model: qwen3.5:latest
  UseHub: false
  HubUrl: ""

Set Model to qwen3:30b if you want the bigger model.

4. Rebuild LuckyEditor

./Mac-Build.sh

5. Launch and test

Open LuckyEditor, open the Agent chat panel, and send a message. It should hit the local Ollama instead of OpenRouter.

Quick test outside the engine (Python)

from openai import OpenAI

client = OpenAI(base_url="http://192.168.4.58:11434/v1", api_key="unused")
r = client.chat.completions.create(
    model="qwen3.5:latest",
    messages=[{"role": "user", "content": "hello"}]
)
print(r.choices[0].message.content)

Notes

  • Streaming supported ("stream": true)
  • Tool/function calling supported
  • ApiKey can be anything — Ollama doesn't check it
  • Qwen 3 30B is MoE — faster than dense 30B, smarter than 9.7B
  • Don't commit the AgentClient.h change — it's a local dev override