Date: 2026-04-15
Scope: Evaluate approaches for an AI agent that creates interactive training scenes for robots in LuckyEngine
LuckyEngine is a C++ physics simulator (Vulkan rendering, MuJoCo physics, entt ECS) with a Python SDK (LuckyRobots) and an RL/IL framework (LuckyLab). Users need to create diverse, physically plausible training environments for robots — kitchens, warehouses, obstacle courses, manipulation tasks, etc.
Today this is done manually in the ImGui-based editor. We want an AI agent that can:
- Generate complete scenes from natural language ("a cluttered kitchen with a Panda arm on a counter")
- Modify existing scenes ("add three random obstacles to the floor", "swap the table for a shelf")
- Configure training parameters (domain randomization, reward functions, curriculum stages)
- Understand physical constraints (stable placements, reachable workspaces, collision-free paths)
- YAML-based `.hscene` files via `SceneSerializer`
- Entities have UUID handles and hierarchical parent/child relationships
- 47+ component types, including MuJoCo-specific colliders and robot controllers
| Service | Key RPCs |
|---|---|
| `SceneService` | GetSceneInfo, ListEntities, GetEntity, SetEntityTransform, SetSimulationMode |
| `AgentService` | Step, GetObservation, ResetAgent, GetAgentSchema |
| `MujocoService` | GetJointState, SendControl, GetMujocoInfo |
| `CameraService` | ListCameras, StreamCamera |
| `ViewportService` | StreamViewport |
| `TelemetryService` | StreamTelemetry |
| `DebugService` | Draw (lines, arrows, velocity commands) |
- `Session` (high-level lifecycle) and `LuckyEngineClient` (low-level gRPC)
- `SimulationContract` for domain randomization (friction, mass, motor, terrain, etc.)
- Already supports `list_entities`, `set_entity_transform`, `get_scene_info`
- `AssetRegistry` (`.hzr` YAML) maps UUIDs to file paths and types
- 24+ asset types: MeshSource, StaticMesh, Material, Texture, MujocoScene, Prefab, Script, etc.
- `ContentVault/` stores robots (Panda, etc.), example scenes, models, materials
- `Prefab::Create(Entity)` → save an entity subtree as a reusable asset
- `PrefabManager::InstantiatePrefab(handle, transform, parent)` → spawn into the scene
- Supports propagation and reversion
The existing gRPC API is read-heavy and control-focused — designed for RL training loops, not scene authoring. There is no RPC for:
- Creating/deleting entities
- Adding/removing components
- Spawning prefabs or MuJoCo models
- Modifying physics properties
- Creating/saving scenes
- Querying the asset registry
These gaps define the work needed.
Architecture:
User ↔ Claude (or any MCP client)
↓ MCP protocol (stdio or SSE)
MCP Server (Python)
↓ gRPC + direct file I/O
LuckyEngine (running instance)
What is MCP? Model Context Protocol — an open standard that lets AI models call tools on external systems. Claude, GPT, and other models support it natively. The MCP server exposes "tools" (functions the model can call) and "resources" (data the model can read).
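To make the tool/resource distinction concrete, here is a minimal stand-in for what an MCP server does internally — a registry mapping tool names to callables, plus a dispatcher that turns a model's tool call into a function invocation. This is an illustrative sketch, not the actual `mcp` SDK; the tool body is a stub for what would be a `LuckyEngineClient` gRPC call.

```python
import json
from typing import Callable, Dict

# Registry of callable tools, the core idea an MCP server builds on.
TOOLS: Dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function as a tool the model may call."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_scene_info() -> dict:
    # In the real server this would call LuckyEngineClient over gRPC.
    return {"name": "untitled", "entity_count": 0}

def dispatch(name: str, args_json: str) -> dict:
    """What the MCP layer does when the model issues a tool call."""
    return TOOLS[name](**json.loads(args_json))

result = dispatch("get_scene_info", "{}")
```

The real `mcp` Python SDK provides this plumbing (plus schema generation and transport) so the server only defines the tool functions themselves.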
Implementation: A Python process that:
- Connects to a running LuckyEngine instance via gRPC (using the existing `LuckyEngineClient`)
- Extends the gRPC API where needed (new RPCs for entity creation and component manipulation)
- Exposes scene-authoring tools via the MCP protocol
- Optionally reads/writes `.hscene` YAML files directly for offline scene generation
Proposed MCP Tool Surface (28 tools):
**Scene management**

| Tool | Description |
|---|---|
| `create_scene` | Create a new empty scene with name and physics settings |
| `load_scene` | Load an existing `.hscene` file |
| `save_scene` | Save the current scene to disk |
| `get_scene_info` | Get scene metadata, entity count, physics settings |
| `set_scene_settings` | Configure time mode, physics substeps, MuJoCo options |

**Entities**

| Tool | Description |
|---|---|
| `create_entity` | Create an entity with name, transform, optional parent |
| `delete_entity` | Remove an entity and its children |
| `duplicate_entity` | Clone an entity subtree with new UUIDs |
| `list_entities` | List all entities with optional component filters |
| `get_entity` | Get full entity info (transform, components, children) |
| `set_transform` | Set entity position, rotation, scale |
| `set_parent` | Reparent an entity in the hierarchy |

**Components**

| Tool | Description |
|---|---|
| `add_component` | Add a component to an entity (type + properties JSON) |
| `remove_component` | Remove a component from an entity |
| `update_component` | Modify component properties |
| `list_component_types` | List available component types and their schemas |

**Assets**

| Tool | Description |
|---|---|
| `search_assets` | Search the asset registry by name, type, or path pattern |
| `get_asset_info` | Get asset metadata (type, path, dependencies) |
| `instantiate_prefab` | Spawn a prefab at a given transform |
| `load_mujoco_model` | Load an MJCF/URDF robot into the scene |

**Physics & simulation**

| Tool | Description |
|---|---|
| `set_simulation_mode` | Switch between Realtime / Deterministic / Fast |
| `configure_domain_randomization` | Set SimulationContract parameters |
| `add_collider` | Add a box/sphere/capsule/mesh collider to an entity |
| `set_physics_properties` | Set mass, friction, restitution on a body |

**Spatial queries**

| Tool | Description |
|---|---|
| `raycast` | Cast a ray and return hit info (for placement validation) |
| `overlap_query` | Check for overlapping entities in a region |
| `get_bounds` | Get the AABB of an entity or subtree |

**Viewport**

| Tool | Description |
|---|---|
| `capture_viewport` | Take a screenshot of the current viewport (for visual verification) |
Example Interaction:
User: "Create a kitchen with a Panda robot arm on the counter,
with 5 random mugs it needs to pick up"
Claude thinks:
1. create_entity("Kitchen", transform={0,0,0})
2. instantiate_prefab("kitchen_counter", parent="Kitchen", transform={0,0,0})
3. load_mujoco_model("Panda", parent="Kitchen", transform={0,0.9,0}) # counter height
4. search_assets(type="Prefab", query="mug") # find mug prefab
5. For i in 1..5:
- instantiate_prefab("mug", transform=random_on_counter())
- set_physics_properties(entity, mass=0.3, friction=0.6)
6. capture_viewport() # verify visually
7. save_scene("kitchen_training.hscene")
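The `random_on_counter()` step above is the kind of spatial helper the MCP server would provide so the model never does raw 3D math. A minimal sketch, where the counter-top height and extents are illustrative assumptions (in practice they would come from `get_bounds` on the counter entity):

```python
import random

def random_on_counter(counter_top_y=0.9, x_range=(-0.9, 0.9),
                      z_range=(-0.25, 0.25), margin=0.05):
    """Sample a position on the counter top, inset by a safety margin.

    Dimensions here are assumptions for illustration; the real helper
    would query the counter entity's AABB via get_bounds().
    """
    x = random.uniform(x_range[0] + margin, x_range[1] - margin)
    z = random.uniform(z_range[0] + margin, z_range[1] - margin)
    return {"position": [x, counter_top_y, z]}

placements = [random_on_counter() for _ in range(5)]
```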
Implementation Effort: ~2-3 weeks
- Week 1: Extend gRPC API with entity/component creation RPCs
- Week 2: Build MCP server wrapping gRPC client + tool definitions
- Week 3: Testing, iteration, prompt engineering for scene quality
Pros:
- Leverages Claude's spatial reasoning and common-sense knowledge
- No training data needed
- Works immediately with any MCP-compatible model
- Users can refine scenes conversationally
- Natural language → scene in seconds
- Easy to add new tools as needs emerge
Cons:
- Requires API calls to a frontier model (cost per scene)
- Latency: multi-turn generation takes 10-30s
- Quality depends on model's understanding of physical plausibility
- Needs a running engine instance (or offline YAML generation mode)
Architecture:
User prompt → Fine-tuned 7B-13B model → Scene YAML → LuckyEngine loads it
Approach:
- Define a scene DSL — a simplified YAML schema that maps 1:1 to the `.hscene` format but omits UUIDs and internal details
- Generate training data: 5,000-50,000 pairs of (description, scene DSL)
  - Bootstrap from existing scenes + GPT-4 augmentation
  - Procedural generation of scene variations with back-translated descriptions
- Fine-tune Llama 3.1 8B or Mistral 7B on (prompt → scene DSL)
- Post-process: validate the schema, assign UUIDs, resolve asset references, compile to `.hscene`
- Deploy on your servers via vLLM or TGI
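The post-processing step can be sketched as a small validator that checks generated objects and assigns engine UUIDs (which the model never emits). Field names here are illustrative, not the actual `.hscene` schema:

```python
import uuid

def postprocess(dsl: dict) -> dict:
    """Validate a generated scene DSL and assign UUIDs before compiling.

    A sketch: the real pipeline would also resolve asset references
    and run physics plausibility checks.
    """
    errors = []
    for i, obj in enumerate(dsl.get("objects", [])):
        if "type" not in obj:
            errors.append(f"object {i} missing 'type'")
        obj["uuid"] = str(uuid.uuid4())  # engine-side identity, never model-emitted
    if errors:
        raise ValueError("; ".join(errors))
    return dsl

scene = postprocess({"objects": [{"type": "counter", "position": [0, 0, 0]}]})
```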
Scene DSL Example:
```yaml
scene: kitchen_training
room:
  type: kitchen
  dimensions: [4.0, 3.0, 2.8]  # meters
objects:
  - type: counter
    position: [0, 0, 0]
    dimensions: [2.0, 0.9, 0.6]
    material: granite
  - type: robot
    model: panda
    mount: counter
    position: [1.0, 0.9, 0.3]
  - type: mug
    count: 5
    placement: random_on_surface(counter)
    physics:
      mass: [0.2, 0.4]  # uniform range
      friction: 0.6
training:
  domain_randomization:
    friction: [0.3, 1.0]
    mass_scale: [0.8, 1.2]
  curriculum:
    - stage: 1
      mug_count: 1
    - stage: 2
      mug_count: 3
    - stage: 3
      mug_count: 5
```

Data Generation Strategy:
- Seed scenes: Export all existing ContentVault scenes to DSL format
- Augmentation: Use Claude/GPT-4 to generate 100 variations per seed scene
- Procedural: Write generators for common environments (kitchens, warehouses, tables, shelves)
- Back-translation: For each generated scene, ask a model to write 5 different natural language descriptions
- Validation: Filter training data through a physics validator (stable placements, no interpenetration)
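A toy version of the procedural-generation step, producing (description, DSL) pairs with the description written alongside the scene (the back-translation step would instead ask a model for several paraphrases). Prop and room lists are placeholders; real generators would cover full layouts and run the physics validator:

```python
import random

# Placeholder catalogs for illustration only.
PROPS = ["mug", "plate", "box"]
ROOMS = ["kitchen", "warehouse"]

def generate_pair(rng: random.Random):
    """Generate one (description, scene-DSL) training pair."""
    room = rng.choice(ROOMS)
    prop = rng.choice(PROPS)
    count = rng.randint(1, 5)
    dsl = {"room": {"type": room},
           "objects": [{"type": prop, "count": count,
                        "placement": "random_on_surface(floor)"}]}
    desc = f"a {room} with {count} {prop}s on the floor"
    return desc, dsl

rng = random.Random(0)  # seeded for reproducible datasets
pairs = [generate_pair(rng) for _ in range(3)]
```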
Model Choices:
| Model | Size | VRAM | Inference Speed | Quality |
|---|---|---|---|---|
| Phi-3 Mini | 3.8B | 8GB | ~50 tok/s (A100) | Good for templated scenes |
| Llama 3.1 8B | 8B | 16GB | ~30 tok/s (A100) | Good general quality |
| Mistral 7B | 7B | 14GB | ~35 tok/s (A100) | Strong structured output |
| Llama 3.1 70B | 70B | 140GB | ~8 tok/s (A100) | Near-frontier quality |
| CodeLlama 13B | 13B | 26GB | ~20 tok/s (A100) | Strong YAML/code generation |
Recommended: Start with Llama 3.1 8B — good balance of quality, speed, and resource requirements. Fine-tune with QLoRA for efficiency.
Implementation Effort: ~6-10 weeks
- Weeks 1-2: Define scene DSL, build DSL → .hscene compiler
- Weeks 3-5: Generate and validate training data (5,000+ examples)
- Weeks 6-7: Fine-tune model, evaluate, iterate
- Weeks 8-9: Build serving infrastructure (vLLM, API, validation pipeline)
- Week 10: Integration testing with engine
Pros:
- Fast inference (~1-3s per scene)
- Runs entirely on your infrastructure (no API costs)
- Works offline / air-gapped
- Embeddable in your product as a feature
- Deterministic with temperature=0
Cons:
- Large upfront investment (data + training)
- Limited to patterns seen in training data
- Poor at novel/complex requests
- Requires ongoing data curation as engine evolves
- Output needs validation and post-processing
Architecture:
User prompt
↓
Router (complexity classifier)
├── Simple/templated → Fine-tuned small model → Scene DSL → Engine
└── Complex/novel → Claude via MCP → Engine (tool-by-tool)
Router Logic (could be a small classifier or heuristic):
- Simple path (small model): "a room with a table", "empty warehouse", "standard pick-and-place setup"
- Complex path (MCP/Claude): "recreate the IKEA Lack table assembly task with proper joint constraints", "a kitchen where the fridge door swings open and mugs are inside", multi-step reasoning needed
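A heuristic version of the router could be as simple as keyword and length checks. The marker list and threshold below are placeholders to be tuned on real usage data (or replaced by a small trained classifier):

```python
# Markers suggesting articulation, containment, or multi-step reasoning —
# illustrative choices, not a validated list.
COMPLEX_MARKERS = ("joint", "constraint", "assembly", "articulated",
                   "hinge", "door", "inside", "recreate")

def route(prompt: str) -> str:
    """Route a scene request to the fast path or the frontier-model path."""
    p = prompt.lower()
    if any(m in p for m in COMPLEX_MARKERS) or len(p.split()) > 25:
        return "mcp"          # Claude via MCP, tool-by-tool
    return "small_model"      # fine-tuned fast path
```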
Implementation Effort: ~8-12 weeks (MCP first, then model, then router)
Pros:
- Fast for common cases, smart for hard cases
- Graceful degradation (if model fails, fall back to MCP)
- Cost-efficient (small model handles 80% of requests)
Cons:
- Most infrastructure to maintain
- Router needs tuning to avoid misclassification
- Two code paths to keep in sync with engine changes
Architecture:
User ↔ Agent SDK orchestrator
↓ (tool calls)
LuckyEngine gRPC + Asset Registry + Scene Files
What it is: Rather than MCP (which requires an external MCP client like Claude Desktop), build a standalone Python agent using the Claude Agent SDK. The agent has:
- A system prompt encoding LuckyEngine's physics constraints, component schemas, and asset catalog
- Tools that directly call gRPC or manipulate scene files
- Multi-turn reasoning with memory across the scene-building process
Differences from MCP:
- Self-contained: runs as a Python script or service, no MCP client needed
- Can embed custom validation logic between tool calls
- Can maintain state across turns (asset cache, placement history, collision map)
- Can be triggered from your editor UI, CLI, or web interface
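The orchestration loop the Agent SDK option implies can be sketched with a stubbed model, showing where a custom validation hook slots in between tool calls. Everything here is hypothetical scaffolding — the real loop would call the Anthropic API rather than `fake_model`:

```python
def fake_model(history):
    """Stub standing in for the LLM: one tool call, then done."""
    if not any(h["role"] == "tool" for h in history):
        return {"tool": "create_entity", "args": {"name": "Counter"}}
    return {"done": True}

def create_entity(name):
    # Stub for a gRPC call into the engine.
    return {"entity": name, "uuid": "0000"}

TOOLS = {"create_entity": create_entity}

def run_agent(prompt, max_turns=5):
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        action = fake_model(history)
        if action.get("done"):
            return history
        result = TOOLS[action["tool"]](**action["args"])
        # Custom validation hook would run here (collision check, stability).
        history.append({"role": "tool", "content": result})
    return history

history = run_agent("a counter")
```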
Implementation Effort: ~3-4 weeks (similar to MCP, but the orchestration layer is custom instead of relying on MCP protocol)
Pros:
- Full control over agent behavior
- Can embed in your editor as a panel
- Custom validation between steps
- No dependency on MCP ecosystem
Cons:
- Locked to Claude API (vs MCP which is model-agnostic)
- More custom code to maintain than MCP
- Same API cost considerations as MCP
| Criterion | MCP Server | Fine-Tuned Model | Hybrid | Agent SDK |
|---|---|---|---|---|
| Time to first demo | 1-2 weeks | 6-8 weeks | 8-10 weeks | 2-3 weeks |
| Time to production | 3-4 weeks | 10-12 weeks | 12-14 weeks | 4-5 weeks |
| Scene quality | High (frontier reasoning) | Medium (training-limited) | High | High |
| Novel scene handling | Excellent | Poor | Good | Excellent |
| Inference speed | 10-30s | 1-3s | 1-30s | 10-30s |
| Per-scene cost | ~$0.05-0.50 | ~$0.001 | ~$0.01-0.10 | ~$0.05-0.50 |
| Offline/air-gapped | No | Yes | Partial | No |
| Maintenance burden | Low | High (data + model) | High | Medium |
| Embeddable in product | Via MCP clients | Yes | Yes | Yes (custom UI) |
| Model-agnostic | Yes (MCP standard) | Yes (your model) | Partial | No (Claude only) |
Start here. The ROI is highest: minimal investment, maximum capability, validates the concept.
Concrete steps:
1. Extend gRPC API (1 week)
   - Add `CreateEntity`, `DeleteEntity`, `DuplicateEntity` RPCs to `SceneService`
   - Add `AddComponent`, `RemoveComponent`, `UpdateComponent` RPCs
   - Add `SearchAssets`, `InstantiatePrefab` RPCs
   - Add `CaptureViewport` RPC (return PNG bytes)
   - Update `.proto` files in `Hazel/vendor/luckyrobots/src/luckyrobots/grpc/proto/`
   - Implement server-side handlers in the C++ gRPC server
2. Build MCP Server (1 week)
   - Python package using the `mcp` SDK
   - Wraps `LuckyEngineClient` with tool definitions
   - Adds spatial reasoning helpers (random placement on a surface, grid layouts, etc.)
   - Schema validation for component property JSON
3. Prompt Engineering & Testing (1 week)
   - System prompt with engine constraints, component schemas, asset catalog
   - Test with 20+ scene generation scenarios
   - Iterate on tool granularity (too fine = slow, too coarse = inflexible)
4. Editor Integration (1 week, optional)
   - Chat panel in the ImGui editor that sends messages to the MCP server
   - Or: CLI tool (`lucky-scene create "a kitchen with a Panda arm"`)
   - Or: integrate with Claude Desktop via MCP config
After Phase 1, you'll know:
- What types of scenes users actually request
- Where the frontier model excels vs. struggles
- Whether latency/cost is acceptable for your use case
- What the actual tool surface needs to be
- If latency or cost is a problem → proceed to the fine-tuned model (Option B)
- If it's working well → invest in better tools, spatial reasoning, and asset coverage
If usage data shows 70%+ of requests are templated/simple, fine-tune a small model for the fast path and keep MCP for complex cases.
```
lucky-scene-mcp/
├── pyproject.toml
├── src/
│   └── lucky_scene_mcp/
│       ├── __init__.py
│       ├── server.py                 # MCP server entry point
│       ├── tools/
│       │   ├── scene.py              # Scene management tools
│       │   ├── entity.py             # Entity CRUD tools
│       │   ├── component.py          # Component manipulation tools
│       │   ├── asset.py              # Asset search and instantiation
│       │   ├── physics.py            # Physics configuration tools
│       │   └── spatial.py            # Spatial reasoning helpers
│       ├── resources/
│       │   ├── asset_catalog.py      # Browsable asset registry
│       │   ├── component_schemas.py  # Component type schemas
│       │   └── scene_templates.py    # Pre-built scene templates
│       ├── engine_client.py          # Wrapper around LuckyEngineClient
│       └── validation.py             # Physics plausibility checks
```
Tool granularity: Medium. Don't expose raw component fields as individual tools (too slow — 50 tool calls per entity). Don't make one monolithic "create_scene_from_json" tool (defeats the purpose of iterative reasoning). Sweet spot: entity-level operations with component bundles.
```python
# Good: entity-level with component bundle
create_entity(
    name="Kitchen Counter",
    transform={"position": [0, 0, 0], "scale": [2, 0.9, 0.6]},
    components={
        "StaticMesh": {"asset": "mesh://counter_01"},
        "BoxCollider": {"size": [2, 0.9, 0.6]},
        "RigidBody": {"type": "Static"}
    }
)

# Too fine: separate calls per component
create_entity(name="Kitchen Counter")
set_transform(entity, position=[0, 0, 0])
add_mesh(entity, asset="counter_01")
add_collider(entity, type="box", size=[2, 0.9, 0.6])
add_rigidbody(entity, type="Static")
```

Asset resolution: The model shouldn't need to know UUIDs. Provide fuzzy search:
```python
search_assets(query="mug", type="Prefab")
# Returns: [{"name": "CeramicMug_01", "handle": "...", "path": "ContentVault/Props/Mugs/..."}]
```

Spatial helpers: The model is bad at precise 3D math. Provide helpers:
```python
place_on_surface(entity, surface_entity, offset=[0, 0, 0])  # snap to top of surface
random_placement(entity, bounds={"min": [-1, 0, -1], "max": [1, 0, 1]}, surface=counter)
grid_layout(prefab, rows=3, cols=3, spacing=0.3, origin=[0, 0.9, 0])
```

Validation: After each placement, optionally run:
```python
check_placement(entity)
# Returns: {"stable": true, "collisions": [], "reachable_by": ["Panda"]}
```

Resources let the model browse information without tool calls:
| Resource URI | Content |
|---|---|
| `scene://current/info` | Current scene metadata |
| `scene://current/entities` | Entity tree |
| `assets://registry` | Full asset catalog |
| `assets://prefabs` | Available prefabs |
| `assets://robots` | Available robot models |
| `schema://components` | Component type definitions and property schemas |
| `templates://scenes` | Pre-built scene templates |
The MCP server should include a system prompt (via MCP instructions) that encodes:
- Physics constraints: "Objects must be placed on surfaces or have RigidBody components. A mug on a counter needs Y position = counter_height + mug_half_height."
- Component compatibility: "MuJoCo bodies cannot coexist with Jolt RigidBody on the same entity. Use MujocoBoxCollider for MuJoCo scenes."
- Asset conventions: "Robot models are in ContentVault/Robots/. Props are in ContentVault/Props/. Use search_assets to discover available assets."
- Scene structure: "Always create a root entity for the environment. Parent all scene objects to it. Place robots at workspace-appropriate heights."
- Coordinate system: "Y-up, meters. Typical room height: 2.4-3.0m. Counter height: 0.9m. Table height: 0.75m."
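The placement rule from these constraints is simple arithmetic the spatial helpers can encode directly: a resting object sits at the surface top plus half its own height (Y-up, meters). The mug height below is an illustrative prop dimension:

```python
COUNTER_HEIGHT = 0.9   # from the coordinate-system conventions above
MUG_HEIGHT = 0.10      # illustrative prop dimension, meters

def rest_y(surface_top: float, object_height: float) -> float:
    """Y position of an object resting on a surface (Y-up, origin at object center)."""
    return surface_top + object_height / 2.0

y = rest_y(COUNTER_HEIGHT, MUG_HEIGHT)
```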
These RPCs need to be added to the engine's gRPC server to support the MCP tool surface:
```proto
// Entity creation & manipulation
rpc CreateEntity(CreateEntityRequest) returns (CreateEntityResponse);
rpc DeleteEntity(DeleteEntityRequest) returns (DeleteEntityResponse);
rpc DuplicateEntity(DuplicateEntityRequest) returns (DuplicateEntityResponse);
rpc SetEntityParent(SetEntityParentRequest) returns (SetEntityParentResponse);

// Component manipulation
rpc AddComponent(AddComponentRequest) returns (AddComponentResponse);
rpc RemoveComponent(RemoveComponentRequest) returns (RemoveComponentResponse);
rpc UpdateComponent(UpdateComponentRequest) returns (UpdateComponentResponse);
rpc GetComponents(GetComponentsRequest) returns (GetComponentsResponse);

// Scene lifecycle
rpc CreateScene(CreateSceneRequest) returns (CreateSceneResponse);
rpc SaveScene(SaveSceneRequest) returns (SaveSceneResponse);
rpc SetSceneSettings(SetSceneSettingsRequest) returns (SetSceneSettingsResponse);

message CreateEntityRequest {
  string name = 1;
  Transform transform = 2;
  optional EntityId parent = 3;
  map<string, google.protobuf.Struct> components = 4; // component_type → properties
}

message AddComponentRequest {
  EntityId entity = 1;
  string component_type = 2;
  google.protobuf.Struct properties = 3;
}
```

```proto
service AssetService {
  rpc SearchAssets(SearchAssetsRequest) returns (SearchAssetsResponse);
  rpc GetAssetInfo(GetAssetInfoRequest) returns (GetAssetInfoResponse);
  rpc InstantiatePrefab(InstantiatePrefabRequest) returns (InstantiatePrefabResponse);
  rpc LoadMujocoModel(LoadMujocoModelRequest) returns (LoadMujocoModelResponse);
  rpc ListAssetTypes(google.protobuf.Empty) returns (ListAssetTypesResponse);
}

message SearchAssetsRequest {
  optional string query = 1;     // fuzzy name search
  optional string type = 2;      // filter by asset type
  optional string path_glob = 3; // filter by path pattern
  uint32 max_results = 4;
}

message InstantiatePrefabRequest {
  uint64 asset_handle = 1;
  Transform transform = 2;
  optional EntityId parent = 3;
}
```

```proto
service SpatialService {
  rpc Raycast(RaycastRequest) returns (RaycastResponse);
  rpc OverlapBox(OverlapBoxRequest) returns (OverlapResponse);
  rpc OverlapSphere(OverlapSphereRequest) returns (OverlapResponse);
  rpc GetEntityBounds(GetEntityBoundsRequest) returns (BoundsResponse);
  rpc CaptureViewport(CaptureViewportRequest) returns (CaptureViewportResponse);
}
```

| Tool | What it does | Relevance |
|---|---|---|
| NVIDIA Isaac Sim + Replicator | Procedural scene generation for robot training via Python API | Closest competitor; shows the market wants this. Tightly coupled to Omniverse. |
| AI2-THOR / ProcTHOR | Procedural house generation for embodied AI | Open-source procedural layouts. Could borrow algorithms. |
| Habitat | Meta's embodied AI simulator with scene datasets | Dataset-driven rather than generative. Different approach. |
| SceneDiffusion / LayoutGPT | Research models for 3D scene layout from text | Academic; not production-ready. Shows LLMs can do spatial layout. |
| 3D-GPT | LLM-based procedural 3D generation | Targets Blender; concept is transferable. |
| Tool | What it does | Integration path |
|---|---|---|
| Meshy / Tripo3D | Text/image → 3D mesh (GLB) | Generate custom props, import as MeshSource |
| Rodin (Hyper3D) | High-quality text → 3D model | Higher quality meshes for training scenes |
| OpenUSD/MaterialX | Standard material descriptions | Could standardize material properties |
These are complementary — they solve "I need a mug model" while your agent solves "I need a kitchen scene with mugs placed realistically."
| Scenario | Input tokens | Output tokens | Cost/scene |
|---|---|---|---|
| Simple scene (5 entities) | ~3,000 | ~2,000 | ~$0.05 |
| Medium scene (20 entities) | ~10,000 | ~8,000 | ~$0.20 |
| Complex scene (50+ entities, multi-turn) | ~30,000 | ~20,000 | ~$0.60 |
| Batch generation (100 training variants) | ~500,000 | ~300,000 | ~$10.00 |
Based on Claude Sonnet pricing. Using Haiku for simple scenes would be ~10x cheaper.
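The per-scene figures above reduce to token counts times per-million-token rates. A sketch of the estimator — the prices here are illustrative assumptions, not quoted Anthropic rates:

```python
# (input, output) $ per million tokens — illustrative placeholder rates.
PRICES = {"sonnet": (3.00, 15.00), "haiku": (0.25, 1.25)}

def scene_cost(input_tok: int, output_tok: int, model: str = "sonnet") -> float:
    """Estimated API cost of one scene-generation run."""
    p_in, p_out = PRICES[model]
    return (input_tok * p_in + output_tok * p_out) / 1_000_000

simple = scene_cost(3_000, 2_000)           # simple scene, ~$0.04
simple_haiku = scene_cost(3_000, 2_000, "haiku")
```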
| Setup | Hardware | Monthly cost | Per-scene cost |
|---|---|---|---|
| Llama 8B on 1x A100 | Cloud GPU | ~$2,000/mo | ~$0.001 |
| Llama 8B on 1x L40S | Cloud GPU | ~$1,200/mo | ~$0.001 |
| Llama 8B on local RTX 4090 | One-time $2K | ~$50/mo (power) | ~$0.0005 |
Break-even: if generating >5,000 scenes/month, self-hosted becomes cheaper.
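The break-even point follows directly from the two tables: the monthly scene volume at which a fixed-cost GPU beats per-scene API pricing. The inputs below (L40S rental, ~$0.20 medium scene) mirror the tables but remain assumptions:

```python
def break_even_scenes(monthly_fixed: float, api_per_scene: float,
                      self_hosted_per_scene: float = 0.001) -> float:
    """Scenes/month at which self-hosting matches API spend."""
    return monthly_fixed / (api_per_scene - self_hosted_per_scene)

# L40S at ~$1,200/mo vs ~$0.20 per medium API scene → ≈ 6,030 scenes/month
n = break_even_scenes(monthly_fixed=1200.0, api_per_scene=0.20)
```

This is consistent with the ">5,000 scenes/month" rule of thumb above; heavier API usage (complex scenes at ~$0.60) pulls the break-even point down.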
Start with an MCP server. It's the fastest path to a working demo, requires no training data, and validates the concept with minimal risk. The gRPC API extensions you build for MCP will also benefit the Python SDK and any future approach.
- Design the entity/component creation RPCs — extend `scene.proto` and `asset.proto`
- Implement C++ gRPC handlers for entity creation, component manipulation, and prefab instantiation
- Build the MCP server in Python, wrapping `LuckyEngineClient`
- Test with Claude Desktop or Claude Code as the MCP client
- Iterate on tool design based on real scene generation attempts
- Should scene generation work offline (YAML file generation) or require a running engine?
- What asset library will be available? (The ContentVault needs props, furniture, obstacles)
- Should the agent also configure training parameters (rewards, curriculum) or just scene geometry?
- What's the target user? (Robotics researchers vs. game designers vs. automated pipelines)