Date: 2026-04-15
Scope: Evaluate approaches for an AI agent that creates interactive training scenes for robots in LuckyEngine
LuckyEngine is a C++ physics simulator (Vulkan rendering, MuJoCo physics, entt ECS) with a Python SDK (LuckyRobots) and an RL/IL framework (LuckyLab). Users need to create diverse, physically plausible training environments for robots — kitchens, warehouses, obstacle courses, manipulation tasks, etc.
Today this is done manually in the ImGui-based editor. We want an AI agent that can:
- Generate complete scenes from natural language ("a cluttered kitchen with a Panda arm on a counter")
- Modify existing scenes ("add three random obstacles to the floor", "swap the table for a shelf")
- Configure training parameters (domain randomization, reward functions, curriculum stages)
- Understand physical constraints (stable placements, reachable workspaces, collision-free paths)
- YAML-based `.hscene` files via `SceneSerializer`
- Entities have UUID handles and hierarchical parent/child relationships
- 47+ component types, including MuJoCo-specific colliders and robot controllers
| Service | Key RPCs |
|---|---|
| `SceneService` | GetSceneInfo, ListEntities, GetEntity, SetEntityTransform, SetSimulationMode |
| `AgentService` | Step, GetObservation, ResetAgent, GetAgentSchema |
| `MujocoService` | GetJointState, SendControl, GetMujocoInfo |
| `CameraService` | ListCameras, StreamCamera |
| `ViewportService` | StreamViewport |
| `TelemetryService` | StreamTelemetry |
| `DebugService` | Draw (lines, arrows, velocity commands) |
- `Session` (high-level lifecycle) and `LuckyEngineClient` (low-level gRPC)
- `SimulationContract` for domain randomization (friction, mass, motor, terrain, etc.)
- Already supports `list_entities`, `set_entity_transform`, `get_scene_info`
- `AssetRegistry` (`.hzr` YAML) maps UUIDs to file paths and types
- 24+ asset types: MeshSource, StaticMesh, Material, Texture, MujocoScene, Prefab, Script, etc.
- `ContentVault/` stores robots (Panda, etc.), example scenes, models, materials
- `Prefab::Create(Entity)` → save an entity subtree as a reusable asset
- `PrefabManager::InstantiatePrefab(handle, transform, parent)` → spawn into the scene
- Supports propagation and reversion
The existing gRPC API is read-heavy and control-focused — designed for RL training loops, not scene authoring. There is no RPC for:
- Creating/deleting entities
- Adding/removing components
- Spawning prefabs or MuJoCo models
- Modifying physics properties
- Creating/saving scenes
- Querying the asset registry
These gaps define the work needed.
Architecture:
User ↔ Claude (or any MCP client)
↓ MCP protocol (stdio or SSE)
MCP Server (Python)
↓ gRPC + direct file I/O
LuckyEngine (running instance)
What is MCP? Model Context Protocol — an open standard that lets AI models call tools on external systems. Claude, GPT, and other models support it natively. The MCP server exposes "tools" (functions the model can call) and "resources" (data the model can read).
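To make the tool/resource distinction concrete, here is a minimal stand-in for what an MCP server does internally — a registry mapping tool names to callables, plus a dispatcher that turns a model's tool call into a function invocation. This is an illustrative sketch, not the actual `mcp` SDK; the tool body is a stub for what would be a `LuckyEngineClient` gRPC call.

```python
import json
from typing import Callable, Dict

# Registry of callable tools, the core idea an MCP server builds on.
TOOLS: Dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function as a tool the model may call."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_scene_info() -> dict:
    # In the real server this would call LuckyEngineClient over gRPC.
    return {"name": "untitled", "entity_count": 0}

def dispatch(name: str, args_json: str) -> dict:
    """What the MCP layer does when the model issues a tool call."""
    return TOOLS[name](**json.loads(args_json))

result = dispatch("get_scene_info", "{}")
```

The real `mcp` Python SDK provides this plumbing (plus schema generation and transport) so the server only defines the tool functions themselves.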
Implementation: A Python process that:
- Connects to a running LuckyEngine instance via gRPC (using the existing `LuckyEngineClient`)
- Extends the gRPC API where needed (new RPCs for entity creation and component manipulation)
- Exposes scene-authoring tools via the MCP protocol
- Optionally reads/writes `.hscene` YAML files directly for offline scene generation
Proposed MCP Tool Surface (28 tools):
**Scene management**

| Tool | Description |
|---|---|
| `create_scene` | Create a new empty scene with name and physics settings |
| `load_scene` | Load an existing `.hscene` file |
| `save_scene` | Save the current scene to disk |
| `get_scene_info` | Get scene metadata, entity count, physics settings |
| `set_scene_settings` | Configure time mode, physics substeps, MuJoCo options |

**Entities**

| Tool | Description |
|---|---|
| `create_entity` | Create an entity with name, transform, optional parent |
| `delete_entity` | Remove an entity and its children |
| `duplicate_entity` | Clone an entity subtree with new UUIDs |
| `list_entities` | List all entities with optional component filters |
| `get_entity` | Get full entity info (transform, components, children) |
| `set_transform` | Set entity position, rotation, scale |
| `set_parent` | Reparent an entity in the hierarchy |

**Components**

| Tool | Description |
|---|---|
| `add_component` | Add a component to an entity (type + properties JSON) |
| `remove_component` | Remove a component from an entity |
| `update_component` | Modify component properties |
| `list_component_types` | List available component types and their schemas |

**Assets**

| Tool | Description |
|---|---|
| `search_assets` | Search the asset registry by name, type, or path pattern |
| `get_asset_info` | Get asset metadata (type, path, dependencies) |
| `instantiate_prefab` | Spawn a prefab at a given transform |
| `load_mujoco_model` | Load an MJCF/URDF robot into the scene |

**Physics & simulation**

| Tool | Description |
|---|---|
| `set_simulation_mode` | Switch between Realtime / Deterministic / Fast |
| `configure_domain_randomization` | Set SimulationContract parameters |
| `add_collider` | Add a box/sphere/capsule/mesh collider to an entity |
| `set_physics_properties` | Set mass, friction, restitution on a body |

**Spatial queries**

| Tool | Description |
|---|---|
| `raycast` | Cast a ray and return hit info (for placement validation) |
| `overlap_query` | Check for overlapping entities in a region |
| `get_bounds` | Get the AABB of an entity or subtree |

**Viewport**

| Tool | Description |
|---|---|
| `capture_viewport` | Take a screenshot of the current viewport (for visual verification) |
Example Interaction:
User: "Create a kitchen with a Panda robot arm on the counter,
with 5 random mugs it needs to pick up"
Claude thinks:
1. create_entity("Kitchen", transform={0,0,0})
2. instantiate_prefab("kitchen_counter", parent="Kitchen", transform={0,0,0})
3. load_mujoco_model("Panda", parent="Kitchen", transform={0,0.9,0}) # counter height
4. search_assets(type="Prefab", query="mug") # find mug prefab
5. For i in 1..5:
- instantiate_prefab("mug", transform=random_on_counter())
- set_physics_properties(entity, mass=0.3, friction=0.6)
6. capture_viewport() # verify visually
7. save_scene("kitchen_training.hscene")
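The `random_on_counter()` step above is the kind of spatial helper the MCP server would provide so the model never does raw 3D math. A minimal sketch, where the counter-top height and extents are illustrative assumptions (in practice they would come from `get_bounds` on the counter entity):

```python
import random

def random_on_counter(counter_top_y=0.9, x_range=(-0.9, 0.9),
                      z_range=(-0.25, 0.25), margin=0.05):
    """Sample a position on the counter top, inset by a safety margin.

    Dimensions here are assumptions for illustration; the real helper
    would query the counter entity's AABB via get_bounds().
    """
    x = random.uniform(x_range[0] + margin, x_range[1] - margin)
    z = random.uniform(z_range[0] + margin, z_range[1] - margin)
    return {"position": [x, counter_top_y, z]}

placements = [random_on_counter() for _ in range(5)]
```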
Implementation Effort: ~2-3 weeks
- Week 1: Extend gRPC API with entity/component creation RPCs
- Week 2: Build MCP server wrapping gRPC client + tool definitions
- Week 3: Testing, iteration, prompt engineering for scene quality
Pros:
- Leverages Claude's spatial reasoning and common-sense knowledge
- No training data needed
- Works immediately with any MCP-compatible model
- Users can refine scenes conversationally
- Natural language → scene in seconds
- Easy to add new tools as needs emerge
Cons:
- Requires API calls to a frontier model (cost per scene)
- Latency: multi-turn generation takes 10-30s
- Quality depends on model's understanding of physical plausibility
- Needs a running engine instance (or offline YAML generation mode)
Architecture:
User prompt → Fine-tuned 7B-13B model → Scene YAML → LuckyEngine loads it
Approach:
- Define a scene DSL — a simplified YAML schema that maps 1:1 to the `.hscene` format but omits UUIDs and internal details
- Generate training data: 5,000-50,000 pairs of (description, scene DSL)
  - Bootstrap from existing scenes + GPT-4 augmentation
  - Procedural generation of scene variations with back-translated descriptions
- Fine-tune Llama 3.1 8B or Mistral 7B on (prompt → scene DSL)
- Post-process: validate the schema, assign UUIDs, resolve asset references, compile to `.hscene`
- Deploy on your servers via vLLM or TGI
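The post-processing step can be sketched as a small validator that checks generated objects and assigns engine UUIDs (which the model never emits). Field names here are illustrative, not the actual `.hscene` schema:

```python
import uuid

def postprocess(dsl: dict) -> dict:
    """Validate a generated scene DSL and assign UUIDs before compiling.

    A sketch: the real pipeline would also resolve asset references
    and run physics plausibility checks.
    """
    errors = []
    for i, obj in enumerate(dsl.get("objects", [])):
        if "type" not in obj:
            errors.append(f"object {i} missing 'type'")
        obj["uuid"] = str(uuid.uuid4())  # engine-side identity, never model-emitted
    if errors:
        raise ValueError("; ".join(errors))
    return dsl

scene = postprocess({"objects": [{"type": "counter", "position": [0, 0, 0]}]})
```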
Scene DSL Example:
```yaml
scene: kitchen_training
room:
  type: kitchen
  dimensions: [4.0, 3.0, 2.8]  # meters
objects:
  - type: counter
    position: [0, 0, 0]
    dimensions: [2.0, 0.9, 0.6]
    material: granite
  - type: robot
    model: panda
    mount: counter
    position: [1.0, 0.9, 0.3]
  - type: mug
    count: 5
    placement: random_on_surface(counter)
    physics:
      mass: [0.2, 0.4]  # uniform range
      friction: 0.6
training:
  domain_randomization:
    friction: [0.3, 1.0]
    mass_scale: [0.8, 1.2]
  curriculum:
    - stage: 1
      mug_count: 1
    - stage: 2
      mug_count: 3
    - stage: 3
      mug_count: 5
```

Data Generation Strategy:
- Seed scenes: Export all existing ContentVault scenes to DSL format
- Augmentation: Use Claude/GPT-4 to generate 100 variations per seed scene
- Procedural: Write generators for common environments (kitchens, warehouses, tables, shelves)
- Back-translation: For each generated scene, ask a model to write 5 different natural language descriptions
- Validation: Filter training data through a physics validator (stable placements, no interpenetration)
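A toy version of the procedural-generation step, producing (description, DSL) pairs with the description written alongside the scene (the back-translation step would instead ask a model for several paraphrases). Prop and room lists are placeholders; real generators would cover full layouts and run the physics validator:

```python
import random

# Placeholder catalogs for illustration only.
PROPS = ["mug", "plate", "box"]
ROOMS = ["kitchen", "warehouse"]

def generate_pair(rng: random.Random):
    """Generate one (description, scene-DSL) training pair."""
    room = rng.choice(ROOMS)
    prop = rng.choice(PROPS)
    count = rng.randint(1, 5)
    dsl = {"room": {"type": room},
           "objects": [{"type": prop, "count": count,
                        "placement": "random_on_surface(floor)"}]}
    desc = f"a {room} with {count} {prop}s on the floor"
    return desc, dsl

rng = random.Random(0)  # seeded for reproducible datasets
pairs = [generate_pair(rng) for _ in range(3)]
```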
Model Choices:
| Model | Size | VRAM | Inference Speed | Quality |
|---|---|---|---|---|
| Phi-3 Mini | 3.8B | 8GB | ~50 tok/s (A100) | Good for templated scenes |
| Llama 3.1 8B | 8B | 16GB | ~30 tok/s (A100) | Good general quality |
| Mistral 7B | 7B | 14GB | ~35 tok/s (A100) | Strong structured output |
| Llama 3.1 70B | 70B | 140GB | ~8 tok/s (A100) | Near-frontier quality |
| CodeLlama 13B | 13B | 26GB | ~20 tok/s (A100) | Strong YAML/code generation |
Recommended: Start with Llama 3.1 8B — good balance of quality, speed, and resource requirements. Fine-tune with QLoRA for efficiency.
Implementation Effort: ~6-10 weeks
- Weeks 1-2: Define scene DSL, build DSL → .hscene compiler
- Weeks 3-5: Generate and validate training data (5,000+ examples)
- Weeks 6-7: Fine-tune model, evaluate, iterate
- Weeks 8-9: Build serving infrastructure (vLLM, API, validation pipeline)
- Week 10: Integration testing with engine
Pros:
- Fast inference (~1-3s per scene)
- Runs entirely on your infrastructure (no API costs)
- Works offline / air-gapped
- Embeddable in your product as a feature
- Deterministic with temperature=0
Cons:
- Large upfront investment (data + training)
- Limited to patterns seen in training data
- Poor at novel/complex requests
- Requires ongoing data curation as engine evolves
- Output needs validation and post-processing
Architecture:
User prompt
↓
Router (complexity classifier)
├── Simple/templated → Fine-tuned small model → Scene DSL → Engine
└── Complex/novel → Claude via MCP → Engine (tool-by-tool)
Router Logic (could be a small classifier or heuristic):
- Simple path (small model): "a room with a table", "empty warehouse", "standard pick-and-place setup"
- Complex path (MCP/Claude): "recreate the IKEA Lack table assembly task with proper joint constraints", "a kitchen where the fridge door swings open and mugs are inside", multi-step reasoning needed
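A heuristic version of the router could be as simple as keyword and length checks. The marker list and threshold below are placeholders to be tuned on real usage data (or replaced by a small trained classifier):

```python
# Markers suggesting articulation, containment, or multi-step reasoning —
# illustrative choices, not a validated list.
COMPLEX_MARKERS = ("joint", "constraint", "assembly", "articulated",
                   "hinge", "door", "inside", "recreate")

def route(prompt: str) -> str:
    """Route a scene request to the fast path or the frontier-model path."""
    p = prompt.lower()
    if any(m in p for m in COMPLEX_MARKERS) or len(p.split()) > 25:
        return "mcp"          # Claude via MCP, tool-by-tool
    return "small_model"      # fine-tuned fast path
```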
Implementation Effort: ~8-12 weeks (MCP first, then model, then router)
Pros:
- Fast for common cases, smart for hard cases
- Graceful degradation (if model fails, fall back to MCP)
- Cost-efficient (small model handles 80% of requests)
Cons:
- Most infrastructure to maintain
- Router needs tuning to avoid misclassification
- Two code paths to keep in sync with engine changes
Architecture:
User ↔ Agent SDK orchestrator
↓ (tool calls)
LuckyEngine gRPC + Asset Registry + Scene Files
What it is: Rather than MCP (which requires an external MCP client like Claude Desktop), build a standalone Python agent using the Claude Agent SDK. The agent has:
- A system prompt encoding LuckyEngine's physics constraints, component schemas, and asset catalog
- Tools that directly call gRPC or manipulate scene files
- Multi-turn reasoning with memory across the scene-building process
Differences from MCP:
- Self-contained: runs as a Python script or service, no MCP client needed
- Can embed custom validation logic between tool calls
- Can maintain state across turns (asset cache, placement history, collision map)
- Can be triggered from your editor UI, CLI, or web interface
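The orchestration loop the Agent SDK option implies can be sketched with a stubbed model, showing where a custom validation hook slots in between tool calls. Everything here is hypothetical scaffolding — the real loop would call the Anthropic API rather than `fake_model`:

```python
def fake_model(history):
    """Stub standing in for the LLM: one tool call, then done."""
    if not any(h["role"] == "tool" for h in history):
        return {"tool": "create_entity", "args": {"name": "Counter"}}
    return {"done": True}

def create_entity(name):
    # Stub for a gRPC call into the engine.
    return {"entity": name, "uuid": "0000"}

TOOLS = {"create_entity": create_entity}

def run_agent(prompt, max_turns=5):
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        action = fake_model(history)
        if action.get("done"):
            return history
        result = TOOLS[action["tool"]](**action["args"])
        # Custom validation hook would run here (collision check, stability).
        history.append({"role": "tool", "content": result})
    return history

history = run_agent("a counter")
```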
Implementation Effort: ~3-4 weeks (similar to MCP, but the orchestration layer is custom instead of relying on MCP protocol)
Pros:
- Full control over agent behavior
- Can embed in your editor as a panel
- Custom validation between steps
- No dependency on MCP ecosystem
Cons:
- Locked to Claude API (vs MCP which is model-agnostic)
- More custom code to maintain than MCP
- Same API cost considerations as MCP
| Criterion | MCP Server | Fine-Tuned Model | Hybrid | Agent SDK |
|---|---|---|---|---|
| Time to first demo | 1-2 weeks | 6-8 weeks | 8-10 weeks | 2-3 weeks |
| Time to production | 3-4 weeks | 10-12 weeks | 12-14 weeks | 4-5 weeks |
| Scene quality | High (frontier reasoning) | Medium (training-limited) | High | High |
| Novel scene handling | Excellent | Poor | Good | Excellent |
| Inference speed | 10-30s | 1-3s | 1-30s | 10-30s |
| Per-scene cost | ~$0.05-0.50 | ~$0.001 | ~$0.01-0.10 | ~$0.05-0.50 |
| Offline/air-gapped | No | Yes | Partial | No |
| Maintenance burden | Low | High (data + model) | High | Medium |
| Embeddable in product | Via MCP clients | Yes | Yes | Yes (custom UI) |
| Model-agnostic | Yes (MCP standard) | Yes (your model) | Partial | No (Claude only) |
Start here. The ROI is highest: minimal investment, maximum capability, validates the concept.
Concrete steps:
1. Extend gRPC API (1 week)
   - Add `CreateEntity`, `DeleteEntity`, `DuplicateEntity` RPCs to `SceneService`
   - Add `AddComponent`, `RemoveComponent`, `UpdateComponent` RPCs
   - Add `SearchAssets`, `InstantiatePrefab` RPCs
   - Add `CaptureViewport` RPC (return PNG bytes)
   - Update `.proto` files in `Hazel/vendor/luckyrobots/src/luckyrobots/grpc/proto/`
   - Implement server-side handlers in the C++ gRPC server
2. Build MCP Server (1 week)
   - Python package using the `mcp` SDK
   - Wraps `LuckyEngineClient` with tool definitions
   - Adds spatial reasoning helpers (random placement on a surface, grid layouts, etc.)
   - Schema validation for component property JSON
3. Prompt Engineering & Testing (1 week)
   - System prompt with engine constraints, component schemas, asset catalog
   - Test with 20+ scene generation scenarios
   - Iterate on tool granularity (too fine = slow, too coarse = inflexible)
4. Editor Integration (1 week, optional)
   - Chat panel in the ImGui editor that sends messages to the MCP server
   - Or: CLI tool (`lucky-scene create "a kitchen with a Panda arm"`)
   - Or: integrate with Claude Desktop via MCP config
After Phase 1, you'll know:
- What types of scenes users actually request
- Where the frontier model excels vs. struggles
- Whether latency/cost is acceptable for your use case
- What the actual tool surface needs to be
- If latency or cost is a problem → proceed to the fine-tuned model (Option B)
- If it's working well → invest in better tools, spatial reasoning, and asset coverage
If usage data shows 70%+ of requests are templated/simple, fine-tune a small model for the fast path and keep MCP for complex cases.
```
lucky-scene-mcp/
├── pyproject.toml
├── src/
│   └── lucky_scene_mcp/
│       ├── __init__.py
│       ├── server.py                 # MCP server entry point
│       ├── tools/
│       │   ├── scene.py              # Scene management tools
│       │   ├── entity.py             # Entity CRUD tools
│       │   ├── component.py          # Component manipulation tools
│       │   ├── asset.py              # Asset search and instantiation
│       │   ├── physics.py            # Physics configuration tools
│       │   └── spatial.py            # Spatial reasoning helpers
│       ├── resources/
│       │   ├── asset_catalog.py      # Browsable asset registry
│       │   ├── component_schemas.py  # Component type schemas
│       │   └── scene_templates.py    # Pre-built scene templates
│       ├── engine_client.py          # Wrapper around LuckyEngineClient
│       └── validation.py             # Physics plausibility checks
```
Tool granularity: Medium. Don't expose raw component fields as individual tools (too slow — 50 tool calls per entity). Don't make one monolithic "create_scene_from_json" tool (defeats the purpose of iterative reasoning). Sweet spot: entity-level operations with component bundles.
```python
# Good: entity-level with component bundle
create_entity(
    name="Kitchen Counter",
    transform={"position": [0, 0, 0], "scale": [2, 0.9, 0.6]},
    components={
        "StaticMesh": {"asset": "mesh://counter_01"},
        "BoxCollider": {"size": [2, 0.9, 0.6]},
        "RigidBody": {"type": "Static"}
    }
)

# Too fine: separate calls per component
create_entity(name="Kitchen Counter")
set_transform(entity, position=[0, 0, 0])
add_mesh(entity, asset="counter_01")
add_collider(entity, type="box", size=[2, 0.9, 0.6])
add_rigidbody(entity, type="Static")
```

Asset resolution: The model shouldn't need to know UUIDs. Provide fuzzy search:
```python
search_assets(query="mug", type="Prefab")
# Returns: [{"name": "CeramicMug_01", "handle": "...", "path": "ContentVault/Props/Mugs/..."}]
```

Spatial helpers: The model is bad at precise 3D math. Provide helpers:
```python
place_on_surface(entity, surface_entity, offset=[0, 0, 0])  # snap to top of surface
random_placement(entity, bounds={"min": [-1, 0, -1], "max": [1, 0, 1]}, surface=counter)
grid_layout(prefab, rows=3, cols=3, spacing=0.3, origin=[0, 0.9, 0])
```

Validation: After each placement, optionally run:
```python
check_placement(entity)
# Returns: {"stable": true, "collisions": [], "reachable_by": ["Panda"]}
```

Resources let the model browse information without tool calls:
| Resource URI | Content |
|---|---|
| `scene://current/info` | Current scene metadata |
| `scene://current/entities` | Entity tree |
| `assets://registry` | Full asset catalog |
| `assets://prefabs` | Available prefabs |
| `assets://robots` | Available robot models |
| `schema://components` | Component type definitions and property schemas |
| `templates://scenes` | Pre-built scene templates |
The MCP server should include a system prompt (via MCP instructions) that encodes:
- Physics constraints: "Objects must be placed on surfaces or have RigidBody components. A mug on a counter needs Y position = counter_height + mug_half_height."
- Component compatibility: "MuJoCo bodies cannot coexist with Jolt RigidBody on the same entity. Use MujocoBoxCollider for MuJoCo scenes."
- Asset conventions: "Robot models are in ContentVault/Robots/. Props are in ContentVault/Props/. Use search_assets to discover available assets."
- Scene structure: "Always create a root entity for the environment. Parent all scene objects to it. Place robots at workspace-appropriate heights."
- Coordinate system: "Y-up, meters. Typical room height: 2.4-3.0m. Counter height: 0.9m. Table height: 0.75m."
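The placement rule from these constraints is simple arithmetic the spatial helpers can encode directly: a resting object sits at the surface top plus half its own height (Y-up, meters). The mug height below is an illustrative prop dimension:

```python
COUNTER_HEIGHT = 0.9   # from the coordinate-system conventions above
MUG_HEIGHT = 0.10      # illustrative prop dimension, meters

def rest_y(surface_top: float, object_height: float) -> float:
    """Y position of an object resting on a surface (Y-up, origin at object center)."""
    return surface_top + object_height / 2.0

y = rest_y(COUNTER_HEIGHT, MUG_HEIGHT)
```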
These RPCs need to be added to the engine's gRPC server to support the MCP tool surface:
```proto
// Entity creation & manipulation
rpc CreateEntity(CreateEntityRequest) returns (CreateEntityResponse);
rpc DeleteEntity(DeleteEntityRequest) returns (DeleteEntityResponse);
rpc DuplicateEntity(DuplicateEntityRequest) returns (DuplicateEntityResponse);
rpc SetEntityParent(SetEntityParentRequest) returns (SetEntityParentResponse);

// Component manipulation
rpc AddComponent(AddComponentRequest) returns (AddComponentResponse);
rpc RemoveComponent(RemoveComponentRequest) returns (RemoveComponentResponse);
rpc UpdateComponent(UpdateComponentRequest) returns (UpdateComponentResponse);
rpc GetComponents(GetComponentsRequest) returns (GetComponentsResponse);

// Scene lifecycle
rpc CreateScene(CreateSceneRequest) returns (CreateSceneResponse);
rpc SaveScene(SaveSceneRequest) returns (SaveSceneResponse);
rpc SetSceneSettings(SetSceneSettingsRequest) returns (SetSceneSettingsResponse);

message CreateEntityRequest {
  string name = 1;
  Transform transform = 2;
  optional EntityId parent = 3;
  map<string, google.protobuf.Struct> components = 4; // component_type → properties
}

message AddComponentRequest {
  EntityId entity = 1;
  string component_type = 2;
  google.protobuf.Struct properties = 3;
}
```

```proto
service AssetService {
  rpc SearchAssets(SearchAssetsRequest) returns (SearchAssetsResponse);
  rpc GetAssetInfo(GetAssetInfoRequest) returns (GetAssetInfoResponse);
  rpc InstantiatePrefab(InstantiatePrefabRequest) returns (InstantiatePrefabResponse);
  rpc LoadMujocoModel(LoadMujocoModelRequest) returns (LoadMujocoModelResponse);
  rpc ListAssetTypes(google.protobuf.Empty) returns (ListAssetTypesResponse);
}

message SearchAssetsRequest {
  optional string query = 1;     // fuzzy name search
  optional string type = 2;      // filter by asset type
  optional string path_glob = 3; // filter by path pattern
  uint32 max_results = 4;
}

message InstantiatePrefabRequest {
  uint64 asset_handle = 1;
  Transform transform = 2;
  optional EntityId parent = 3;
}
```

```proto
service SpatialService {
  rpc Raycast(RaycastRequest) returns (RaycastResponse);
  rpc OverlapBox(OverlapBoxRequest) returns (OverlapResponse);
  rpc OverlapSphere(OverlapSphereRequest) returns (OverlapResponse);
  rpc GetEntityBounds(GetEntityBoundsRequest) returns (BoundsResponse);
  rpc CaptureViewport(CaptureViewportRequest) returns (CaptureViewportResponse);
}
```

| Tool | What it does | Relevance |
|---|---|---|
| NVIDIA Isaac Sim + Replicator | Procedural scene generation for robot training via Python API | Closest competitor; shows the market wants this. Tightly coupled to Omniverse. |
| AI2-THOR / ProcTHOR | Procedural house generation for embodied AI | Open-source procedural layouts. Could borrow algorithms. |
| Habitat | Meta's embodied AI simulator with scene datasets | Dataset-driven rather than generative. Different approach. |
| SceneDiffusion / LayoutGPT | Research models for 3D scene layout from text | Academic; not production-ready. Shows LLMs can do spatial layout. |
| 3D-GPT | LLM-based procedural 3D generation | Targets Blender; concept is transferable. |
| Tool | What it does | Integration path |
|---|---|---|
| Meshy / Tripo3D | Text/image → 3D mesh (GLB) | Generate custom props, import as MeshSource |
| Rodin (Hyper3D) | High-quality text → 3D model | Higher quality meshes for training scenes |
| OpenUSD/MaterialX | Standard material descriptions | Could standardize material properties |
These are complementary — they solve "I need a mug model" while your agent solves "I need a kitchen scene with mugs placed realistically."
| Scenario | Input tokens | Output tokens | Cost/scene |
|---|---|---|---|
| Simple scene (5 entities) | ~3,000 | ~2,000 | ~$0.05 |
| Medium scene (20 entities) | ~10,000 | ~8,000 | ~$0.20 |
| Complex scene (50+ entities, multi-turn) | ~30,000 | ~20,000 | ~$0.60 |
| Batch generation (100 training variants) | ~500,000 | ~300,000 | ~$10.00 |
Based on Claude Sonnet pricing. Using Haiku for simple scenes would be ~10x cheaper.
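The per-scene figures above reduce to token counts times per-million-token rates. A sketch of the estimator — the prices here are illustrative assumptions, not quoted Anthropic rates:

```python
# (input, output) $ per million tokens — illustrative placeholder rates.
PRICES = {"sonnet": (3.00, 15.00), "haiku": (0.25, 1.25)}

def scene_cost(input_tok: int, output_tok: int, model: str = "sonnet") -> float:
    """Estimated API cost of one scene-generation run."""
    p_in, p_out = PRICES[model]
    return (input_tok * p_in + output_tok * p_out) / 1_000_000

simple = scene_cost(3_000, 2_000)           # simple scene, ~$0.04
simple_haiku = scene_cost(3_000, 2_000, "haiku")
```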
| Setup | Hardware | Monthly cost | Per-scene cost |
|---|---|---|---|
| Llama 8B on 1x A100 | Cloud GPU | ~$2,000/mo | ~$0.001 |
| Llama 8B on 1x L40S | Cloud GPU | ~$1,200/mo | ~$0.001 |
| Llama 8B on local RTX 4090 | One-time $2K | ~$50/mo (power) | ~$0.0005 |
Break-even: if generating >5,000 scenes/month, self-hosted becomes cheaper.
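The break-even point follows directly from the two tables: the monthly scene volume at which a fixed-cost GPU beats per-scene API pricing. The inputs below (L40S rental, ~$0.20 medium scene) mirror the tables but remain assumptions:

```python
def break_even_scenes(monthly_fixed: float, api_per_scene: float,
                      self_hosted_per_scene: float = 0.001) -> float:
    """Scenes/month at which self-hosting matches API spend."""
    return monthly_fixed / (api_per_scene - self_hosted_per_scene)

# L40S at ~$1,200/mo vs ~$0.20 per medium API scene → ≈ 6,030 scenes/month
n = break_even_scenes(monthly_fixed=1200.0, api_per_scene=0.20)
```

This is consistent with the ">5,000 scenes/month" rule of thumb above; heavier API usage (complex scenes at ~$0.60) pulls the break-even point down.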
Start with an MCP server. It's the fastest path to a working demo, requires no training data, and validates the concept with minimal risk. The gRPC API extensions you build for MCP will also benefit the Python SDK and any future approach.
- Design the entity/component creation RPCs — extend `scene.proto` and `asset.proto`
- Implement C++ gRPC handlers for entity creation, component manipulation, and prefab instantiation
- Build the MCP server in Python, wrapping `LuckyEngineClient`
- Test with Claude Desktop or Claude Code as the MCP client
- Iterate on tool design based on real scene generation attempts
- Should scene generation work offline (YAML file generation) or require a running engine?
- What asset library will be available? (The ContentVault needs props, furniture, obstacles)
- Should the agent also configure training parameters (rewards, curriculum) or just scene geometry?
- What's the target user? (Robotics researchers vs. game designers vs. automated pipelines)