@sleexyz
Created April 22, 2026 02:09
Qwen3-8B (thinking OFF) solves Opus Magnum gen046 seed01 — 7 turns, self-hosted vLLM H100, compare to https://gist.github.com/sleexyz/c22f87ff589b4d1932635ed28ee93769

OM-RLM trajectory — full, nothing elided

  • Task: eval_fixed_instructions_only_board_cheatsheet_v2_slim__gen046__seed01
  • Model: qwen/qwen3-8b
  • Endpoint: https://websim-ai--om-rlm-qwen3-8b-vllm-vllm.us-east.modal.direct/v1/chat/completions
  • Trial dir: jobs/sum-scores/2026-04-21__21-59-14/eval_fixed_instructions_only_boa__i5GDvNJ
  • Reward (sum-scores): 0.403
  • Turns: 7
  • Submitted / solved flags: True / True

======================================================================

SYSTEM PROMPT (2117 chars)

======================================================================

You are an engineering agent solving an Opus Magnum puzzle inside a sandboxed Python REPL. Your only action channel is a single fenced ```python block per turn. The code executes in a persistent namespace — variables assigned in turn N are visible in turn N+1. Use print(...) every turn; stdout is the ONLY feedback channel back to you.

Workflow (mirrors the standard Opus-Magnum-bench loop)

  1. Read the inlined puzzle state and the inlined starter solution.py source. Understand what's placed where and which arms exist.
  2. Mentally design arm programs that take reagent atoms through glyphs to the output station.
  3. Build arm_programs as a dict {arm_number: [(command, cycle), ...]} incrementally. The persistent namespace lets you keep per-arm lists between turns (e.g. arm_1_prog = [...], tweak, submit({1: arm_1_prog, 2: arm_2_prog})).
  4. Call safe_verify(arm_programs) to test. Read the returned dict's error field.
  5. On failure, call explain_failure(arm_programs) or motion_preview(arm_programs, cycles=30) for a focused diagnostic. Fix the specific program that broke.
  6. When safe_verify returns solved=True, call submit(arm_programs) to commit and end the episode.

Critical constraints

  • You CANNOT add, remove, or replace parts. No add_armN / add_bonder etc. are callable from the REPL. The parts in the inlined starter are all you have.
  • arm_programs keys MUST be a subset of arm_numbers. Foreign keys (3, 6, etc. when arm_numbers == [1, 2]) are rejected.
  • Cycle indices within one arm's list MUST be unique and strictly increasing. See the PROGRAM ARMS section in the task body for details.
  • Each arm has a kind shown in the starter layout (arm1/arm2/arm3/arm6/piston). extend/retract are only valid on arm6 / piston. track_plus/track_minus only on track-mounted arms.
  • Failed submits return the verifier dict and DO NOT end the episode — you can keep iterating.
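
The key and opcode constraints above can be checked locally before a program ever reaches the verifier. The following is a minimal sketch only: `precheck` and the hand-written `arm_kinds` mapping are hypothetical names, not part of opus_magnum_bench, and the messages are illustrative rather than the verifier's own error strings.

```python
# Hypothetical pre-flight check; NOT part of opus_magnum_bench.
PISTON_OPS = {"extend", "retract"}
TRACK_OPS = {"track_plus", "track_minus"}


def precheck(arm_programs, arm_numbers, arm_kinds, on_track=frozenset()):
    """Return a list of human-readable problems (an empty list means none found)."""
    problems = []
    for arm, prog in arm_programs.items():
        if arm not in arm_numbers:
            problems.append(f"arm {arm}: not in arm_numbers {list(arm_numbers)}")
            continue
        kind = arm_kinds.get(arm)
        for op, cycle in prog:
            # Per the constraint list above, extend/retract need arm6 or piston.
            if op in PISTON_OPS and kind not in ("arm6", "piston"):
                problems.append(f"arm {arm}: {op} at cycle {cycle} on kind {kind}")
            # Track ops only make sense for arms mounted on a track.
            if op in TRACK_OPS and arm not in on_track:
                problems.append(f"arm {arm}: {op} at cycle {cycle} but arm is not on a track")
    return problems


# The starter layout in this task has a single arm1 with arm_number 0.
print(precheck({1: [("grab", 0)]}, [0], {0: "arm1"}))
print(precheck({0: [("grab", 0), ("rotate_cw", 1)]}, [0], {0: "arm1"}))
```

A check like this would have flagged the foreign key `1` before the first verifier call.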

Reward

Binary on solved=True. Do NOT optimize cycles/cost/area — any solved solution is worth the same as any other solved solution. Keep iterating until solved=True.

INITIAL USER MESSAGE (24266 chars)

======================================================================

REPL state (already loaded — inspect via Python, not bash)

  • puzzle — opus_magnum_bench.Puzzle parsed from the puzzle file.
  • board — starter Board with every part pre-placed (Board.from_builder).
  • arm_numbers: list[int] — the arm_number ints you use as keys in submit().
  • starter_solution_py: str — full source of the starter solution.py.
  • safe_verify(arm_programs) → {solved, valid, error, ...}. Test before committing; does NOT end the episode.
  • submit(arm_programs) — commits + writes /workspace/solution.solution. ONLY ends the episode on solved=True; unsolved submits let you iterate.
  • explain_failure(arm_programs), motion_preview(arm_programs, cycles=30), layout_check(arm_programs) — debug helpers.
  • trace_solution(puzzle, builder, cycle_limit=30) — raw per-cycle frames.
  • read(path) → read a file from /workspace or opus_magnum_bench source.
  • describe_arm_program(prog), hex_add, hex_sub, hex_distance, hex_neighbors, arm_positions_for_target, check_placement_overlaps.
  • omb — the full opus_magnum_bench module (escape hatch).

Persistent namespace tip: variables you assign persist across turns. Build incrementally — e.g. keep arm_1_prog = [...] and arm_2_prog = [...] between turns, tweak one, re-call submit({1: arm_1_prog, 2: arm_2_prog}).

PROGRAM ARMS (read carefully)

Every integer in an arm's program is a cycle index — the exact simulation cycle when that instruction fires. Within ONE arm's program, cycle indices must be UNIQUE and STRICTLY INCREASING. They are NOT iteration counts, NOT repeat counts, and NOT offsets. Different arms may share the same cycle index (they fire in parallel).

In dict-submission form:

# Correct — arm 1 fires at cycles 0, 1, 2, then repeats starting cycle 3:
arm_programs = {1: [("grab", 0), ("rotate_cw", 1), ("drop", 2), ("repeat", 3)]}

# Wrong — "two instructions that have the same index":
arm_programs = {1: [("grab", 1), ("rotate_cw", 1), ("drop", 1)]}

# Wrong — three rotations at cycle 1 collapse to one. Use 1, 2, 3:
arm_programs = {1: [("rotate_cw", 1), ("rotate_cw", 1), ("rotate_cw", 1)]}

Commands: grab, drop, rotate_cw, rotate_ccw, pivot_cw, pivot_ccw, extend, retract, track_plus, track_minus, repeat, reset, noop.

Two parallel arms example (dict keys are arm_number ints):

arm_programs = {
    1: [("grab", 0), ("rotate_cw", 1), ("drop", 2)],
    2: [("grab", 0), ("rotate_ccw", 1), ("drop", 2)],
}
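
The "unique and strictly increasing" rule is easy to test mechanically. A minimal sketch, assuming a hypothetical helper name `check_cycles` (not an SDK function) that reports the first offending instruction:

```python
def check_cycles(prog):
    """Return None if cycle indices strictly increase, else describe the first violation."""
    previous = None
    for op, cycle in prog:
        if previous is not None and cycle <= previous:
            return f"{op!r} at cycle {cycle} does not come after cycle {previous}"
        previous = cycle
    return None


good = [("grab", 0), ("rotate_cw", 1), ("drop", 2), ("repeat", 3)]
bad = [("grab", 1), ("rotate_cw", 1), ("drop", 1)]

print(check_cycles(good))  # None
print(check_cycles(bad))   # 'rotate_cw' at cycle 1 does not come after cycle 1
```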

DEBUGGING

  • safe_verify(arm_programs) — returns {solved, valid, error, ...} dict. Use before committing. valid=False means the program itself is malformed (e.g. same-cycle-index); valid=True, solved=False means it ran but didn't produce the required outputs.
  • explain_failure(arm_programs) — focused post-mortem string. Dispatches on error type (collision / overlap / cycle-limit / missing-product) and prints exact cycle + cell + arm + payload info.
  • motion_preview(arm_programs, cycles=30) — per-cycle snapshot c=NN | arm0 base=... rot=... tip=... holds=[...] | outputs=N/M. Stops one cycle past any collision.
  • layout_check(arm_programs) — static parts + arm-reach check (no simulation).
  • trace_solution(puzzle, builder, cycle_limit=30) — raw per-cycle trace frames for deep debugging. Normally motion_preview is enough.

If safe_verify returns valid:True, solved:False, error:"did not complete within cycle limit", the layout is fine but no product was emitted. Usually you need ("repeat", N) at the end of each arm's program, not a new layout.
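
The valid/solved combinations described above reduce to a three-way triage. A small sketch, assuming the verifier result is a plain dict with `valid` and `solved` keys as in the REPL observations later in this log; `classify` is a hypothetical name:

```python
def classify(result):
    """Map a verifier dict to one of three coarse outcomes."""
    if not result.get("valid", False):
        return "malformed"      # e.g. duplicate cycle index or unknown arm key
    if not result.get("solved", False):
        return "ran_unsolved"   # simulated, but outputs_required was not met
    return "solved"


# Dicts abbreviated from the turn 4 and turn 6 observations in this trajectory.
turn4 = {"solved": False, "valid": False,
         "error": "ValueError: unknown arm_number 1; have [0]"}
turn6 = {"valid": True, "solved": True, "cycles": 96, "error": None}

print(classify(turn4))  # malformed
print(classify(turn6))  # solved
```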

Common gotchas

  • Same cycle index used twice within ONE arm's program → verifier returns valid:False, error:"two instructions that have the same index". Use unique strictly-increasing ints.
  • arm_programs key not in arm_numbers → ValueError: unknown arm_number. Only program the arms listed in arm_numbers.
  • extend/retract on an arm1 / arm2 / arm3 → "trying to extend/retract a non-piston arm". Check each arm's kind in the starter layout.
  • track_plus/track_minus on an arm not mounted to a track → "trying to move an arm along a track that isn't on a track". If no Track appears in the starter's Glyphs/IO section, don't use track ops.
  • No product emitted → add ("repeat", N) to the program so the simulator loops your choreography until outputs_required is met.
  • Don't forget to print(...) — expressions don't auto-echo in this REPL.

Puzzle state

Puzzle view for GEN046

Produce 6 outputs. Reagents: Reagent 0: 2 atoms (air x1, fire x1) and 1 bond (normal x1).. Products: Product 0: 2 atoms (salt x2) and 1 bond (normal x1).. Available mechanisms: Arm 1, Arm 2, Arm 3, Arm 6, Piston, Track, Bonder, Unbonder, Multi-Bonder, Glyph of Calcification, Disposal, Glyph of Equilibrium.

Puzzle constants

outputs_required: 6
output_scale:     1
production_mode:  false

Reagents

Reagent 0 — Reagent 0: 2 atoms (air x1, fire x1) and 1 bond (normal x1).

  atoms: fire@(0, 0), air@(1, 0)
  bonds: normal (0, 0)-(1, 0)

Products

Product 0 — Product 0: 2 atoms (salt x2) and 1 bond (normal x1).

  atoms: salt@(0, 0), salt@(1, 0)
  bonds: normal (0, 0)-(1, 0)

Starter layout

Starter layout — structured board state

Board — static layout
outputs: 0/6

Arms:
  - arm0, kind=arm1, base=(0, -1), rot=1, len=1, tip=(0, 0)

Glyphs:
  - calcification at (1, -1) rot=0 (idle)
  - calcification at (2, -2) rot=0 (idle)

IO:
  - input#0 at (0, 0) rot=0
  - output_standard#0 at (1, -2) rot=4

Glyph activation cells (atoms at these cells trigger the glyph):

- calcification at (1, -1) rot=0: input_cardinal=(1, -1)
- calcification at (2, -2) rot=0: input_cardinal=(2, -2)

Cells claimed by starter layout (do NOT place new parts here; check_placement_overlaps will flag conflicts):

  (0, -1), (0, 0), (1, -3), (1, -2), (1, -1), (1, 0), (2, -2)

Rotation legend

Axial hex coordinates (u, v). Neighbours of (0, 0) in directions 0..5:

  • rot 0 (E): (+1, 0)
  • rot 1 (NE): ( 0, +1)
  • rot 2 (NW): (-1, +1)
  • rot 3 (W): (-1, 0)
  • rot 4 (SW): ( 0, -1)
  • rot 5 (SE): (+1, -1)

CW rotation decrements direction by 1 (mod 6); CCW increments by 1.
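
The rotation legend is enough to work out where an arm tip and its payload end up after a rotation. The sketch below is a pure-Python illustration, not the SDK's implementation: `arm_tip` and `rotate_cw_about` are hypothetical names, and it assumes held atoms rotate rigidly about the arm base using the same direction table.

```python
# Direction table from the legend: index is the rotation value 0..5.
DIRS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]


def arm_tip(base, rot, length=1):
    """Cell under the gripper of an arm at `base` facing direction `rot`."""
    du, dv = DIRS[rot % 6]
    return (base[0] + du * length, base[1] + dv * length)


def rotate_cw_about(cell, pivot):
    """Rotate `cell` one step clockwise (direction k -> k-1) around `pivot`."""
    u, v = cell[0] - pivot[0], cell[1] - pivot[1]
    # Linear map that sends NE -> E and E -> SE in this axial system.
    ru, rv = u + v, -u
    return (pivot[0] + ru, pivot[1] + rv)


base = (0, -1)                       # starter arm0: base=(0, -1), rot=1
print(arm_tip(base, 1))              # (0, 0): the input cell holding the fire atom
print(arm_tip(base, 0))              # (1, -1): the tip after one rotate_cw
print(rotate_cw_about((1, 0), base)) # (2, -2): where the bonded air atom lands
```

Under these assumptions, a single rotate_cw carries the fire atom to (1, -1) and the air atom to (2, -2), the two calcification cells listed in the starter layout.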

Glyph reference (this puzzle only)

Local footprint + activation hexes at position=(0, 0), rotation=0. For other placements, rotate each offset with hex_transform(offset, rotation) and then translate by position.

  • bonder (rot=0): footprint: (1, 0), (0, 0) activation: atom_a=(0, 0), atom_b=(1, 0)
  • unbonder (rot=0): footprint: (1, 0), (0, 0) activation: atom_a=(0, 0), atom_b=(1, 0)
  • multibonder (rot=0): footprint: (1, 0), (0, -1), (-1, 1), (0, 0) activation: center=(0, 0), spoke_a=(1, 0), spoke_b=(0, -1), spoke_c=(-1, 1)
  • calcification (rot=0): footprint: (0, 0) activation: input_cardinal=(0, 0)
  • disposal (rot=0): footprint: (1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1), (0, 0) activation: target=(0, 0)
  • equilibrium (rot=0): footprint: (0, 0) activation: tile=(0, 0)
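
The "rotate each offset, then translate" rule can be sketched without the SDK. This is an illustration under a stated assumption: a rotation value r means r counter-clockwise steps (direction index increases), which may or may not match `hex_transform` exactly. `rotate_offset` and `place` are hypothetical names.

```python
def rotate_offset(offset, rotation):
    """Rotate a local (u, v) offset by `rotation` counter-clockwise steps."""
    u, v = offset
    for _ in range(rotation % 6):
        # One CCW step sends E -> NE and NE -> NW in this axial system.
        u, v = -v, u + v
    return (u, v)


def place(offsets, position, rotation):
    """Rotate each local offset, then translate by the part position."""
    cells = []
    for off in offsets:
        ru, rv = rotate_offset(off, rotation)
        cells.append((position[0] + ru, position[1] + rv))
    return cells


# Bonder footprint from the table above, placed at the origin with rot=0.
print(place([(1, 0), (0, 0)], (0, 0), 0))    # [(1, 0), (0, 0)]

# Two-atom product shape placed like the starter output: position (1, -2), rot=4.
print(place([(0, 0), (1, 0)], (1, -2), 4))   # [(1, -2), (1, -3)]
```

The second result agrees with the cells (1, -2) and (1, -3) in the starter layout's claimed-cell list, which is some evidence for the assumed convention.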

Starter solution.py (source — the parts already placed for you)

from __future__ import annotations

import argparse
from pathlib import Path

from opus_magnum_bench import empty_solution
from opus_magnum_bench.sdk import om


def build_solution(puzzle_path: str | Path):
    builder = empty_solution(puzzle_path, name='eval_fixed__gen046')

    part_0 = builder.add_part(name=b'input', position=(0, 0), length=1, rotation=0, which_reagent_or_product=0, track_hexes=[], arm_number=0, conduit_id=0, conduit_hexes=[])

    part_1 = builder.add_part(name=b'glyph-calcification', position=(1, -1), length=1, rotation=0, which_reagent_or_product=0, track_hexes=[], arm_number=0, conduit_id=0, conduit_hexes=[])

    part_2 = builder.add_part(name=b'glyph-calcification', position=(2, -2), length=1, rotation=0, which_reagent_or_product=0, track_hexes=[], arm_number=0, conduit_id=0, conduit_hexes=[])

    part_3 = builder.add_part(name=b'out-std', position=(1, -2), length=1, rotation=4, which_reagent_or_product=0, track_hexes=[], arm_number=0, conduit_id=0, conduit_hexes=[])

    arm_0_4 = builder.add_arm1(position=(0, -1), rotation=1, arm_number=0, length=1)

    return builder


def main() -> int:
    parser = argparse.ArgumentParser(description="Rebuild an Opus Magnum solution file from readable Python.")
    parser.add_argument("puzzle", nargs="?", default="puzzle.puzzle")
    parser.add_argument("out", nargs="?", default="solution.solution")
    args = parser.parse_args()
    builder = build_solution(args.puzzle)
    builder.save(args.out)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

Glyph reference (from wiki/Glyphs.md)

Glyphs

Source: Opus Magnum Wiki

Glyphs are alchemical devices used to transform Elements.

List of Glyphs

Bonding Glyphs

| Name | Description | Cost | Area |
| --- | --- | --- | --- |
| Glyph of Bonding | Create a simple bond between Elements | 10 G | 2 |
| Glyph of Multi-bonding | Create a simple bond between Elements | 30 G | 4 |
| Glyph of Triplex-bonding | Create a triple bond between Fire Elements | 20 G | 3 |
| Glyph of Unbonding | Destroy all bonds (simple and special) between Elements | 10 G | 2 |

Transformation Glyphs

| Name | Description | Cost | Area |
| --- | --- | --- | --- |
| Glyph of Calcification | Transform a Cardinal Element into Salt | 10 G | 1 |
| Glyph of Duplication | Transform Salt into a Cardinal Element | 20 G | 2 |
| Glyph of Projection | Use Quicksilver to upgrade a Base Metal | 20 G | 2 |
| Glyph of Purification | Transform two Base Metals into a better one | 20 G | 3 |
| Glyph of Animusmus | Transform two Salts into a Vitae and a Mors Element | 20 G | 4 |
| Glyph of Disposal | Destroy one Element | 0 G | 7 |
| Glyph of Unification | Fuses the four Cardinal Elements (air, earth, fire, water) into a Quintessence Element | 20 G | 5 |
| Glyph of Dispersion | Transform a Quintessence Element into the four Cardinal Elements (air, earth, fire, water) | 20 G | 5 |

Special Glyphs

| Name | Description | Cost | Area |
| --- | --- | --- | --- |
| Glyph of Equilibrium | No effect | 0 G | 1 |
| Conduit | Teleport an atom between chambers (Appendix puzzle only) | - | 1 |

Disposition

Only one Glyph can occupy a single tile. Glyphs cannot overlap with arm axles, reagents, or tracks.

Glyph Shapes and Mechanisms

The shape of a Glyph determines which set of Mechanisms can interact with it effectively. Each Glyph may include grabbing/dropping tiles (where arms must grip or release an Element) and passing tiles (where Elements only need to pass through or rest).

| Shape | Glyph | Function Tile | Mechanism Set | Cost | Area |
| --- | --- | --- | --- | --- | --- |
| Single tile and Pair tiles | Calcification | One Passing Tile | One Fixed-length single arm | 20 G | 1 |
| Single tile and Pair tiles | Bonding / Unbonding / Duplication | Two Passing Tiles | One Fixed-length single arm | 20 G | 1 |
| Single tile and Pair tiles | Projection | One Passing Tile, One Grabbing/Dropping Tile | One Fixed-length single arm | 20 G | 1 |
| Triple axis | Multi-bonding | Four Passing Tiles | One Piston arm | 40 G | 1 |
| Triangle | Purification | Three Grabbing/Dropping Tiles | One Fixed-length single arm + three Tracks / Two Fixed-length single arms | 35 G | 2 |
| Triangle | Triplex-bonding | Three Passing Tiles | One Fixed-length single arm | 20 G | 1 |
| Diamond | Animusmus | Four Grabbing/Dropping Tiles | One Fixed-length single arm + Four Tracks / Two Fixed-length single arms | 40 G | 2 |
| Hexagonal | Disposal | One Grabbing/Dropping Tile | One Fixed-length single arm | 20 G | 1 |
| Cross | Unification | Five Grabbing/Dropping Tiles | One Fixed-length single arm + Three Tracks / Two Fixed-length single arms | 35 G | 2 |
| Bilayer | Dispersion | Five Grabbing/Dropping Tiles | One Fixed-length single arm + Four Tracks / One Piston arm + Two Tracks | 40 G | 2 |

SDK cheatsheet

Opus Magnum SDK Cheatsheet

Compact reference for opus_magnum_bench. Import everything from the top-level package:

from opus_magnum_bench import (
    empty_solution, SolutionBuilder, ArmBuilder,
    verify_solution, safe_verify, VerifyResult,
    trace_solution,
    Board, Atom, Arm, Glyph,
    hex_add, hex_sub, hex_dir, hex_neighbors, hex_distance,
    hex_transform, hex_transform_position,
    arm_positions_for_target, arm_grab_pos, arm_payload_positions,
    part_occupied_cells, check_placement_overlaps,
    PART_FOOTPRINTS,
    simulate_glyph, glyph_activation_hexes, glyph_footprint,
    explain_failure, layout_check, motion_preview,
    DEFAULT_CYCLE_LIMIT,
)

Hex coords are axial (u, v). Directions 0..5 (unit offsets from (0, 0)):

| rot | name | offset |
| --- | --- | --- |
| 0 | E | (+1, 0) |
| 1 | NE | ( 0, +1) |
| 2 | NW | (-1, +1) |
| 3 | W | (-1, 0) |
| 4 | SW | ( 0, -1) |
| 5 | SE | (+1, -1) |

rotate_cw decrements rotation by 1 mod 6; rotate_ccw increments by 1.


Minimal end-to-end

b = empty_solution("puzzle.puzzle")
b.add_input(position=(0, 0), which=0, rotation=0)
b.add_output_standard(position=(3, 0), which=0, rotation=0)
arm = b.add_arm1(position=(1, 0), rotation=0, arm_number=0, length=1)
arm.grab(0).rotate_cw(1).drop(2).reset(3)
b.save("solution.solution")

result = verify_solution("puzzle.puzzle", "solution.solution")
print(result.solved, result.cycles, result.cost, result.area)

SolutionBuilder (empty_solution(puzzle, *, name=...))

All add_* methods take keyword-only args.

Arms (return ArmBuilder):

  • add_arm1(*, position, rotation, arm_number, length=1)
  • add_arm2(*, position, rotation, arm_number, length=1)
  • add_arm3(*, position, rotation, arm_number, length=1)
  • add_arm6(*, position, rotation, arm_number, length=1)
  • add_piston(*, position, rotation, arm_number, length=1)

I/O and track:

  • add_input(*, position, which=0, rotation=0)
  • add_output_standard(*, position, which=0, rotation=0)
  • add_output_repeating(*, position, which=0, rotation=0)
  • add_track(*, position, hexes): hexes is a list of (u, v) including position.

Bonders / glyphs:

  • add_bonder, add_unbonder, add_multibonder
  • add_calcification, add_projection
  • All take *, position, rotation=0.

Escape hatch:

  • add_part(*, name, position, length=1, rotation=0, which_reagent_or_product=0, track_hexes=None, arm_number=0, conduit_id=0, conduit_hexes=None)

Serialize:

  • builder.to_bytes() -> bytes
  • builder.save(path) -> Path

Part-name aliases (bytes vs friendly strings)

add_part(name=...) takes the raw bytestring from the solution file format. Everywhere else in the SDK (PART_FOOTPRINTS, part_occupied_cells, check_placement_overlaps, Board.glyphs_of(kind=...), _format_arm) uses the friendly string key. They are not interchangeable — mixing them produces silent lookup misses.

| friendly (PART_FOOTPRINTS / PART_SPEC_KEYS) | raw (add_part(name=...)) |
| --- | --- |
| arm1, arm2, arm3, arm6 | b'arm1', b'arm2', b'arm3', b'arm6' |
| piston | b'piston' |
| track | b'track' |
| bonder | b'bonder' |
| unbonder | b'unbonder' |
| multibonder | b'bonder-speed' |
| calcification | b'glyph-calcification' |
| projection | b'glyph-projection' |
| input | b'input' |
| output_standard | b'out-std' |
| output_repeating | b'out-rep' |

Prefer the typed builders (add_bonder, add_output_standard, …) — they wrap the bytes for you. Only reach for add_part(name=...) when you need the escape hatch.

Re-check which builders are allowed vs forbidden for the current puzzle in puzzle_view.md.


ArmBuilder — program instructions

Each method returns the builder (chainable). cycle is the 0-indexed step when the instruction fires.

  • grab(cycle), drop(cycle)
  • rotate_cw(cycle), rotate_ccw(cycle)
  • extend(cycle), retract(cycle) (piston only)
  • pivot_cw(cycle), pivot_ccw(cycle)
  • track_plus(cycle), track_minus(cycle) (requires a track under the arm base)
  • repeat(cycle), reset(cycle), noop(cycle)
  • program([(opcode, cycle), ...]) — bulk form; opcodes are strings.

Opcodes (strings): rotate_cw, rotate_ccw, extend, retract, grab, drop, pivot_cw, pivot_ccw, track_plus, track_minus, repeat, reset, noop.


Arm instruction reference

Each instruction consumes exactly one tape cycle. Held atoms always move with the arm during rotation/track/extend/retract.

| opcode | effect | notes |
| --- | --- | --- |
| grab | Start holding atoms on the arm's current grabbers. | First half of the cycle. |
| drop | Release everything currently held. | First half of the cycle. |
| rotate_cw | Rotate arm clockwise around its base (decrement rotation mod 6). | Payload rotates with the arm. |
| rotate_ccw | Rotate arm counter-clockwise around its base (increment rotation mod 6). | Payload rotates with the arm. |
| pivot_cw | Rotate held molecule clockwise around the grab point; arm base does not move. | Requires the puzzle to allow pivots. |
| pivot_ccw | Rotate held molecule counter-clockwise around the grab point. | Requires the puzzle to allow pivots. |
| extend | Piston reach +1 (max 3). | Piston only. |
| retract | Piston reach −1 (min 1). | Piston only. |
| track_plus | Slide arm one step forward along its track. | Arm base must be on a track cell. |
| track_minus | Slide arm one step backward along its track. | Arm base must be on a track cell. |
| repeat | Re-emit the tape segment since the previous repeat marker (or tape start). | Compact periodic programs. |
| reset | Auto-generate reverse moves to restore the starting pose. | Simulator expands this into drop + track/rotation/piston reverse steps. |
| noop | Idle for one cycle. | Padding / alignment only. |
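
One way to read the repeat entry above is as a copy of the preceding segment, shifted so that it starts at the repeat's cycle. The sketch below implements that reading only; `expand_repeats` is a hypothetical name, and how the simulator actually expands repeat (including whether the tape then loops) is not shown in this document.

```python
def expand_repeats(prog):
    """Unroll each ("repeat", c) into a shifted copy of the segment before it."""
    expanded = []
    segment = []  # instructions since the previous repeat marker (or tape start)
    for op, cycle in prog:
        if op == "repeat":
            if segment:
                shift = cycle - segment[0][1]
                expanded.extend((o, c + shift) for o, c in segment)
            segment = []  # the next segment starts after this marker
        else:
            expanded.append((op, cycle))
            segment.append((op, cycle))
    return expanded


prog = [("grab", 0), ("rotate_cw", 1), ("drop", 2), ("repeat", 3)]
for instruction in expand_repeats(prog):
    print(instruction)
```

For the four-instruction program used throughout this task, the unrolled tape is grab/rotate_cw/drop at cycles 0 to 2 followed by the same three opcodes at cycles 3 to 5.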

Verifier

result = verify_solution(puzzle, solution, *, cycle_limit=DEFAULT_CYCLE_LIMIT)
result = safe_verify(puzzle, solution, *, cycle_limit=DEFAULT_CYCLE_LIMIT)

puzzle / solution accept a path, bytes, or object (SolutionBuilder works directly for solution). safe_verify swallows all exceptions except FileNotFoundError and returns a VerifyResult with error set — use it during iterative debugging.

VerifyResult fields:

  • valid: bool — layout legal (no overlap, simulator accepted it).
  • solved: bool — delivered outputs_required products within cycle_limit.
  • reward: float — scoring-profile dependent (passed via scoring_profile= kwarg or --scoring-profile CLI flag; default sum-scores).
  • overlap: int | None — offending overlap count if valid=False.
  • cost, cycles, area: int | None — set only when solved=True.
  • error: str | None, error_cycle: int | None, error_location: (u, v) | None.
  • .to_dict(), .to_json().

Tracing a run

trace = trace_solution(puzzle, solution, *, cycle_limit=50)

Returns a TraceDocument with:

  • trace.summary: cycle, complete, collision, outputs, cycle_limit, reason.
  • trace.frames[i] — per-cycle snapshot: cycle, complete, collision, outputs, atoms, arms, collision_reason, collision_position.
  • frame.atoms[j]: u, v, atom_type, normal_bonds, grabbed, van_berlo, .position.
  • frame.arms[j]: base_u, base_v, rotation, grabbing, kind, .base.

For most debugging, use the Board API below (Board.from_trace(trace, cycle)) rather than raw frames. For frame.collision / frame.outputs scanning, iterate trace.frames directly. For zooming in on one cycle (arm tips, held-atom ids, glyph footprints, bond graphs), wrap it with Board.from_trace(trace, cycle).


Board — structured layout / snapshot

Board.from_trace(trace, cycle)                # snapshot at a cycle

Attributes (all precomputed):

  • cycle: int | None, outputs_delivered: int, outputs_required: int
  • runtime_collision: bool, runtime_collision_reason, runtime_collision_position
  • arms: tuple[Arm, ...], glyphs: tuple[Glyph, ...], atoms: tuple[Atom, ...]

Prompt summary:

  • board.describe() -> str — compact text dump of the layout (good for agent context).

Dataclasses

  • Atom: id, position, atom_type, grabbed_by, bonds, van_berlo, on_glyph. Equality is by id.
  • Arm: index, kind, base, rotation, length, tip, tips, grabbing, program.
  • Glyph: kind, position, rotation, footprint, activation_hexes: dict[str, (u, v)].

Spatial / planning helpers

hex_add(a, b)                              # (u, v) + (u, v)
hex_sub(a, b)
hex_dir(direction)                         # unit offset for direction 0..5
hex_neighbors((u, v))                      # all 6
hex_distance(a, b)                         # axial hex distance
hex_transform(offset, rotation)            # rotate a local footprint offset
hex_transform_position(position, offset, rotation)
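
For readers without the SDK at hand, the basic helpers can be approximated in a few lines. This is a stand-in sketch using the direction table from this cheatsheet, not the package's own code; the function names here are illustrative.

```python
DIRS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]


def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def neighbors(cell):
    """All six adjacent cells, in direction order 0..5."""
    return [add(cell, d) for d in DIRS]


def distance(a, b):
    """Axial hex distance: half the sum of |du|, |dv| and |du + dv|."""
    du, dv = sub(b, a)
    return (abs(du) + abs(dv) + abs(du + dv)) // 2


print(distance((0, -1), (1, -1)))  # 1
print(distance((0, -1), (2, -2)))  # 2
```

In the starter layout this puts the arm base one step from the first calcification glyph and two steps from the second, so a length-1 gripper can only sit directly on the first.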

Arm placement:

arm_grab_pos(base, rotation, length=1) -> (u, v)
arm_positions_for_target(target, *, length=1) -> [(base, rotation), ...]  # 6 entries
arm_payload_positions(base, rotation, payload_offsets, length=1) -> [(u, v), ...]
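
The six candidate placements for reaching a target follow directly from the direction table: step backwards from the target along each direction. A sketch with a hypothetical name `bases_for_target`, offered as an approximation of what `arm_positions_for_target` is described as returning:

```python
DIRS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]


def bases_for_target(target, length=1):
    """All (base, rotation) pairs whose gripper lands on `target`."""
    return [
        ((target[0] - du * length, target[1] - dv * length), rot)
        for rot, (du, dv) in enumerate(DIRS)
    ]


for base, rot in bases_for_target((0, 0)):
    print(base, rot)
```

The starter arm (base=(0, -1), rot=1) appears in this list for target (0, 0), the input cell.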

Overlap (use this first when a solution fails with overlapping placements):

PART_FOOTPRINTS["bonder"]                  # local offsets for each part type
part_occupied_cells(part_type, position, rotation=0) -> [(u, v), ...]
check_placement_overlaps([(part_type, position, rotation), ...])
# -> [] if no overlaps, else [(cell, [part_indices]), ...]
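
The return shape described above, a list of (cell, [part_indices]) pairs, can be illustrated with a small stand-in. `find_overlaps` is a hypothetical name and takes already-resolved absolute cells per part, which sidesteps any assumption about the real PART_FOOTPRINTS data.

```python
from collections import defaultdict


def find_overlaps(parts):
    """parts: one list of absolute (u, v) cells per part, in placement order."""
    owners = defaultdict(list)
    for index, cells in enumerate(parts):
        for cell in cells:
            owners[cell].append(index)
    # Keep only cells claimed by more than one part.
    return [(cell, idxs) for cell, idxs in owners.items() if len(idxs) > 1]


parts = [
    [(0, 0), (1, 0)],   # part 0
    [(1, -1)],          # part 1
    [(1, 0), (2, 0)],   # part 2 shares (1, 0) with part 0
]
print(find_overlaps(parts))       # [((1, 0), [0, 2])]
print(find_overlaps(parts[:2]))   # []
```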

Glyph activation and simulation

glyph_footprint(glyph_type, position=(0,0), rotation=0) -> [(u, v), ...]
glyph_activation_hexes(glyph_type, position=(0,0), rotation=0) -> dict
simulate_glyph(
    glyph_type, glyph_position, glyph_rotation,
    atoms=[{"position": (u, v), "element": "iron", "bonds": [(u2, v2), ...], "grabbed": False, "van_berlo": False}, ...],
)
# -> {"activated": bool, "result_atoms": [...], "explanation": str}

Use this offline to predict whether a candidate atom arrangement will trigger a glyph before running the full verifier.


Constants

  • DEFAULT_CYCLE_LIMIT — passed to verify_solution / trace_solution by default.
  • scoring_profile (kwarg to verify_solution / safe_verify, or --scoring-profile CLI flag on verify_task.py) selects reward formula: sum-scores (cost+cycles+area), sum-cycle-area, solved-binary. Default is sum-scores.

Typical iterative loop

b = empty_solution("puzzle.puzzle")
# ... add parts, program arms ...

result = safe_verify("puzzle.puzzle", b)
if not result.valid:
    # Overlap? static check — no simulation required.
    overlaps = check_placement_overlaps([(kind, pos, rot) for ...])
    # Collision / cycle-limit / missing-products? inspect the trace.
    trace = trace_solution("puzzle.puzzle", b, cycle_limit=30)
    board = Board.from_trace(trace, result.error_cycle or trace.summary.cycle)
    # inspect board around result.error_location

Debug helpers (one-shot CLIs)

Run these from a bash step — no setup, no ad-hoc scripts. Each is also importable (from opus_magnum_bench import explain_failure, layout_check, motion_preview).

# Why did my solution fail? Post-mortem from a VerifyResult — dispatches on
# collision / overlap / cycle-limit / missing-products and prints a focused report.
uv run python -m opus_magnum_bench.explain_failure puzzle.puzzle solution.solution

# Did I overlap any parts? Can each arm reach each output? Static — no simulation.
# Passing the puzzle is optional; if given, arm-reach is reported per output station.
uv run python -m opus_magnum_bench.layout_check solution.solution puzzle.puzzle

# Where are the arms + held atoms cycle-by-cycle? Stops one cycle past any collision.
uv run python -m opus_magnum_bench.motion_preview puzzle.puzzle solution.solution --cycles 30

======================================================================

TURN 1 — AGENT (1391 chars)

======================================================================

print("Starting solving the puzzle...")

Let's analyze the puzzle and design the arm programs

We need to produce 6 outputs of Product 0 (salt x2 and normal bond)

We have two calcification glyphs that can transform elements into salt

First, we need to grab the reagents from the input

The input is at (0, 0) and has atoms: fire@(0, 0), air@(1, 0) and bond (0, 0)-(1, 0)

We need to use the calcification glyphs to transform elements into salt

The calcification glyph at (1, -1) and (2, -2) can be used

We will use Arm 1 to grab the reagents and perform the transformation

We will use Arm 2 to handle the bonding

Let's design the arm programs

arm_1_prog = [
    ("grab", 0),  # Grab the reagents from the input
    ("rotate_cw", 1),  # Rotate to position the arm for the first calcification glyph
    ("drop", 2),  # Drop the reagents at the first calcification glyph
    ("repeat", 3),  # Repeat the cycle to produce multiple outputs
]

arm_2_prog = [
    ("grab", 0),  # Grab the reagents from the first calcification glyph
    ("rotate_cw", 1),  # Rotate to position the arm for the second calcification glyph
    ("drop", 2),  # Drop the reagents at the second calcification glyph
    ("repeat", 3),  # Repeat the cycle to produce multiple outputs
]

arm_programs = {1: arm_1_prog, 2: arm_2_prog}

Submit the arm programs for verification

safe_verify(arm_programs)


TURN 1 — REPL OBSERVATION (101 chars)


ERROR: no ```python fenced code block found in your response. Emit exactly one python block per turn.

======================================================================

TURN 2 — AGENT (1405 chars)

======================================================================

print("Starting solving the puzzle...")

# Let's analyze the puzzle and design the arm programs
# We need to produce 6 outputs of Product 0 (salt x2 and normal bond)
# We have two calcification glyphs that can transform elements into salt

# First, we need to grab the reagents from the input
# The input is at (0, 0) and has atoms: fire@(0, 0), air@(1, 0) and bond (0, 0)-(1, 0)

# We need to use the calcification glyphs to transform elements into salt
# The calcification glyph at (1, -1) and (2, -2) can be used

# We will use Arm 1 to grab the reagents and perform the transformation
# We will use Arm 2 to handle the bonding

# Let's design the arm programs
arm_1_prog = [
    ("grab", 0),  # Grab the reagents from the input
    ("rotate_cw", 1),  # Rotate to position the arm for the first calcification glyph
    ("drop", 2),  # Drop the reagents at the first calcification glyph
    ("repeat", 3),  # Repeat the cycle to produce multiple outputs
]

arm_2_prog = [
    ("grab", 0),  # Grab the reagents from the first calcification glyph
    ("rotate_cw", 1),  # Rotate to position the arm for the second calcification glyph
    ("drop", 2),  # Drop the reagents at the second calcification glyph
    ("repeat", 3),  # Repeat the cycle to produce multiple outputs
]

arm_programs = {1: arm_1_prog, 2: arm_2_prog}

# Submit the arm programs for verification
safe_verify(arm_programs)

TURN 2 — REPL OBSERVATION (30 chars)


Starting solving the puzzle...

======================================================================

TURN 3 — AGENT (1011 chars)

======================================================================

# Let's analyze the error and improve our approach
# The initial approach didn't account for the bonding process
# We need to use the calcification glyphs to transform elements into salt
# Then, we need to use the bonding process to create the normal bond

# Let's redesign the arm programs
arm_1_prog = [
    ("grab", 0),  # Grab the reagents from the input
    ("rotate_cw", 1),  # Rotate to position the arm for the first calcification glyph
    ("drop", 2),  # Drop the reagents at the first calcification glyph
    ("repeat", 3),  # Repeat the cycle to produce multiple outputs
]

arm_2_prog = [
    ("grab", 0),  # Grab the reagents from the first calcification glyph
    ("rotate_cw", 1),  # Rotate to position the arm for the bonding process
    ("drop", 2),  # Drop the reagents at the bonding position
    ("repeat", 3),  # Repeat the cycle to produce multiple outputs
]

arm_programs = {1: arm_1_prog, 2: arm_2_prog}

# Submit the arm programs for verification
safe_verify(arm_programs)

TURN 3 — REPL OBSERVATION (461 chars)


Your code ran without error but produced no printed output. The REPL does NOT auto-print expressions — you must call print(...) explicitly. For feedback, try:

print(arm_numbers)                 # which arms exist
print(safe_verify(arm_programs))   # does this dict validate?

Calling submit() without first using safe_verify() usually fails. submit() no longer ends the episode on failure; it returns the verifier dict so you can iterate.

======================================================================

TURN 4 — AGENT (123 chars)

======================================================================

# Let's print the arm numbers and verify the arm programs
print(arm_numbers)
print(safe_verify(arm_programs))

TURN 4 — REPL OBSERVATION (92 chars)


[0]
{'solved': False, 'valid': False, 'error': 'ValueError: unknown arm_number 1; have [0]'}

======================================================================

TURN 5 — AGENT (569 chars)

======================================================================

# The arm_numbers list contains only arm 0, so we need to adjust our approach
# Let's use arm 0 for our programs

# Let's redesign the arm programs using arm 0
arm_0_prog = [
    ("grab", 0),  # Grab the reagents from the input
    ("rotate_cw", 1),  # Rotate to position the arm for the first calcification glyph
    ("drop", 2),  # Drop the reagents at the first calcification glyph
    ("repeat", 3),  # Repeat the cycle to produce multiple outputs
]

arm_programs = {0: arm_0_prog}

# Submit the arm programs for verification
safe_verify(arm_programs)

TURN 5 — REPL OBSERVATION (461 chars)


Your code ran without error but produced no printed output. The REPL does NOT auto-print expressions — you must call print(...) explicitly. For feedback, try:

print(arm_numbers)                 # which arms exist
print(safe_verify(arm_programs))   # does this dict validate?

Calling submit() without first using safe_verify() usually fails. submit() no longer ends the episode on failure; it returns the verifier dict so you can iterate.

======================================================================

TURN 6 — AGENT (123 chars)

======================================================================

# Let's print the arm numbers and verify the arm programs
print(arm_numbers)
print(safe_verify(arm_programs))

TURN 6 — REPL OBSERVATION (177 chars)


[0]
{'valid': True, 'solved': True, 'reward': 0.4032258064516129, 'overlap': 0, 'cost': 40, 'cycles': 96, 'area': 12, 'error': None, 'error_cycle': None, 'error_location': None}

======================================================================

TURN 7 — AGENT (105 chars)

======================================================================

# The solution is valid and solved! We can now submit the arm programs
submit(arm_programs)

TURN 7 — REPL OBSERVATION (190 chars)


[submit] SOLVED: {'valid': True, 'solved': True, 'reward': 0.4032258064516129, 'overlap': 0, 'cost': 40, 'cycles': 96, 'area': 12, 'error': None, 'error_cycle': None, 'error_location': None}
