@csvance
Created May 12, 2026 04:28
Kaimon.jl Skill
name: kaimon-julia
description: Workflow guide for driving Julia projects through the Kaimon MCP server. Load this skill whenever working with Julia code in any capacity, including reading, writing, editing, or debugging .jl files, Project.toml/Manifest.toml, Julia REPL sessions, Pkg operations, Revise-based hot-reload workflows, or any Kaimon tool calls (start_session, ex, manage_repl, pkg_add, pkg_rm, check_eval, debug_exfiltrate, goto_definition, workspace_symbols). Kaimon is the preferred execution path for all Julia work in this repo.

Kaimon + Julia REPL Workflow

A reference for working a Julia project through the Kaimon MCP server. Covers session lifecycle, output management, Revise behavior, and the rough edges that cost the most round trips when you hit them blind.

1. Session lifecycle

start_session(project_path="/workspace")        # returns 8-char session key, once
investigate_environment(session=<key>)           # confirms version + Revise + deps
ex(e="...", session=<key>)                       # all subsequent calls
  • One session, one key, every call. The key is returned on start_session and must be passed to every tool call that touches the REPL. There is no implicit "current session" when multiple are connected. Save the key the moment you get it.
  • investigate_environment first, always. It reports the Julia version, pwd, active project, installed packages, dev packages, and whether Revise is active. Free information that prevents wrong assumptions about what's loaded.
  • Don't Pkg.activate(...) from inside ex. The session already activates the project at start_session. Re-activating breaks subsequent pkg_add calls.
  • Pre-warm a tightly pinned manifest. If the target project's Manifest.toml pins versions older than the ones Kaimon was built against (leaf deps such as JSON and Preferences are common culprits), start_session triggers a Kaimon recompile. That recompile then chases the missing transitive deps one at a time, and the start fails until each is resolved. Run Pkg.update() in the target project before start_session, or vendor a manifest known to be compatible with Kaimon's build versions.
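
Before calling start_session, it can help to see what the manifest actually pins. The following is a minimal sketch, assuming a format-2 Manifest.toml where entries live under a top-level deps table. pinned_versions is a hypothetical helper for illustration, not part of Kaimon or Pkg.

```julia
using TOML

# Hypothetical helper: report the version a Manifest.toml pins for each name.
# Assumption: manifest format 2.x, with entries stored as `[[deps.<Name>]]`.
# Returns `nothing` for packages that are absent or carry no version field.
function pinned_versions(manifest_path::AbstractString, names::Vector{String})
    manifest = TOML.parsefile(manifest_path)
    deps = get(manifest, "deps", Dict{String,Any}())
    pins = Dict{String,Union{String,Nothing}}()
    for name in names
        entries = get(deps, name, nothing)
        if entries === nothing || isempty(entries)
            pins[name] = nothing
        else
            pins[name] = get(first(entries), "version", nothing)
        end
    end
    return pins
end
```

A call such as pinned_versions("/workspace/Manifest.toml", ["JSON", "Preferences"]) returns a small Dict that can be compared by eye against the versions Kaimon expects.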

2. The q flag is a token-budget decision

Default is q=true (strip return value). Choose mechanically:

  • q=true (default): assignments, imports, definitions, anything whose return value is "the assigned thing" or nothing. This is 80% of calls.
  • q=false: you need the value to make a decision this turn (allocation count, equality check, type inspection).

The stdout rule is more aggressive than it looks: anything written to stdout is stripped, so println("debug: x = $x") vanishes. To surface a value, make it the last expression in the ex call and pass q=false:

ex(e="(length(result), typeof(result))", q=false)   # good
ex(e="println(result)", q=true)                      # invisible

@info survives the strip. It is the only reliable way to emit per-step progress from inside a long function and have the agent see it.
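
As an illustration of the @info pattern, here is a minimal sketch of a long-running function that reports per-step progress through the logging system and returns a small summary value. process_chunks and its field names are hypothetical.

```julia
# Hypothetical long-running step: logs progress with @info rather than println,
# and returns a compact NamedTuple that is cheap to display with q=false.
function process_chunks(chunks)
    total = 0
    for (i, chunk) in enumerate(chunks)
        total += sum(chunk)
        @info "chunk done" i n=length(chunks) running_total=total
    end
    return (n_chunks = length(chunks), total = total)
end
```

The key=value pairs after the message are attached to the log record, so each progress line stays short and structured.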

3. Large outputs: kill them at the source

Calling ex with q=false on a richly-printed value (a BenchmarkTools.Trial, a DataFrame, a large Dict) dumps the full pretty-print into the conversation and you pay for those tokens for the rest of the session.

Three layers of defense, cheapest first:

  1. End with ; nothing when you only care about the side effect of storing into a variable: ex(e="results = run_all(); nothing"). The trailing expression is nothing, so even q=false returns nothing.
  2. Write a small pretty-printer and call it from a separate ex. The function does its own controlled print(rpad(...)) and you shape the output exactly.
  3. max_output=15000 raises the truncation cap to 15,000 characters. Reach for it only when the output is intrinsically large and you actually need to read it.

s=true (silent mode, suppresses the agent> echo) exists but is rarely needed; the usage doc calls it out as a special case.
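
The small pretty-printer from the second layer could look like the sketch below. print_summary, its 16-character column width, and the limit keyword are all hypothetical choices for illustration.

```julia
# Hypothetical compact printer: one padded line per entry, capped at `limit` rows.
function print_summary(io::IO, results::AbstractDict; limit::Int = 10)
    ks = sort!(collect(keys(results)); by = string)
    for k in first(ks, limit)
        println(io, rpad(string(k), 16), results[k])
    end
    hidden = length(ks) - limit
    hidden > 0 && println(io, "... ", hidden, " more")
    return nothing
end

# Convenience method that prints to stdout.
print_summary(results::AbstractDict; kwargs...) = print_summary(stdout, results; kwargs...)
```

Taking an IO argument keeps the printer testable, and sprint(print_summary, results) yields the same text as a String if a return value is easier to work with than printed output.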

4. Revise: invisible until it isn't

Revise is loaded into the session before you get there. Revise.revise() is a no-op in this setup. The mental model:

  • Edits to any .jl file included from a tracked package module are picked up between ex calls automatically.
  • The "tracked package module" is whatever package you load with using ... from the project. Files included from that module are tracked transitively.
  • Driver scripts go outside src/. A one-shot top-level script does not benefit from Revise tracking and adds overhead at using time.
  • Restart is cheap, but not always free. manage_repl(command="restart") preserves the session key. You lose in-memory variables but regain a clean world. Restart when:
    • You added or removed a struct field.
    • You changed module-level code (an include, a top-level const, an import).
    • You hit a MethodError or world-age error that persists after edits that should have fixed it.
    • A long-lived background task crashed and left module-level state stale. The classic case is a server in @async (Oxygen, HTTP.jl) whose route registry, scheduler table, or connection pool is now out of sync with the source. The code reloads fine, but the in-memory registry no longer matches it.

For function-body edits, which is 95% of normal work, restart is not needed. Save, run ex, observe the new behavior.

5. Batch eagerly, but not blindly

ex(e="x = 1; y = 2; z = f(x, y)") is one round trip; three separate ex calls are three. Batch when the commands form one logical step and you do not need an intermediate value to decide what comes next.

Do not batch when:

  • You need the value of the first call to decide the second (e.g., check length(result) before deciding whether to dump it).
  • One command might fail and you want to isolate the error.
  • The combined output would be large enough to truncate.

Rule of thumb: setup + warmup + smoke + check fit in one ex; benchmark runs and result dumps are separate calls.

6. pkg_add vs Pkg.add, and the stdlib trap

Use the pkg_add MCP tool, not Pkg.add(...) via ex. It is one tool call, modifies Project.toml atomically, and reports the resulting state cleanly. Same for pkg_rm.

Sharp edge: even Julia's standard library packages must be listed in Project.toml if the package module references them. The session REPL has stdlibs available, but a package being precompiled does not inherit the REPL's environment. The symptom looks like:

ArgumentError: Package MyPkg does not have LinearAlgebra in its dependencies

Fix: run pkg_add(packages=["LinearAlgebra", "Random", "Printf"]), then run using <YourPackage> again. Better still, scan the using statements in src/ before the first using <YourPackage> and pre-add any stdlib they reference.
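
The scan of src/ can be automated. The following is a rough sketch, assuming the usual layout of Project.toml next to a src/ directory. referenced_packages and missing_deps are hypothetical helpers. The line-based regex is a heuristic that can be fooled by using statements inside strings or docstrings.

```julia
using TOML

# Hypothetical helper: top-level package names that appear in `using`/`import`
# lines of the .jl files under `srcdir`. Relative imports (`using .Sub`) are skipped.
function referenced_packages(srcdir::AbstractString)
    found = Set{String}()
    for (root, _, files) in walkdir(srcdir), file in files
        endswith(file, ".jl") || continue
        for line in eachline(joinpath(root, file))
            m = match(r"^\s*(?:using|import)\s+([^:#]+)", line)
            m === nothing && continue
            for item in split(m.captures[1], ',')
                item = strip(item)
                (isempty(item) || startswith(item, '.')) && continue
                # "Foo.Bar" -> "Foo"; "Foo as F" -> "Foo"
                name = first(split(first(split(item, '.'))))
                push!(found, String(name))
            end
        end
    end
    return found
end

# Hypothetical helper: referenced packages that Project.toml does not list under [deps].
function missing_deps(project_dir::AbstractString)
    project = TOML.parsefile(joinpath(project_dir, "Project.toml"))
    declared = Set(keys(get(project, "deps", Dict{String,Any}())))
    own = get(project, "name", "")
    used = referenced_packages(joinpath(project_dir, "src"))
    return sort!([p for p in used if p ∉ declared && p != own && p ∉ ("Base", "Core")])
end
```

The vector returned by missing_deps("/workspace") is a candidate list to hand to pkg_add in one call.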

7. Cheap correctness gates before @benchmark

@benchmark is the slowest tool in the box. Define cheaper gates and run them between every meaningful edit:

  • A smoke_test()-style function: runs every kernel once, asserts cross-implementation equality. Microseconds.
  • An allocation_gate()-style function: runs each kernel after warmup, captures @allocated, raises if a kernel claiming zero allocations isn't. Microseconds.
  • The actual benchmark: seconds to minutes; run when you are confident the code is right and you specifically want a timing number.

Design these gates to return small, structured values (a NamedTuple, a Vector{Pair{Symbol,Int}}) so they fit on one screen with q=false. They are meant to be eyeballed each round.
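
A minimal sketch of the two gates follows, assuming the kernels are passed in as a NamedTuple of functions. sum_loop, sum_builtin, and the limit keyword are hypothetical stand-ins for a project's real kernels and thresholds.

```julia
# Two hypothetical kernels that are expected to agree.
function sum_loop(xs)
    s = zero(eltype(xs))
    for x in xs
        s += x
    end
    return s
end
sum_builtin(xs) = sum(xs)

# Run every kernel once and compare against the first one.
# Returns a NamedTuple of Bool, one entry per kernel.
function smoke_test(kernels::NamedTuple, xs)
    reference = first(kernels)(xs)
    return map(k -> k(xs) ≈ reference, kernels)
end

# Measure bytes allocated per kernel after a warmup call.
# Throws if any kernel exceeds `limit`; otherwise returns a NamedTuple of byte counts.
function allocation_gate(kernels::NamedTuple, xs; limit::Int = 0)
    bytes = map(kernels) do kernel
        kernel(xs)                 # warmup so compilation is not measured
        @allocated kernel(xs)
    end
    offenders = [name for (name, b) in pairs(bytes) if b > limit]
    isempty(offenders) || error("kernels over the allocation limit: ", offenders)
    return bytes
end
```

Both gates return one small NamedTuple, so a call like smoke_test((sum_loop = sum_loop, sum_builtin = sum_builtin), collect(1.0:100.0)) fits on one line of output.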

8. Less-used flags worth knowing

  • mt=true: required for GLMakie / GLFW / OpenGL, which need thread 1. Irrelevant for pure-compute code. The moment you run using GLMakie, every related ex call needs mt=true. A ThreadAssertionError from a plot library means you forgot it.
  • check_eval(eval_id="..."): every ex returns an eval_id. If a call times out, or you want to keep working while a long run completes, poll the result later via check_eval. The right tool for multi-minute runs.
  • debug_exfiltrate / Infiltrator: inspect state inside a function without modifying the call site. Useful when output is wrong and you want to see locals at the failing point. Often unnecessary if a correctness gate (Section 7) catches the bug at a boundary.
  • goto_definition, workspace_symbols: code navigation backed by Julia's own LanguageServer indexing. Prefer these over grep when working in a large existing codebase. Note: qdrant_search_code is also exposed by Kaimon but requires a Qdrant server, which is not provisioned in this environment, so it will fail at call time.

9. The single biggest mistake to avoid

Running ex with q=false on a call that returns a large or richly-printed value. Once the tool result is in your context, you pay for those tokens for the rest of the conversation. The cure costs nothing: end the expression with nothing, store into a variable, and make a separate small call to inspect only what you need.
