@ryan-williams
Last active November 5, 2025 02:43
marin-community/marin#1773 "Workspace" monorepo plan

(Versioned write-up here, synced via ghpr)

Status

  • #1690 (step 1): Init workspace, move marin to lib/marin/
  • #1723 (step 2): Ingest Levanter as lib/levanter/
flowchart LR
    experiments[experiments]
    marin[lib/marin]
    levanter[lib/levanter]
    data_browser

    experiments --> marin
    experiments --> levanter
    marin --> levanter

(data_browser stays independent, not a workspace member)

Problem

The Marin and Levanter repos contain components that depend on one another in ways the current repo split doesn't reflect well, which makes co-development awkward.

Proposed solution: uv workspaces

"Workspaces" provide a way to colocate distinct libraries in one repo, such that they can be published and depended on independently (by external users), but naturally depend on each other's HEAD commits (and can easily be updated in lockstep, during common internal / co-development cases).

Implementation Plan

Below is a rough sequence of steps to get there, with the goal of minimizing disruption along the way.

Workspace migration scripts provide hermetic replay of the steps below on top of arbitrary Marin/Levanter main commits. This helps avoid conflicts while developing, and the scripts are more legible for review than the huge PR patches they generate.

Step 1: init workspace, marin member (#1690)

 marin/
   pyproject.toml  # Workspace root (experiments/ become workspace root member)
   experiments/    # Becomes part of workspace root member
-  src/            # Move to lib/marin/
+  lib/
+    marin/
+      pyproject.toml
+      src/

Note: data_browser stays independent (separate deps/venv, excluded from workspace).

Step 2: Levanter member (#1723)

 marin/
   pyproject.toml
   experiments/
   lib/
     marin/
       pyproject.toml
       src/
+    levanter/
+      pyproject.toml
+      src/

Additional notes:

  • This will require namespacing GHA .ymls with levanter- and marin- prefixes, to distinguish them.
  • We'll also want to path-restrict GHAs to only run on relevant changes.
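In GitHub Actions, the usual tool for this is the `paths` filter under `on.push` / `on.pull_request`. The decision it encodes can be sketched in Python as below. The prefix-to-path mapping, the `workflows_to_run` helper, and the example file names are all hypothetical; in particular, having `marin-` workflows also watch `lib/levanter` is an assumption based on marin depending on levanter.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from workflow-name prefix to the paths that trigger it.
# Assumption: marin workflows also watch lib/levanter, since marin depends on it.
TRIGGER_PATHS = {
    "levanter-": ["lib/levanter"],
    "marin-": ["lib/marin", "lib/levanter", "experiments"],
}


def workflows_to_run(changed_files: list[str]) -> set[str]:
    """Return the workflow prefixes whose watched paths contain a changed file."""
    selected = set()
    for prefix, roots in TRIGGER_PATHS.items():
        for changed in changed_files:
            path = PurePosixPath(changed)
            # is_relative_to compares whole path components, so a sibling
            # such as lib/marin-core would not match lib/marin.
            if any(path.is_relative_to(root) for root in roots):
                selected.add(prefix)
                break
    return selected


print(workflows_to_run(["lib/marin/src/marin/run.py"]))
print(workflows_to_run(["lib/levanter/src/levanter/trainer.py"]))
```

A change under `data_browser/` selects nothing here, which matches it staying outside the workspace.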

Step 3: "Thalas" (executor) member

Thalas was an attempt at factoring Marin's executor code out as a separate library (and repo).

The new plan is to make it a workspace member in the new workspace repo, instead:

 marin/
   pyproject.toml
   experiments/
   lib/
     marin/
       pyproject.toml
       src/
     levanter/
       pyproject.toml
       src/
+    thalas/
+      pyproject.toml
+      src/

Step 4: Haliax member

 marin/
   pyproject.toml
   experiments/
   lib/
     marin/
       pyproject.toml
       src/
+    haliax/
+      pyproject.toml
+      src/
     levanter/
       pyproject.toml
       src/
     thalas/
       pyproject.toml
       src/

Step Omega: ray_tpu, rl, marin-core, marin-crawl, experiments packages

 marin/
   pyproject.toml
   experiments/
+    hero_runs/
+      pyproject.toml
+      expXXX_tootsie8b.py
+    compel/
+      pyproject.toml
+      expXXX_compel_v0.py
   lib/
-    marin/
-      pyproject.toml
-      src/
+    marin-core/
+      pyproject.toml
+      src/
     haliax/
       pyproject.toml
       src/
     levanter/
       pyproject.toml
       src/
+    marin-crawl/
+      pyproject.toml
+      src/
+    ray_tpu/
+      pyproject.toml
+      src/
+    rl/
+      pyproject.toml
+      src/
     thalas/
       pyproject.toml
       src/

I'm actually thinking "step 1" should be:

  1. init workspace
  2. lib/marin/ member (as planned)
  3. data_browser/ member (leave in place, don't move to lib/data_browser/)
    • mv data_browser lib/ later, minimize churn now
  4. experiments/ member (Leave in place, minimal churn; just add experiments/pyproject.toml)

This way, the experiments member will depend on the marin member. #1690 as written leaves experiment srcs "loose" in the workspace root (which then depends on the marin member). Explicitly modeling experiments → marin gives us a good test of intra-workspace deps, and feels more idiomatic.

2 new thoughts:

  1. We have to move experiments files either way
    • If experiments/ becomes a "member", imports like from experiments.… will break.
    • Moving to lib/experiments/ for "step 1" probably makes sense?
    • lib/experiments/src/experiments/ and lib/marin/src/marin/ feel unwieldy, but maybe best we can do?
  2. Better to leave data_browser as-is, not make it a workspace member.
    • It conceptually doesn't/shouldn't share a venv with {marin,experiments,levanter,…}, but workspace membership would enforce/model exactly that sharing.

Actually, the existing "step 1" / #1690 accidentally did something good here, by leaving experiments as part of the workspace root.

We were effectively modeling the experiments → marin dep, without having to move experiments code or change imports.

I'm now working on just removing the data_browser changes from #1690.

🎉 CI passing on steps 1 & 2

  • #1690: "Workspace" step 1: experiments → marin
  • #1723: "Workspace" step 2: experiments → marin → levanter

test_full_integration_moar_cats also passes for me locally:

START_RAY_CPU_CLUSTER=true uv run --package marin pytest tests/rl/integration/test_cats_integration.py::test_full_integration_moar_cats -sv

I think #1690 and #1723 are fine to merge, but want to mention some possible rough edges below.

We've also done more SHA-pinning in the interim (e.g. #1850), which is the main alternative to a "workspace" that I can imagine. Some discussion of that follows as well.

Related (Merged) PRs

Marin

  • #1786: Replace from src.marin imports with from marin
  • #1802: Rm dev_tpu.py "src" imports using cloudpickle registration
  • #1850: Bump levanter+dolma SHAs, pin lm-eval, scalax SHAs

Levanter

  • #1275: Upgrade transformers from 4.57.0 (yanked) to 4.57.1
  • #1278: haliax>=1.4.dev450
  • #1280: TPU test vs. JAX 0.{6,7}.2, fix tests on 0.6.2

Dolma

  • #2: Relax tokenizers constraint to support 0.21-0.22 (again)

🤔 Possible downsides of workspace monorepo

😮‍💨 Longer relative paths: src/marin/… → lib/marin/src/marin/…

Not the "end of the world", but it feels a bit worse than today. Maybe it's fine, or roughly equivalent to having to choose between repos today.

@percyliang expressed a similar concern on #1690.

🫩 uv [sync|run] --package [marin|levanter]

We now have to specify --package [marin|levanter] when running commands that should only affect one package.

Searching the #1723 branch, you can see examples of this in the GHA and RTD YAML files and in the deployment .sh scripts.
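One way to keep the flag from being forgotten in scripts is a tiny wrapper that always scopes the command to a member. The sketch below only builds the argument list; `uv_command` is a hypothetical helper, not something in the repo, while `uv sync --package` and `uv run --package` are existing uv options.

```python
import shlex


def uv_command(subcommand: str, package: str, *args: str) -> list[str]:
    """Build a `uv sync` or `uv run` invocation scoped to one workspace member."""
    if subcommand not in {"sync", "run"}:
        raise ValueError(f"unsupported uv subcommand: {subcommand}")
    return ["uv", subcommand, "--package", package, *args]


cmd = uv_command(
    "run", "marin", "pytest", "tests/rl/integration/test_cats_integration.py"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from a deployment script.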

😬 .github/workflows/ crowding

$ tree .github/workflows/
.github/workflows/
├── levanter-check_lockfile.yaml
├── levanter-docker-base-image.yaml
├── levanter-docker-cluster-image.yaml
├── levanter-gpt2_small_itest.yaml
├── levanter-launch_small_fast.yaml
├── levanter-publish_dev.yaml
├── levanter-run_entry_tests.yaml
├── levanter-run_pre_commit.yaml
├── levanter-run_ray_tests.yaml
├── levanter-run_tests.yaml
├── levanter-tpu_unit_tests.yaml
├── marin-build-docker-images.yaml
├── marin-codeql.yml
├── marin-docs.yaml
├── marin-lint-and-format.yaml
├── marin-metrics.yaml
├── marin-quickstart.yaml
├── marin-unit-tests.yaml
└── marin-update-leaderboard.yml

1 directory, 19 files

Will this be annoying when we add executor, Haliax, etc.?

🧐 RTD builds per package

Levanter's RTD build needs to be updated to build from the Marin repo's lib/levanter. This looks easy to change from https://app.readthedocs.org/dashboard/levanter/edit/.

I guess it's fine in principle to have N different libraries building+publishing RTDs from subdirs of the Marin repo.

Marin → Levanter SHA pins

Examining Marin's Levanter dependency, since it switched to SHA pins:

  • 2025-10-17:
    • marin#1794: Fix RL integration tests and testing experiment
    • 2ccbf007 (levanter#1191: Add plan for multi-host inference support)
    • Marin moved to SHA-pinning Levanter here. Not clear these PRs were directly linked; more likely Marin needed something from a previous Levanter PR (or the floating main dep had caused issues).
  • 2025-10-18:
  • 2025-11-01:
    • marin#1850: Bump levanter+dolma SHAs, pin lm-eval, scalax SHAs
    • 6cd783c8 (levanter#1288: Pin trackio>=0.5.0 for consistency with existing uv.lock)
    • This bump brought in several Levanter PRs I needed in Marin, all at once.

This is a slower cadence of "paired updates" than I expected. I wonder if it's worth observing development patterns for a bit longer, to get a better sense of tradeoffs?

@rjpower previously mentioned situations where Ray incorporated conflicting or wrong Levanter versions, even when Marin pins a specific SHA. It might be worth trying to run down an example of that (or fix the root cause, in Ray or our use of it).
