ryan-williams/marin#1773.md

Last active November 8, 2025 18:11

marin-community/marin#1773 - 2-way sync via ghpr (https://github.com/runsascoded/ghpr)

marin#1773.md

marin-community/marin#1773 "Workspace" monorepo plan

(Versioned write-up here, synced via ghpr)

Status

✅ #1690 (step 1): Init workspace, move marin to lib/marin/
✅ #1723 (step 2): Ingest Levanter as lib/levanter/
🚧 Step 3: Ingest Haliax as lib/haliax/

flowchart TB
    subgraph " "
        experiments["<b>experiments</b><br/><small>step 1</small>"]
        data_browser["<b>data_browser</b><br/><small>independent</small>"]
    end

    subgraph "lib/"
        marin["<b>marin ✅</b><br/><small>step 1</small>"]
        levanter["<b>levanter ✅</b><br/><small>step 2</small>"]
        haliax["<b>haliax 🚧</b><br/><small>step 3</small>"]
        thalas["<b>thalas 📋</b><br/><small>step 4</small>"]
        zephyr["<b>zephyr ✅</b><br/><small><a href='https://github.com/marin-community/marin/pull/1646'>#1646</a></small>"]
    end

    experiments --> marin
    experiments --> levanter
    experiments --> haliax
    experiments --> zephyr

    marin --> levanter
    marin --> zephyr

    levanter --> haliax

    style experiments fill:#d4edda,color:#000
    style marin fill:#d4edda,color:#000
    style levanter fill:#d4edda,color:#000
    style zephyr fill:#d4edda,color:#000
    style haliax fill:#fff3cd,color:#000
    style thalas fill:#f8d7da,color:#000
    style data_browser fill:#e2e3e5,color:#000

    classDef completed fill:#d4edda,stroke:#28a745,stroke-width:2px,color:#000
    classDef inProgress fill:#fff3cd,stroke:#ffc107,stroke-width:2px,color:#000
    classDef planned fill:#f8d7da,stroke:#dc3545,stroke-width:2px,stroke-dasharray: 5 5,color:#000
    classDef independent fill:#e2e3e5,stroke:#6c757d,stroke-width:2px,color:#000

Legend:

✅ Completed & merged
🚧 In progress
📋 Planned

(data_browser stays independent, not a workspace member)
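
As a rough sketch, the edges in the flowchart above can be written down as a dependency map and sorted so that every package comes after the packages it depends on. `DEPENDS_ON` and `build_order` are illustrative names, not code from the repo; `thalas` and `data_browser` are left out because the chart draws no edges for them.

```python
from graphlib import TopologicalSorter

# Edges copied from the flowchart: each package maps to the packages it depends on.
DEPENDS_ON = {
    "experiments": {"marin", "levanter", "haliax", "zephyr"},
    "marin": {"levanter", "zephyr"},
    "levanter": {"haliax"},
    "haliax": set(),
    "zephyr": set(),
}


def build_order(graph):
    """Return packages ordered so that every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


order = build_order(DEPENDS_ON)
print(order)
```

Any valid ordering puts `haliax` before `levanter`, `levanter` before `marin`, and `experiments` last.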

Problem

The Marin and Levanter repos contain components that depend on one another in ways that the current repo split doesn't reflect well, and that make co-development awkward.

Proposed solution: `uv` workspaces

"Workspaces" provide a way to colocate distinct libraries in one repo, such that they can be published and depended on independently (by external users), but naturally depend on each other's HEAD commits (and can easily be updated in lockstep, during common internal / co-development cases).

Implementation Plan

Below is a rough sequence of steps to get there, with the goal of minimizing disruption along the way.

Workspace migration scripts provide hermetic replay of the steps below on top of arbitrary Marin/Levanter main commits, which helps avoid conflicts while developing, and is more legible for review than the huge PR patches they generate.

Step 1: init workspace, `marin` member (#1690)

 marin/
   pyproject.toml  # Workspace root (experiments/ becomes part of the workspace root member)
   experiments/    # Becomes part of workspace root member
-  src/            # Move to lib/marin/
+  lib/
+    marin/
+      pyproject.toml
+      src/

Note: data_browser stays independent (separate deps/venv, excluded from workspace).

Step 2: Levanter member (#1723)

 marin/
   pyproject.toml
   experiments/
   lib/
     marin/
       pyproject.toml
       src/
+    levanter/
+      pyproject.toml
+      src/

Additional notes:

This will require namespacing GHA .ymls with levanter- and marin- prefixes, to distinguish them.
We'll also want to path-restrict GHAs to only run on relevant changes.
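
In the workflow files themselves, path restriction would presumably be done with `paths:` filters on the `push` / `pull_request` triggers. The sketch below only models the selection logic: given a list of changed files, which prefixed workflow groups should run. `MEMBER_DIRS` and `affected_prefixes` are hypothetical names, and the sketch ignores transitive effects (e.g. a Levanter change that should also trigger Marin tests).

```python
from pathlib import PurePosixPath

# Hypothetical mapping from workflow-name prefix to the member directory it covers.
MEMBER_DIRS = {
    "marin": "lib/marin",
    "levanter": "lib/levanter",
}


def affected_prefixes(changed_files):
    """Return the workflow prefixes whose member directory contains a changed file."""
    hits = set()
    for changed in changed_files:
        path = PurePosixPath(changed)
        for prefix, member_dir in MEMBER_DIRS.items():
            # Component-wise check, so "lib/marin-core/..." does not match "lib/marin".
            if path.is_relative_to(member_dir):
                hits.add(prefix)
    return sorted(hits)


print(affected_prefixes(["lib/levanter/README.md", "experiments/example.py"]))
```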

Step 3: Haliax member

 marin/
   pyproject.toml
   experiments/
   lib/
+    haliax/
+      pyproject.toml
+      src/
     levanter/
       pyproject.toml
       src/
     marin/
       pyproject.toml
       src/

Step 4: "Thalas" (executor) member

Thalas was an attempt at factoring Marin's executor code out as a separate library (and repo).

The new plan is to make it a workspace member in the new workspace repo, instead:

 marin/
   pyproject.toml
   experiments/
   lib/
     haliax/
       pyproject.toml
       src/
     levanter/
       pyproject.toml
       src/
     marin/
       pyproject.toml
       src/
+    thalas/
+      pyproject.toml
+      src/

Step Omega: `ray_tpu`, `rl`, `marin-core`, `marin-crawl`, `experiments` packages

 marin/
   pyproject.toml
   experiments/
+    hero_runs/
+      pyproject.toml
+      expXXX_tootsie8b.py
+    compel/
+      pyproject.toml
+      expXXX_compel_v0.py
   lib/
-    marin/
-      pyproject.toml
-      src/
+    marin-core/
+      pyproject.toml
+      src/
     haliax/
       pyproject.toml
       src/
     levanter/
       pyproject.toml
       src/
+    marin-crawl/
+      pyproject.toml
+      src/
+    ray_tpu/
+      pyproject.toml
+      src/
+    rl/
+      pyproject.toml
+      src/
     thalas/
       pyproject.toml
       src/

z3404494861-ryan-williams.md

I'm actually thinking "step 1" should be:

init workspace
lib/marin/ member (as planned)
data_browser/ member (leave in place, don't move to lib/data_browser/)
- mv data_browser lib/ later, minimize churn now
experiments/ member (Leave in place, minimal churn; just add experiments/pyproject.toml)

This way, the experiments member will depend on the marin member. #1690 as written leaves experiment sources "loose" in the workspace root (which then depends on the marin member). Explicitly modeling experiments → marin gives us a good test of intra-workspace deps, and feels more idiomatic.

z3407382913-ryan-williams.md

2 new thoughts:

We have to move experiments files either way
- If experiments/ becomes a "member", imports like from experiments.… will break.
- Moving to lib/experiments/ for "step 1" probably makes sense?
- lib/experiments/src/experiments/ and lib/marin/src/marin/ feel unwieldy, but maybe best we can do?
Better to leave data_browser as-is, not make it a workspace member.
- It conceptually doesn't/shouldn't share a venv with {marin,experiments,levanter,…}, but workspace enforces/models that.

z3407444000-ryan-williams.md

Actually, the existing "step 1" / #1690 accidentally did something good here, by leaving experiments as part of the workspace root.

We were effectively modeling the experiments → marin dep, without having to move experiments code or change imports.

I'm now working on just removing the data_browser changes from #1690.

z3407713637-dlwh.md

ok sure!

z3478991552-ryan-williams.md

🎉 CI passing on steps 1 & 2

#1690: "Workspace" step 1: experiments → marin
#1723: "Workspace" step 2: experiments → marin → levanter

test_full_integration_moar_cats also passes for me locally:

START_RAY_CPU_CLUSTER=true uv run --package marin pytest tests/rl/integration/test_cats_integration.py::test_full_integration_moar_cats -sv

I think #1690 and #1723 are fine to merge, but want to mention some possible rough edges below.

We've also done more SHA-pinning in the interim (e.g. #1850), which is the main alternative to a "workspace" that I can imagine. Some discussion of that follows as well.

Related (Merged) PRs

Marin

#1786: Replace from src.marin imports with from marin
#1802: Rm dev_tpu.py "src" imports using cloudpickle registration
#1850: Bump levanter+dolma SHAs, pin lm-eval, scalax SHAs

Levanter

#1275: Upgrade transformers from 4.57.0 (yanked) to 4.57.1
#1278: haliax>=1.4.dev450
#1280: TPU test vs. JAX 0.{6,7}.2, fix tests on 0.6.2

Dolma

#2: Relax tokenizers constraint to support 0.21-0.22 (again)

🤔 Possible downsides of workspace monorepo

😮‍💨 Longer relative paths: `src/marin/…` → `lib/marin/src/marin/…`

Not the end of the world: it feels a bit worse than today, but may be fine, and is roughly comparable to having to choose between repos today.

@percyliang expressed a similar concern on #1690.

🫩 `uv [sync|run] --package [marin|levanter]`

We now have to specify --package [marin|levanter] when running commands that should only affect one package.

Searching the #1723 branch turns up examples of this in the GHA and RTD YAML files and in deployment shell scripts.
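
One way to keep the flag from being forgotten in scripts is a small wrapper that always scopes `uv sync` / `uv run` to a named package. This is only a sketch: `uv_command` is a hypothetical helper, not something in the repo, and the example arguments are taken from the test command quoted in the CI comment above.

```python
import shlex


def uv_command(subcommand, package, *args):
    """Build a `uv sync` or `uv run` argv list scoped to one workspace package."""
    if subcommand not in {"sync", "run"}:
        raise ValueError(f"unsupported uv subcommand: {subcommand!r}")
    return ["uv", subcommand, "--package", package, *args]


cmd = uv_command(
    "run",
    "marin",
    "pytest",
    "tests/rl/integration/test_cats_integration.py::test_full_integration_moar_cats",
    "-sv",
)

# Render as a shell-safe string, e.g. for logging; pass `cmd` to subprocess.run to execute.
print(shlex.join(cmd))
```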

😬 `.github/workflows/` crowding

$ tree .github/workflows/
.github/workflows/
├── levanter-check_lockfile.yaml
├── levanter-docker-base-image.yaml
├── levanter-docker-cluster-image.yaml
├── levanter-gpt2_small_itest.yaml
├── levanter-launch_small_fast.yaml
├── levanter-publish_dev.yaml
├── levanter-run_entry_tests.yaml
├── levanter-run_pre_commit.yaml
├── levanter-run_ray_tests.yaml
├── levanter-run_tests.yaml
├── levanter-tpu_unit_tests.yaml
├── marin-build-docker-images.yaml
├── marin-codeql.yml
├── marin-docs.yaml
├── marin-lint-and-format.yaml
├── marin-metrics.yaml
├── marin-quickstart.yaml
├── marin-unit-tests.yaml
└── marin-update-leaderboard.yml

1 directory, 19 files

Will this be annoying when we add executor, Haliax, etc.?

🧐 RTD builds per package

Levanter's RTD build needs to be updated to build from the Marin repo's lib/levanter. This looks easy to change from https://app.readthedocs.org/dashboard/levanter/edit/.

I guess it's fine in principle to have N different libraries building+publishing RTDs from subdirs of the Marin repo.

Marin → Levanter SHA pins

Examining Marin's Levanter dependency, since it switched to SHA pins:

2025-10-17:
- marin#1794: Fix RL integration tests and testing experiment
- → 2ccbf007 (levanter#1191: Add plan for multi-host inference support)
- Marin moved to SHA-pinning Levanter here. Not clear these PRs were directly linked; more likely Marin needed something from a previous Levanter PR (or the floating main dep had caused issues).
2025-10-18:
- marin#1774: Hyperball speedrun
- → c30de5b9 (levanter#1253: Hyperball optimizers)
- Straightforward "co-developed" PRs.
2025-11-01:
- marin#1850: Bump levanter+dolma SHAs, pin lm-eval, scalax SHAs
- → 6cd783c8 (levanter#1288: Pin trackio>=0.5.0 for consistency with existing uv.lock)
- This bump brought in several Levanter PRs I needed in Marin, all at once.

This is a slower cadence of "paired updates" than I expected. I wonder if it's worth observing development patterns for a bit longer, to get a better sense of tradeoffs?
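
To put a number on the cadence, here is the arithmetic on the three bump dates listed above (dates, PR numbers, and SHAs are copied from that list; the variable names are just for illustration):

```python
from datetime import date

# (date, Marin PR, Levanter SHA it moved to), copied from the list above.
BUMPS = [
    (date(2025, 10, 17), "marin#1794", "2ccbf007"),
    (date(2025, 10, 18), "marin#1774", "c30de5b9"),
    (date(2025, 11, 1), "marin#1850", "6cd783c8"),
]

# Days between consecutive SHA bumps.
gaps = [(later[0] - earlier[0]).days for earlier, later in zip(BUMPS, BUMPS[1:])]
total_days = (BUMPS[-1][0] - BUMPS[0][0]).days

print(gaps)        # [1, 14]
print(total_days)  # 15
```

So after the initial switch to SHA pins, there were two further bumps in about two weeks, the second of which batched several Levanter PRs.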

@rjpower previously mentioned situations where Ray incorporated conflicting or wrong Levanter versions, even when Marin pins a specific SHA. It might be worth trying to run down an example of that (or fix the root cause, in Ray or our use of it).

z3488446921-ryan-williams.md

I merged #1858 with fast-follows to #1723:

✅ Levanter - Pre-Commit GHA is fixed
✅ GHAs on main all green
✅ Levanter docs now published from lib/levanter (e.g. /Port-Models page includes this fix from #1858)

Some next steps:

Transfer Levanter issues to this repo
Steps 3/4 (Haliax, executor)
