marin-community/marin#1773 "Workspace" monorepo plan
(Versioned write-up here, synced via ghpr)
- ✅ #1690 (step 1): Init workspace, move marin to lib/marin/
- ✅ #1723 (step 2): Ingest Levanter as lib/levanter/
- 🚧 Step 3: Ingest Haliax as lib/haliax/
flowchart TB
subgraph " "
experiments["<b>experiments</b><br/><small>step 1</small>"]
data_browser["<b>data_browser</b><br/><small>independent</small>"]
end
subgraph "lib/"
marin["<b>marin ✅</b><br/><small>step 1</small>"]
levanter["<b>levanter ✅</b><br/><small>step 2</small>"]
haliax["<b>haliax 🚧</b><br/><small>step 3</small>"]
thalas["<b>thalas 📋</b><br/><small>step 4</small>"]
zephyr["<b>zephyr ✅</b><br/><small><a href='https://github.com/marin-community/marin/pull/1646'>#1646</a></small>"]
end
experiments --> marin
experiments --> levanter
experiments --> haliax
experiments --> zephyr
marin --> levanter
marin --> zephyr
levanter --> haliax
style experiments fill:#d4edda,color:#000
style marin fill:#d4edda,color:#000
style levanter fill:#d4edda,color:#000
style zephyr fill:#d4edda,color:#000
style haliax fill:#fff3cd,color:#000
style thalas fill:#f8d7da,color:#000
style data_browser fill:#e2e3e5,color:#000
classDef completed fill:#d4edda,stroke:#28a745,stroke-width:2px,color:#000
classDef inProgress fill:#fff3cd,stroke:#ffc107,stroke-width:2px,color:#000
classDef planned fill:#f8d7da,stroke:#dc3545,stroke-width:2px,stroke-dasharray: 5 5,color:#000
classDef independent fill:#e2e3e5,stroke:#6c757d,stroke-width:2px,color:#000
Legend:
- ✅ Completed & merged
- 🚧 In progress
- 📋 Planned
(data_browser stays independent, not a workspace member)
The Marin and Levanter repos contain components that depend on one another in ways that the current repo split doesn't reflect well, and that make co-development awkward.
Proposed solution: uv workspaces
"Workspaces" provide a way to colocate distinct libraries in one repo, such that they can be published and depended on independently (by external users), but naturally depend on each other's HEAD commits (and can easily be updated in lockstep during the common internal co-development cases).
Below is a rough sequence of steps to get there, with the goal of minimizing disruption along the way.
Workspace migration scripts provide hermetic replay of the steps below on top of arbitrary Marin/Levanter main commits, which helps avoid conflicts while developing and is more legible for review than the huge PR patches they generate.
Step 1: init workspace, marin member (#1690)
marin/
  pyproject.toml    # Workspace root (experiments/ becomes part of the workspace root member)
  experiments/      # Becomes part of workspace root member
- src/              # Move to lib/marin/
+ lib/
+   marin/
+     pyproject.toml
+     src/
Note: data_browser stays independent (separate deps/venv, excluded from workspace).
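The Step 1 move can be tried out on a scratch directory. The following is a minimal sketch, not the actual migration script: the helper name move_src_into_member and the placeholder pyproject.toml contents are hypothetical, and it only mirrors the directory change shown in the tree above.

```python
import shutil
import tempfile
from pathlib import Path


def move_src_into_member(repo: Path, member: str = "marin") -> Path:
    """Move <repo>/src to <repo>/lib/<member>/src and add a member pyproject.toml."""
    member_dir = repo / "lib" / member
    member_dir.mkdir(parents=True)
    shutil.move(str(repo / "src"), str(member_dir / "src"))
    # Placeholder metadata; a real member would declare its own dependencies.
    (member_dir / "pyproject.toml").write_text(
        f'[project]\nname = "{member}"\nversion = "0.0.0"\n'
    )
    return member_dir


# Build a scratch copy of the pre-migration layout, then replay the move on it.
scratch = Path(tempfile.mkdtemp())
(scratch / "src" / "marin").mkdir(parents=True)
(scratch / "src" / "marin" / "__init__.py").write_text("")
(scratch / "experiments").mkdir()
(scratch / "pyproject.toml").write_text("")

member_dir = move_src_into_member(scratch)
layout = sorted(p.relative_to(scratch).as_posix() for p in scratch.rglob("*"))
print("\n".join(layout))
```

In a real checkout the move would more likely be done with git mv so that file history follows the code; shutil.move is used here only to keep the example self-contained.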
Step 2: Levanter member (#1723)
marin/
  pyproject.toml
  experiments/
  lib/
    marin/
      pyproject.toml
      src/
+   levanter/
+     pyproject.toml
+     src/
Additional notes:
- This will require namespacing GHA .ymls with levanter- and marin- prefixes, to distinguish them.
- We'll also want to path-restrict GHAs to only run on relevant changes.
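One way to reason about which changes are "relevant" to a given member is to walk the dependency edges from the flowchart in reverse. The sketch below is an illustration under stated assumptions: the edges are copied from the diagram, while the path-to-member mapping (lib/<member>/ and experiments/) and the function names are hypothetical, and this is not how the workflows are actually configured. In practice the resulting sets could be turned into static paths filters on each workflow.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional

# Direct dependencies between workspace members, copied from the flowchart.
DEPENDS_ON = {
    "experiments": {"marin", "levanter", "haliax", "zephyr"},
    "marin": {"levanter", "zephyr"},
    "levanter": {"haliax"},
    "haliax": set(),
    "zephyr": set(),
}


def owning_member(path: str) -> Optional[str]:
    """Map a repo-relative path to the member that owns it (assumed layout)."""
    parts = PurePosixPath(path).parts
    if len(parts) >= 2 and parts[0] == "lib" and parts[1] in DEPENDS_ON:
        return parts[1]
    if parts and parts[0] == "experiments":
        return "experiments"
    return None  # e.g. data_browser/ or top-level files


def affected_members(changed_paths: Iterable[str]) -> set[str]:
    """Members whose checks should run: the changed ones plus everything depending on them."""
    affected = {m for m in map(owning_member, changed_paths) if m is not None}
    grew = True
    while grew:  # propagate along reverse dependency edges until nothing new is added
        grew = False
        for member, deps in DEPENDS_ON.items():
            if member not in affected and deps & affected:
                affected.add(member)
                grew = True
    return affected


print(sorted(affected_members(["lib/haliax/src/haliax/core.py"])))
print(sorted(affected_members(["lib/zephyr/src/zephyr/__init__.py"])))
```

With these edges, a change under lib/haliax/ touches haliax, levanter, marin, and experiments, while a change under data_browser/ touches no workspace member.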
Step 3: Haliax member
marin/
  pyproject.toml
  experiments/
  lib/
+   haliax/
+     pyproject.toml
+     src/
    levanter/
      pyproject.toml
      src/
    marin/
      pyproject.toml
      src/
Step 4: Thalas member
Thalas was an attempt at factoring Marin's executor code out as a separate library (and repo).
The new plan is instead to make it a workspace member in the new workspace repo:
marin/
  pyproject.toml
  experiments/
  lib/
    haliax/
      pyproject.toml
      src/
    levanter/
      pyproject.toml
      src/
    marin/
      pyproject.toml
      src/
+   thalas/
+     pyproject.toml
+     src/

marin/
  pyproject.toml
  experiments/
+   hero_runs/
+     pyproject.toml
+     expXXX_tootsie8b.py
+   compel/
+     pyproject.toml
+     expXXX_compel_v0.py
  lib/
-   marin/
-     pyproject.toml
-     src/
+   marin-core/
+     pyproject.toml
+     src/
    haliax/
      pyproject.toml
      src/
    levanter/
      pyproject.toml
      src/
+   marin-crawl/
+     pyproject.toml
+     src/
+   ray_tpu/
+     pyproject.toml
+     src/
+   rl/
+     pyproject.toml
+     src/
    thalas/
      pyproject.toml
      src/