@Helw150
Helw150 / marin-storage-report.md
Created May 26, 2026 20:01
Marin storage report — 2026-05-26 20:01 UTC

GCS Storage Report

Generated: 2026-05-26T20:00:58Z

Overview

| Metric | Value |
| --- | --- |
| Total Objects | 321.4M |
| Total Size | 3,049.70 TB |
Helw150 / datakit_moe_mix_buckets_us_central2.csv
Created May 25, 2026 22:32
datakit MoE mixture bucket sizes (us-central2 store_8ac06c74)
bucket tokens epochs_for_8T epochs_for_2T
c01q0 152613773716 52.4199 13.105
c01q1 137672798475 58.1088 14.5272
c01q2 158419310113 50.4989 12.6247
c01q3 190944457285 41.897 10.4743
c01q4 516406017210 15.4917 3.8729
c02q0 17678211181 452.5345 113.1336
c02q1 22467740659 356.0661 89.0165
c02q2 54895261784 145.7321 36.433
c02q3 149886381691 53.3738 13.3434
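The two epoch columns appear consistent with dividing a fixed token budget by each bucket's token count (8T = 8e12 tokens, 2T = 2e12 tokens). The sketch below reproduces that arithmetic for two of the rows above. The formula is an inference from the listed numbers, not something the gist states, and `BUDGETS` and `epochs_needed` are hypothetical names introduced for illustration.

```python
# Assumption (inferred from the table, not stated in the gist):
#   epochs_for_<budget> = budget_tokens / bucket_tokens
BUDGETS = {"epochs_for_8T": 8e12, "epochs_for_2T": 2e12}


def epochs_needed(bucket_tokens: int, budget_tokens: float) -> float:
    """Number of passes over a bucket required to consume the token budget."""
    return budget_tokens / bucket_tokens


# Token counts copied from the c01q0 and c02q0 rows of the table.
bucket_tokens = {"c01q0": 152613773716, "c02q0": 17678211181}

epochs = {
    bucket: {
        column: round(epochs_needed(tokens, budget), 4)
        for column, budget in BUDGETS.items()
    }
    for bucket, tokens in bucket_tokens.items()
}

for bucket, columns in epochs.items():
    print(bucket, columns)
```

Under this assumption, the computed values for `c01q0` and `c02q0` agree with the listed columns to four decimal places. Smaller buckets such as `c02q0` need hundreds of repeats to fill an 8T budget.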
Helw150 / marin-storage-report.md
Created May 23, 2026 00:30
Marin storage report — 2026-05-23 00:30 UTC

GCS Storage Report

Generated: 2026-05-23T00:30:00Z

Overview

| Metric | Value |
| --- | --- |
| Total Objects | 338.6M |
| Total Size | 3,110.95 TB |
Helw150 / marin-storage-report.md
Created May 19, 2026 05:04
Marin storage report — 2026-05-19 05:04 UTC

GCS Storage Report

Generated: 2026-05-19T05:03:56Z

Overview

| Metric | Value |
| --- | --- |
| Total Objects | 233.3M |
| Total Size | 3,010.88 TB |
Helw150 / marin-storage-report.md
Created May 19, 2026 03:52
Marin storage report — 2026-05-19 03:51 UTC

GCS Storage Report

Generated: 2026-05-19T03:51:40Z

Overview

| Metric | Value |
| --- | --- |
| Total Objects | 231.0M |
| Total Size | 3,009.25 TB |
Helw150 / marin-storage-report.md
Created May 19, 2026 02:55
Marin storage report — 2026-05-19 02:55 UTC

GCS Storage Report

Generated: 2026-05-19T02:55:18Z

Overview

| Metric | Value |
| --- | --- |
| Total Objects | 364.9M |
| Total Size | 5,463.48 TB |
Helw150 / marin-storage-report.md
Created May 19, 2026 01:23
Marin storage report — 2026-05-19 01:23 UTC

GCS Storage Report

Generated: 2026-05-19T01:23:31Z

Overview

| Metric | Value |
| --- | --- |
| Total Objects | 663.2M |
| Total Size | 5,534.42 TB |
Helw150 / grug_results_summary.json
Last active May 13, 2026 20:25
Grug MoE data-mixture comparison (v0/v2/v3/v4) across compute scales — full lm-eval results across 17 tasks
[
  {
    "mix": "v0",
    "hidden_dim": 512,
    "budget": 2.19e+17,
    "tasks": {
      "mmlu_sl_verb_0shot": {
        "bpb,none": 0.844007877090332,
        "bpb_stderr,none": 0.004281199742552705,
        "acc_norm,none": 0.27332288847742486,
Helw150 / grp_pipeline_300m.py
Last active May 7, 2026 01:46
P3 mixture-modeling pipeline + GRP head-to-head comparison on the 300M / 6.3T sweep
import marimo

__generated_with = "0.21.0"
app = marimo.App(width="full")


@app.cell
def _():
    import marimo as mo
    import numpy as np
Helw150 / mixture_300m_irt.py
Created May 1, 2026 02:45
300M data-mixture IRT analysis: non-negative k-factor model with Horn k-selection, noise-floor-anchored ψ, and SNR diagnostics
import marimo

__generated_with = "0.21.0"
app = marimo.App(width="full")


@app.cell
def _():
    import marimo as mo
    import numpy as np