@Helw150
Created May 26, 2026 20:01
Marin storage report — 2026-05-26 20:01 UTC

GCS Storage Report

Generated: 2026-05-26T20:00:58Z

Overview

Metric Value
Total Objects 321.4M
Total Size 3,049.70 TB
Est. Monthly Cost $25,869
Annual Estimate $310,431

By Bucket

Bucket Region Objects Size (TB) Monthly Cost
marin-us-east5 US-EAST5 28.6M 850.55 $8,274
marin-us-central2 US-CENTRAL2 172.0M 757.07 $6,020
marin-us-central1 US-CENTRAL1 47.1M 834.48 $5,994
marin-eu-west4 EUROPE-WEST4 65.9M 433.70 $4,422
marin-us-west4 US-WEST4 2.6M 110.88 $916
marin-us-east1 US-EAST1 5.3M 63.02 $243
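The per-bucket rows can be cross-checked against the Overview. Here is a minimal sketch that uses only figures copied from the two tables. The bucket sizes and costs should sum to the Overview totals, and the Annual Estimate should be about 12 times the monthly cost. Multiplying the rounded monthly figure gives $310,428, which is $3 below the reported $310,431. That gap is consistent with the annual figure being computed from an unrounded monthly cost, but this is an assumption, since the report does not say how it rounds.

```python
# Rows copied from the "By Bucket" table: bucket -> (size_tb, monthly_cost_usd).
BUCKETS = {
    "marin-us-east5": (850.55, 8274),
    "marin-us-central2": (757.07, 6020),
    "marin-us-central1": (834.48, 5994),
    "marin-eu-west4": (433.70, 4422),
    "marin-us-west4": (110.88, 916),
    "marin-us-east1": (63.02, 243),
}

# Sum the per-bucket rows; rounding hides floating-point noise in the size total.
total_size_tb = round(sum(size for size, _ in BUCKETS.values()), 2)
total_monthly = sum(cost for _, cost in BUCKETS.values())

# Annual estimate recomputed from the *rounded* monthly total.
annual_from_rounded = 12 * total_monthly

print(total_size_tb)        # compare with "Total Size" in the Overview
print(total_monthly)        # compare with "Est. Monthly Cost"
print(annual_from_rounded)  # compare with "Annual Estimate" (expect a few dollars of rounding gap)
```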

By Storage Class

Class Objects Size (TB) Monthly Cost % of Total Size
STANDARD 261.8M 1,193.67 $15,906 39.1%
NEARLINE 36.7M 1,253.23 $8,533 41.1%
COLDLINE 16.3M 479.55 $1,311 15.7%
ARCHIVE 6.6M 123.25 $119 4.0%
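The percentage column shows each class's share of total size, not of cost. STANDARD holds 1,193.67 of 3,049.70 TB (about 39.1%), but it accounts for $15,906 of the $25,869 monthly bill (about 61.5%). The sketch below recomputes the share column from the table. It also derives the effective monthly cost per TB that the reported costs imply, which comes to roughly $13.3, $6.8, $2.7, and $1.0 for the four classes. These rates are derived only from this table, not from published GCS pricing.

```python
# Rows copied from the "By Storage Class" table: class -> (size_tb, monthly_cost_usd).
CLASSES = {
    "STANDARD": (1193.67, 15906),
    "NEARLINE": (1253.23, 8533),
    "COLDLINE": (479.55, 1311),
    "ARCHIVE": (123.25, 119),
}

def class_breakdown(classes):
    """Return (share of total size in %, implied USD per TB-month) for each class."""
    total_tb = sum(size for size, _ in classes.values())
    # Share of *size*, which is what the report's percentage column shows.
    shares = {name: round(100 * size / total_tb, 1) for name, (size, _) in classes.items()}
    # Effective monthly rate implied by the table's own cost and size figures.
    usd_per_tb = {name: round(cost / size, 2) for name, (size, cost) in classes.items()}
    return shares, usd_per_tb

shares, usd_per_tb = class_breakdown(CLASSES)
print(shares)      # reproduces the percentage column
print(usd_per_tb)  # implied monthly $/TB per class
```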

Top First-Level Directories

Bucket Directory Objects Size (TB) Monthly Cost
marin-us-east5 checkpoints/ 901.0K 380.36 $4,168
marin-us-central1 checkpoints/ 1.8M 308.56 $1,865
marin-us-central2 grug/ 223.3K 162.06 $1,573
marin-us-central2 checkpoints/ 27.4M 298.62 $1,438
marin-us-central2 datakit/ 9.9M 109.08 $1,422
marin-us-east5 grug/ 91.5K 95.65 $1,244
marin-eu-west4 datakit/ 22.4M 80.89 $1,213
marin-us-central1 tokenized/ 1.7M 186.53 $1,119
marin-us-east5 tokenized/ 1.7M 169.50 $1,082
marin-us-east5 raw/ 965.7K 120.99 $979
marin-eu-west4 raw/ 848.5K 85.95 $833
marin-us-central1 normalized/ 422.1K 74.64 $820
marin-eu-west4 normalized/ 335.0K 66.16 $809
marin-eu-west4 tokenized/ 528.1K 96.35 $750
marin-us-central1 raw/ 1.0M 102.53 $657
marin-us-central2 raw/ 399.1K 81.97 $634
marin-us-west4 raw/ 454.2K 59.57 $568
marin-us-central2 normalized/ 202.0K 38.03 $496
marin-eu-west4 checkpoints/ 35.3M 72.68 $466
marin-us-central1 data/ 324.2K 34.32 $442
marin-us-west4 tokenized/ 398.5K 44.59 $302
marin-us-central2 tokenized/ 583.1K 41.19 $289
marin-us-east5 tmp/ 90.3K 20.62 $269
marin-us-central1 grug/ 23.6K 20.75 $222
marin-us-central1 rl_testing/ 6.6K 14.05 $183
marin-us-central1 podcast_audio_top1000_60s_clips/ 30.0M 26.38 $172
marin-us-east5 julian/ 75.4K 9.26 $115
marin-us-east5 dmlab_256x256/ 114.8K 15.80 $103
marin-us-east1 raw/ 130.3K 27.73 $101
marin-eu-west4 data/ 132.6K 6.13 $92
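The first-level, two-level, and three-level tables group objects by the leading components of their '/'-separated object names. In a flat-namespace GCS bucket, a "directory" is only a shared name prefix. The sketch below shows that grouping, assuming the input is (name, size_bytes) pairs, such as the `name` and `size` attributes of blobs returned by `google.cloud.storage.Client.list_blobs`. The helper `aggregate_by_prefix` and the sample object names are hypothetical, and the report does not state how it actually computes its prefixes. Note that the first-level table also appends a trailing "/", which this sketch omits.

```python
from collections import defaultdict

def aggregate_by_prefix(objects, depth):
    """Group (name, size_bytes) pairs by the first `depth` '/'-separated name components.

    Returns {prefix: [object_count, total_bytes]}. Hypothetical helper for illustration.
    """
    totals = defaultdict(lambda: [0, 0])
    for name, size in objects:
        prefix = "/".join(name.split("/")[:depth])
        totals[prefix][0] += 1
        totals[prefix][1] += size
    return dict(totals)

# Hypothetical object names and sizes, for illustration only.
sample = [
    ("checkpoints/isoflop/step-100/model.bin", 4_000),
    ("checkpoints/isoflop/step-200/model.bin", 4_000),
    ("checkpoints/other-run/step-100/model.bin", 1_000),
    ("raw/example/part-0.parquet", 500),
]

print(aggregate_by_prefix(sample, 1))  # {'checkpoints': [3, 9000], 'raw': [1, 500]}
print(aggregate_by_prefix(sample, 2))  # two-level prefixes
```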

Top Two-Level Prefixes

Bucket Prefix Objects Size (TB) Monthly Cost
marin-us-central2 checkpoints/isoflop 165.6K 117.06 $543
marin-us-central1 data/datakit 324.1K 34.30 $441
marin-eu-west4 datakit/tokenize 101.2K 27.69 $415
marin-eu-west4 datakit/dedup 120.7K 24.83 $372
marin-us-central2 datakit/tokenize 101.2K 27.69 $361
marin-us-central1 normalized/nemotron_cc_v2 154.0K 32.89 $355
marin-us-central2 datakit/dedup_dabe67c2 145.9K 25.29 $330
marin-eu-west4 normalized/nemotron_cc_v2 119.0K 25.53 $324
marin-us-central2 datakit/store_859e8be4 3.9M 22.99 $300
marin-us-east5 grug/chunked_perf 19.3K 22.51 $293
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume51000_clip15_20260504_014844-3cdab2 38.6K 21.70 $283
marin-us-east5 tmp/ttl=14d 57.4K 20.49 $267
marin-us-central2 datakit/store_8ac06c74 5.2M 20.37 $266
marin-eu-west4 tokenized/dolma3_pool 107.2K 26.79 $227
marin-us-east5 tokenized/nemotron_cc 126.7K 33.56 $223
marin-eu-west4 tokenized/nemotron_cc 107.8K 28.62 $217
marin-us-central2 normalized/nemotron_cc_v2 77.0K 16.44 $214
marin-us-central1 tokenized/nemotron_cc_v2 188.7K 30.79 $201
marin-us-east5 tokenized/merged 71.4K 20.09 $195
marin-us-east5 tokenized/nemotron_cc_v2 187.0K 29.33 $191
marin-us-central1 tokenized/nemotron_cc 76.0K 20.68 $189
marin-us-east5 raw/dolma3_pool-d37843 478.8K 13.98 $182
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume112662_clip15_20260518_123236-6e783f 20.6K 13.95 $182
marin-us-central2 grug/moe-v7-1e22-d3200-5a4518 27.6K 27.02 $176
marin-us-west4 tokenized/nemotron_cc 223.4K 25.86 $175
marin-us-central2 tokenized/nemotron_cc 48.5K 13.58 $170
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume86437_clip15_20260512_142310-d2a994 19.5K 12.87 $168
marin-us-central1 checkpoints/isoflop-curation 19.2K 13.39 $163
marin-us-central1 normalized/nemotron_cc_v2_1 68.5K 14.95 $163
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume93092_clip15_20260514_031402-5450e2 19.2K 12.40 $162

Top Three-Level Prefixes

Bucket Prefix Objects Size (TB) Monthly Cost
marin-eu-west4 datakit/dedup/dedup_v0_manual 120.7K 24.83 $372
marin-us-central2 datakit/dedup_dabe67c2/metadata 45.1K 25.00 $326
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume51000_clip15_20260504_014844-3cdab2/checkpoints 38.6K 21.70 $283
marin-us-east5 tmp/ttl=14d/checkpoints-temp 57.4K 20.49 $267
marin-us-east5 tokenized/merged/dolma3_dolmino_top_level 71.4K 20.09 $195
marin-us-east5 raw/dolma3_pool-d37843/data 221.0K 13.98 $182
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume112662_clip15_20260518_123236-6e783f/checkpoints 20.6K 13.95 $182
marin-us-central2 grug/moe-v7-1e22-d3200-5a4518/checkpoints 27.6K 27.02 $176
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume86437_clip15_20260512_142310-d2a994/checkpoints 19.5K 12.87 $168
marin-eu-west4 datakit/tokenize/nemotron_cc_v2 38.5K 10.96 $164
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume93092_clip15_20260514_031402-5450e2/checkpoints 19.2K 12.40 $162
marin-us-central1 data/datakit/tokenized 246.1K 11.67 $150
marin-eu-west4 raw/nemotro-cc-eeb783/contrib 31.3K 11.77 $150
marin-us-central2 datakit/tokenize/nemotron_cc_v2 38.5K 10.96 $143
marin-us-central1 data/datakit/normalized 27.8K 10.14 $128
marin-us-east5 checkpoints/exp_sft_qwen3_8b_selfinstill_lr1e5_rstarcoder_n8_vr5_round1-87a9ec/checkpoints 8.4K 9.83 $128
marin-eu-west4 raw/dolma3_pool-d37843/data 221.0K 13.98 $119
marin-us-east5 julian/experiments/causal-diffusion 61.6K 9.08 $113
marin-us-east5 checkpoints/e3956np_m32b_q30ba3b_50k-efb0d2/checkpoints 4.5K 7.95 $104
marin-eu-west4 normalized/nemotron_cc_v2_1/medium_high_quality_synthetic_c1918ec6 28.6K 6.24 $94
marin-eu-west4 normalized/nemotron_cc_v2/medium_quality_4b3940a2 29.7K 6.07 $91
marin-us-central1 user/rav/full_dedup_exp1 91.6K 6.84 $89
marin-us-central1 tokenized/finetranslations_parallel-929291/train 81.6K 13.33 $87
marin-us-east5 checkpoints/e3956np_m32b_kimi_50k-462e6f/checkpoints 3.7K 6.63 $86
marin-us-central1 normalized/nemotron_cc_v2_1/medium_high_quality_synthetic_c1918ec6 28.6K 6.24 $81
marin-us-central2 normalized/nemotron_cc_v2_1/medium_high_quality_synthetic_c1918ec6 28.6K 6.24 $81
marin-us-central1 normalized/nemotron_cc_v2/medium_quality_4b3940a2 29.7K 6.07 $79
marin-us-central2 normalized/nemotron_cc_v2/medium_quality_4b3940a2 29.7K 6.07 $79
marin-eu-west4 datakit/store/_smoke_v0.1_20260518_mixed 9.5M 5.16 $77
marin-us-central1 tokenized/nemotron_cc_v2_1/medium_high_quality_synthetic-237efe 89.7K 11.80 $77

Age Distribution

Age Objects Size (TB) Monthly Cost
<7d 13.4M 304.10 $3,998
7-30d 33.1M 719.79 $9,645
30-90d 42.8M 1,253.01 $9,246
90-365d 73.5M 523.40 $1,971
>365d 158.5M 249.41 $1,009
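Each object falls into exactly one of the five age bins. The sketch below shows that classification. It assumes age is measured from the object's creation time (for example, a blob's `time_created`) relative to the report's generation time, and that each upper edge is exclusive. The report states neither convention, and `age_bucket` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Upper edges (in days) and labels taken from the "Age Distribution" table.
AGE_BINS = [(7, "<7d"), (30, "7-30d"), (90, "30-90d"), (365, "90-365d")]

def age_bucket(created: datetime, now: datetime) -> str:
    """Return the age label for an object created at `created`, as seen at `now`.

    Assumes age is measured from creation time and each upper edge is exclusive,
    so an object exactly 7 days old lands in "7-30d". Hypothetical helper.
    """
    age_days = (now - created) / timedelta(days=1)  # fractional days
    for upper, label in AGE_BINS:
        if age_days < upper:
            return label
    return ">365d"

# Report generation time, taken from the "Generated" line above.
generated = datetime(2026, 5, 26, 20, 0, 58, tzinfo=timezone.utc)
print(age_bucket(generated - timedelta(days=45), generated))  # 30-90d
```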

Monthly Creation Trend

Month Objects Created Size Created (TB)
2026-05 45.5M 847.13
2026-04 35.3M 690.87
2026-03 7.4M 593.53
2026-02 5.3M 305.75
2026-01 2.0M 144.53
2025-12 10.4M 56.08
2025-11 20.9M 19.83
2025-10 21.5M 20.87
2025-09 8.5M 11.63
2025-08 1.9M 45.28
2025-07 2.2M 52.40
2025-06 1.8M 11.81
2025-05 32.0M 26.73
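The creation trend groups objects by the UTC calendar month of their creation time. The 13 listed months add up to about 2,826 TB of the 3,049.70 TB total, so the table presumably omits older months. The sketch below shows the grouping, assuming (created, size_bytes) pairs with timezone-aware datetimes. `creation_trend` and the sample data are hypothetical, not the report's code.

```python
from collections import Counter
from datetime import datetime, timezone

def creation_trend(objects):
    """Group (created, size_bytes) pairs by UTC creation month.

    Returns {"YYYY-MM": (object_count, total_bytes)}, newest month first,
    matching the row order of the "Monthly Creation Trend" table. Hypothetical helper.
    """
    counts, sizes = Counter(), Counter()
    for created, size in objects:
        month = created.astimezone(timezone.utc).strftime("%Y-%m")
        counts[month] += 1
        sizes[month] += size
    return {month: (counts[month], sizes[month]) for month in sorted(counts, reverse=True)}

# Hypothetical creation times and sizes, for illustration only.
sample = [
    (datetime(2026, 5, 3, tzinfo=timezone.utc), 10),
    (datetime(2026, 5, 20, tzinfo=timezone.utc), 5),
    (datetime(2026, 4, 30, 23, 0, tzinfo=timezone.utc), 7),
]
print(creation_trend(sample))  # {'2026-05': (2, 15), '2026-04': (1, 7)}

# "Size Created (TB)" values copied from the table, newest month first.
listed_tb = [847.13, 690.87, 593.53, 305.75, 144.53, 56.08, 19.83,
             20.87, 11.63, 45.28, 52.40, 11.81, 26.73]
print(round(sum(listed_tb), 2))  # 2826.44, versus 3,049.70 TB in total
```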