Marin storage report — 2026-05-23 00:30 UTC

GCS Storage Report

Generated: 2026-05-23T00:30:00Z

Overview

Metric Value
Total Objects 338.6M
Total Size 3,110.95 TB
Est. Monthly Cost $26,590
Annual Estimate $319,083
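The two cost figures above appear to be related by a simple annualization. A minimal sketch, assuming the annual estimate is twelve times the monthly estimate and that both figures were rounded to whole dollars for display (the report does not state how either was computed):

```python
# Figures copied from the Overview table; both are rounded to whole dollars.
monthly_cost_usd = 26_590
reported_annual_usd = 319_083

# Assumption: the annual estimate is simply twelve times the monthly estimate.
annual_from_rounded_monthly = monthly_cost_usd * 12

# Rounding the monthly figure to a whole dollar can shift the product by up to $6.
difference = reported_annual_usd - annual_from_rounded_monthly

print(f"12 x monthly: ${annual_from_rounded_monthly:,}")
print(f"reported:     ${reported_annual_usd:,} (difference ${difference})")
```

The small gap is consistent with the annual figure having been derived from an unrounded monthly cost, though that is an inference rather than something the report confirms.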

By Bucket

Bucket Region Objects Size (TB) Monthly Cost
marin-us-east5 US-EAST5 28.1M 895.47 $8,860
marin-us-central1 US-CENTRAL1 46.3M 846.47 $6,248
marin-us-central2 US-CENTRAL2 187.2M 807.47 $6,190
marin-eu-west4 EUROPE-WEST4 69.2M 388.81 $4,144
marin-us-west4 US-WEST4 2.5M 110.82 $920
marin-us-east1 US-EAST1 5.3M 61.90 $230

By Storage Class

Class Objects Size (TB) Monthly Cost % of Total Size
STANDARD 276.5M 1,217.52 $16,228 39.1%
NEARLINE 36.7M 1,352.08 $9,099 43.5%
COLDLINE 15.5M 420.20 $1,151 13.5%
ARCHIVE 9.9M 121.16 $112 3.9%
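The percentage column matches each class's share of total size rather than of cost or object count. A rough sketch that recomputes the share and an effective monthly cost per TB from the rows above; the per-TB figure is a derived quantity for illustration, not a published rate, and it blends the regions each class spans:

```python
# (size in TB, monthly cost in USD) copied from the By Storage Class table.
classes = {
    "STANDARD": (1217.52, 16228),
    "NEARLINE": (1352.08, 9099),
    "COLDLINE": (420.20, 1151),
    "ARCHIVE": (121.16, 112),
}

total_size_tb = sum(size for size, _ in classes.values())

# Share of total size (percent) and effective cost per TB per month.
size_share = {name: 100 * size / total_size_tb for name, (size, _) in classes.items()}
cost_per_tb = {name: cost / size for name, (size, cost) in classes.items()}

for name in classes:
    print(f"{name:<9} {size_share[name]:5.1f}% of size, ${cost_per_tb[name]:.2f}/TB-month")
```

On these numbers STANDARD holds well under half of the bytes but accounts for most of the monthly cost, which is where lifecycle changes would have the largest effect.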

Top First-Level Directories

Bucket Directory Objects Size (TB) Monthly Cost
marin-us-east5 checkpoints/ 932.2K 392.95 $4,506
marin-us-central2 grug/ 406.7K 278.38 $2,359
marin-us-central1 checkpoints/ 1.8M 307.64 $1,870
marin-us-central2 checkpoints/ 27.4M 298.61 $1,646
marin-us-east5 grug/ 134.8K 136.99 $1,604
marin-us-central1 tokenized/ 1.7M 186.30 $1,157
marin-eu-west4 datakit/ 31.4M 76.56 $1,148
marin-us-east5 tokenized/ 1.7M 169.80 $1,089
marin-us-east5 raw/ 965.4K 120.99 $979
marin-us-central1 normalized/ 422.1K 74.64 $973
marin-eu-west4 normalized/ 334.1K 66.00 $905
marin-eu-west4 raw/ 846.1K 85.92 $864
marin-eu-west4 tokenized/ 519.0K 95.66 $746
marin-us-central2 raw/ 395.4K 81.97 $743
marin-us-central1 raw/ 1.0M 102.52 $661
marin-us-west4 raw/ 454.2K 59.57 $568
marin-us-central2 normalized/ 195.8K 37.22 $485
marin-us-central2 datakit/ 412.5K 36.88 $481
marin-us-central1 grug/ 50.8K 49.69 $473
marin-us-central1 data/ 324.2K 34.32 $447
marin-us-west4 tokenized/ 397.9K 44.56 $306
marin-us-central2 tokenized/ 583.1K 41.19 $289
marin-eu-west4 checkpoints/ 30.3M 43.26 $278
marin-us-east5 tmp/ 71.3K 13.32 $174
marin-us-central1 podcast_audio_top1000_60s_clips/ 30.0M 26.38 $172
marin-us-east5 julian/ 74.8K 9.19 $114
marin-us-east5 dmlab_256x256/ 114.8K 15.80 $103
marin-us-east1 raw/ 130.3K 27.73 $101
marin-us-central1 finetranslations-0d9d50/ 14.3K 13.95 $91
marin-us-east5 finetranslations-0d9d50/ 14.3K 13.95 $91

Top Two-Level Prefixes

Bucket Prefix Objects Size (TB) Monthly Cost
marin-us-central2 checkpoints/isoflop 165.6K 117.06 $715
marin-us-central1 data/datakit 324.1K 34.30 $447
marin-us-central1 normalized/nemotron_cc_v2 154.0K 32.89 $429
marin-eu-west4 datakit/tokenize 99.5K 27.26 $409
marin-eu-west4 normalized/nemotron_cc_v2 119.0K 25.53 $383
marin-eu-west4 datakit/dedup 120.7K 24.83 $372
marin-us-central2 datakit/tokenize 94.8K 26.67 $348
marin-us-east5 grug/chunked_perf 19.3K 22.51 $293
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume51000_clip15_20260504_014844-3cdab2 38.6K 21.70 $283
marin-us-east5 checkpoints/exp5611_sft_qwen3_1_7b_swe_zero_1m_8192tokens_v5p32-05d8dc 29.4K 18.66 $243
marin-us-east5 tokenized/nemotron_cc 126.7K 33.56 $223
marin-eu-west4 tokenized/dolma3_pool 105.0K 26.23 $223
marin-eu-west4 tokenized/nemotron_cc 107.8K 28.62 $218
marin-us-central2 normalized/nemotron_cc_v2 77.0K 16.44 $214
marin-us-east5 checkpoints/exp5611_sft_qwen3_1_7b_swe_zero_1m_8192tokens_arch32k_v5p32-a26bea 25.2K 15.88 $207
marin-us-central1 tokenized/nemotron_cc_v2 188.7K 30.79 $201
marin-eu-west4 normalized/nemotron_v1 87.7K 18.51 $196
marin-us-central1 normalized/nemotron_cc_v2_1 68.5K 14.95 $195
marin-us-east5 tokenized/merged 71.4K 20.09 $195
marin-us-east5 tokenized/nemotron_cc_v2 187.0K 29.33 $191
marin-us-central2 grug/moe-v7-1e22-d3200-5a4518 27.6K 27.02 $190
marin-us-central1 tokenized/nemotron_cc 76.0K 20.68 $189
marin-us-east5 raw/dolma3_pool-d37843 478.8K 13.98 $182
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume112662_clip15_20260518_123236-6e783f 20.6K 13.95 $182
marin-eu-west4 raw/nemotro-cc-eeb783 62.6K 11.77 $177
marin-us-west4 tokenized/nemotron_cc 223.4K 25.86 $175
marin-us-east5 tmp/ttl=14d 25.9K 13.31 $173
marin-us-central2 tokenized/nemotron_cc 48.5K 13.58 $170
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume86437_clip15_20260512_142310-d2a994 19.5K 12.87 $168
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume93092_clip15_20260514_031402-5450e2 19.2K 12.40 $162

Top Three-Level Prefixes

Bucket Prefix Objects Size (TB) Monthly Cost
marin-eu-west4 datakit/dedup/dedup_v0_manual 120.7K 24.83 $372
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume51000_clip15_20260504_014844-3cdab2/checkpoints 38.6K 21.70 $283
marin-us-east5 checkpoints/exp5611_sft_qwen3_1_7b_swe_zero_1m_8192tokens_v5p32-05d8dc/checkpoints 29.4K 18.63 $243
marin-us-east5 checkpoints/exp5611_sft_qwen3_1_7b_swe_zero_1m_8192tokens_arch32k_v5p32-a26bea/checkpoints 25.1K 15.85 $207
marin-us-east5 tokenized/merged/dolma3_dolmino_top_level 71.4K 20.09 $195
marin-us-central2 grug/moe-v7-1e22-d3200-5a4518/checkpoints 27.6K 27.02 $190
marin-us-east5 raw/dolma3_pool-d37843/data 221.0K 13.98 $182
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume112662_clip15_20260518_123236-6e783f/checkpoints 20.6K 13.95 $182
marin-eu-west4 raw/nemotro-cc-eeb783/contrib 31.3K 11.77 $177
marin-us-east5 tmp/ttl=14d/checkpoints-temp 25.9K 13.31 $173
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume86437_clip15_20260512_142310-d2a994/checkpoints 19.5K 12.87 $168
marin-eu-west4 datakit/tokenize/nemotron_cc_v2 38.5K 10.96 $164
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_resume93092_clip15_20260514_031402-5450e2/checkpoints 19.2K 12.40 $162
marin-us-central1 data/datakit/tokenized 246.1K 11.67 $152
marin-us-central2 datakit/tokenize/nemotron_cc_v2 38.5K 10.96 $143
marin-us-central2 raw/nemotro-cc-eeb783/contrib 31.3K 11.77 $137
marin-us-central1 data/datakit/normalized 27.8K 10.14 $132
marin-eu-west4 raw/dolma3_pool-d37843/data 221.0K 13.98 $119
marin-us-east5 julian/experiments/causal-diffusion 60.9K 9.02 $113
marin-us-east5 checkpoints/e3956np_m32b_q30ba3b_50k-efb0d2/checkpoints 4.5K 7.95 $104
marin-us-central2 raw/dclm/a3b142c 27.8K 7.20 $94
marin-eu-west4 normalized/nemotron_cc_v2_1/medium_high_quality_synthetic_c1918ec6 28.6K 6.24 $94
marin-eu-west4 normalized/nemotron_cc_v2/medium_quality_4b3940a2 29.7K 6.07 $91
marin-us-central2 grug/moe_1e23_d5120_bs2048_ep8_ragged_48l_fix_a2a_20260417_0945-d68919/checkpoints 12.6K 7.75 $91
marin-us-central1 user/rav/full_dedup_exp1 91.6K 6.84 $89
marin-us-central1 tokenized/finetranslations_parallel-929291/train 81.6K 13.33 $87
marin-us-east5 checkpoints/e3956np_m32b_kimi_50k-462e6f/checkpoints 3.7K 6.63 $86
marin-eu-west4 normalized/nemotron_v1/medium_725f15b4 27.7K 5.75 $86
marin-eu-west4 datakit/store/_smoke_v0.1_20260518_mixed 14.1M 5.72 $86
marin-us-central1 normalized/nemotron_cc_v2_1/medium_high_quality_synthetic_c1918ec6 28.6K 6.24 $81

Age Distribution

Age Objects Size (TB) Monthly Cost
<7d 36.3M 297.80 $3,966
7-30d 8.1M 728.48 $9,679
30-90d 42.7M 1,368.49 $9,993
90-365d 69.4M 468.52 $1,838
>365d 182.1M 247.66 $1,114
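The age buckets differ sharply in object count relative to size. A small sketch that derives the average object size per bucket from the table above, assuming decimal units (1 TB = 1,000,000 MB); the report does not say whether TB is decimal or binary:

```python
# (object count, size in TB) copied from the Age Distribution table.
age_buckets = {
    "<7d": (36.3e6, 297.80),
    "7-30d": (8.1e6, 728.48),
    "30-90d": (42.7e6, 1368.49),
    "90-365d": (69.4e6, 468.52),
    ">365d": (182.1e6, 247.66),
}

MB_PER_TB = 1_000_000  # assumption: decimal units

# Average object size in MB for each age bucket.
avg_object_mb = {
    age: size_tb * MB_PER_TB / objects
    for age, (objects, size_tb) in age_buckets.items()
}

for age, mb in avg_object_mb.items():
    print(f"{age:<8} {mb:6.2f} MB per object")
```

Under that assumption, objects older than a year average a little over 1 MB each, while the 7-30d bucket averages roughly 90 MB, so the oldest data is dominated by many small objects.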

Monthly Creation Trend

Month Objects Created Size Created (TB)
2026-05 42.8M 759.18
2026-04 35.5M 818.74
2026-03 7.4M 618.91
2026-02 5.3M 305.05
2026-01 1.9M 143.01
2025-12 9.9M 55.29
2025-11 17.5M 18.84
2025-10 21.4M 20.82
2025-09 8.5M 11.63
2025-08 1.9M 45.27
2025-07 2.2M 52.40
2025-06 1.9M 11.96
2025-05 28.9M 26.86