Ryan Williams ryan-williams

@ryan-williams
ryan-williams / DESCRIPTION.md
Created May 7, 2026 12:44
Draft PR for marin-community/marin

Title

Description of the PR...

@ryan-williams
ryan-williams / DESCRIPTION.md
Created May 7, 2026 12:19
Draft PR for google/tensorstore

Title

Description of the PR...

@ryan-williams
ryan-williams / DESCRIPTION.md
Created May 7, 2026 03:24
Draft PR for Open-Athena/tomat

Title

Description of the PR...

Phase 0 result: empirical M-per-mat distribution at M̄=64

Ran scripts/sampling_distribution_preview.py against MPDB v2's 77,427-mat train split. Per-mat M_i under each candidate weighting (mean target M̄=64):

| weighting | min | p10 | p50 | p90 | max | max:min |
|---|---|---|---|---|---|---|
| uniform | 64 | 64 | 64 | 64 | 64 | |
| electrons | 0.14 | 16.8 | 47.0 | 134.4 | 609 | 4328× |
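
For illustration, here is a minimal sketch of how a per-mat M_i could be derived from a weighting so that the mean stays at M̄. It assumes M_i is simply proportional to the weight (M_i = M̄ · w_i / mean(w)); that rule, and the names `per_mat_samples` and `summarize`, are hypothetical and are not taken from scripts/sampling_distribution_preview.py.

```python
from statistics import mean, quantiles


def per_mat_samples(weights, mean_target=64.0):
    """Scale weights so the resulting per-mat counts average to mean_target.

    Assumption: M_i is proportional to w_i (no rounding, no clipping).
    """
    w_bar = mean(float(w) for w in weights)
    return [mean_target * float(w) / w_bar for w in weights]


def summarize(m):
    """Return the summary columns used in the table above."""
    deciles = quantiles(m, n=10)  # 9 cut points: p10, p20, ..., p90
    return {
        "min": min(m),
        "p10": deciles[0],
        "p50": deciles[4],
        "p90": deciles[8],
        "max": max(m),
        "max_min_ratio": max(m) / min(m),
    }


if __name__ == "__main__":
    # Made-up weights standing in for per-mat electron counts.
    demo_weights = [1, 2, 3, 10]
    print(summarize(per_mat_samples(demo_weights)))
```

Under this assumption a uniform weighting gives every mat exactly M̄, while any skewed weighting spreads M_i in proportion to the spread of the weights, which is the effect the max:min column measures.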

[Open-Athena/tomat#2] MPDB: backfill n_atoms / n_electrons, publish to R2

Background

data/mpdb.sqlite is the materials-metadata database for MP entries we train on (built by scripts/build_mpdb.py). Current schema includes mp_id, split, grid dims nx/ny/nz, and computed virtual columns for cube_seq_pN. Train rows have n_atoms. Val and test rows have n_atoms = NULL.
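
The gap described above can be checked with a short query. The sketch below is only an illustration: the table name `mats` is an assumption (the source does not name the table), the demo rows are made up, and only the `mp_id`, `split`, and `n_atoms` columns mentioned above are modeled.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory stand-in for data/mpdb.sqlite (hypothetical table name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mats (mp_id TEXT PRIMARY KEY, split TEXT, n_atoms INTEGER)")
    conn.executemany(
        "INSERT INTO mats VALUES (?, ?, ?)",
        [
            ("demo-1", "train", 8),     # made-up rows, not real MP entries
            ("demo-2", "train", 12),
            ("demo-3", "val", None),    # val/test rows lack n_atoms
            ("demo-4", "test", None),
        ],
    )
    return conn


def missing_n_atoms_by_split(conn, table="mats"):
    """Return {split: (total_rows, rows_with_null_n_atoms)}."""
    rows = conn.execute(
        f"SELECT split, COUNT(*), SUM(n_atoms IS NULL) "
        f"FROM {table} GROUP BY split ORDER BY split"
    ).fetchall()
    return {split: (total, missing) for split, total, missing in rows}


if __name__ == "__main__":
    print(missing_n_atoms_by_split(make_demo_db()))
```

Running the same query against the real database, with the actual table name substituted, would show how many val and test rows still need backfilling.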

@ryan-williams
ryan-williams / DESCRIPTION.md
Created May 6, 2026 00:53
Draft PR for Open-Athena/tomat

Title

Description of the PR...

[Open-Athena/tomat#1] v3 patch tokenizer: per-patch translated atoms, P=19, M=64, drop redundant preamble blocks

Background: tokenizer evolution so far

| Version | Dataset prefix | Codec | Size (train) | Notes |
|---|---|---|---|---|
| v1 | train-full | two_token_9_12 (2 tokens / voxel, float) | 21.1 GB | Original; replaced by LMQ. |
| v2-prelim | train-full-lmq | LMQ (initial fit) | 11.6 GB | Replaced by Lloyd-Max log-spaced fit (dbb0312). |
| v2 | train-full-lmq-v2 | LMQ (log-spaced, vocab ~16k) | 13.4 GB | Workhorse for most 200M / 1B runs. |
| v2-vocab-32k | train-full-lmq-v2-32k | LMQ vocab=32k | 15.8 GB | Vocab sweep. |
@ryan-williams
ryan-williams / DESCRIPTION.md
Created May 6, 2026 00:47
Draft PR for Open-Athena/tomat

Title

Description of the PR...

@ryan-williams
ryan-williams / DESCRIPTION.md
Created May 6, 2026 00:44
Draft PR for test-owner/test-repo

Title

Description of the PR...

@ryan-williams
ryan-williams / DESCRIPTION.md
Created May 6, 2026 00:44
Draft PR for o/r

Title

Description of the PR...