
Alexander Litzenberger alexlitz

@alexlitz
alexlitz / tiny_adder_less_submission.py
Created February 26, 2026 18:04
Tiny Adder Less Submission
#!/usr/bin/env python3
"""
TinyAdder: 9-parameter hand-crafted transformer for 10-digit addition.
This is admittedly pushing the rules: the parameters are aggressively deduplicated, and arange is used.
However, once these 9 unique floats are arranged into the weights, the model performs only
standard transformer ops.
Parameter counting (explicit scalars only):
- Count scalar tensors created in __init__ (shared scalars count once).
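The counting rule above (each shared scalar counts once, however many weight entries it is broadcast into) can be sketched as follows. This is my own illustration, not code from the submission; `count_unique_params` is a hypothetical helper:

```python
# Hypothetical illustration of the counting rule: unique scalar values
# count once, even when broadcasting or arange expands them into a
# full weight matrix.
def count_unique_params(weights):
    """Count distinct scalar values across a list of 2-D weight lists."""
    values = set()
    for w in weights:
        for row in w:
            values.update(row)
    return len(values)

w_broadcast = [[0.5] * 4 for _ in range(4)]  # one value broadcast: 1 param
w_ramp = [[0.0, 1.0, 2.0]]                   # arange-style ramp: 3 values

print(count_unique_params([w_broadcast, w_ramp]))  # → 4
```

Under this convention a 16-entry broadcast matrix is as cheap as a single scalar, which is what makes a 9-parameter adder possible.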
@alexlitz
alexlitz / tiny_adder_submission_autoregressive_gen.py
Last active March 3, 2026 11:43
Tiny Adder Autoregressive
#!/usr/bin/env python3
"""
TinyAdder: 36-parameter hand-crafted transformer for 10-digit addition.
Parameter counting:
- Identity mappings (direct copy): 0 params
- Broadcast (1 value to N outputs): 1 param
- Distinct values: count each
"""
import torch
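The three-way counting convention in the docstring (identity = 0, broadcast = 1, distinct values each count) can be made concrete with a small sketch. The function name and structure are my own, not part of the submission:

```python
# Hedged sketch of the stated counting convention; names are
# illustrative, not from the submission.
def param_cost(kind: str, n_distinct: int = 0) -> int:
    if kind == "identity":   # direct copy of the input: no new values
        return 0
    if kind == "broadcast":  # one value repeated across N outputs
        return 1
    if kind == "distinct":   # each distinct value counts individually
        return n_distinct
    raise ValueError(f"unknown mapping kind: {kind}")

# e.g. one identity copy, one broadcast scalar, and three distinct values:
total = param_cost("identity") + param_cost("broadcast") + param_cost("distinct", 3)
print(total)  # → 4
```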
@alexlitz
alexlitz / tiny_adder_36.py
Last active February 27, 2026 17:29
tiny adder 36
#!/usr/bin/env python3
"""
TinyAdder: A 36-parameter hand-crafted transformer for 10-digit addition.
This model adds two 10-digit numbers with 100% accuracy using only 36 unique parameters.
Architecture:
- 2-layer transformer with ALiBi positional encoding
- Layer 0: 5 attention heads (only 2 active), ReGLU FFN
- Layer 1: 1 head uniform attention, V-shaped error FFN
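The two named building blocks, ALiBi positional biases and a ReGLU feed-forward, can be sketched in a few lines. This is a minimal illustration of the general techniques, not the author's exact implementation (the submission's slopes, masking, and weight values will differ):

```python
import torch
import torch.nn.functional as F

def alibi_bias(seq_len: int, slope: float) -> torch.Tensor:
    """ALiBi-style linear bias added to attention logits: -slope * distance.
    (Symmetric variant for illustration; causal ALiBi masks the future.)"""
    pos = torch.arange(seq_len)
    return -slope * (pos[None, :] - pos[:, None]).abs().float()

def reglu(x: torch.Tensor, w_gate: torch.Tensor, w_val: torch.Tensor) -> torch.Tensor:
    """ReGLU feed-forward: relu(x @ w_gate) elementwise-times (x @ w_val)."""
    return F.relu(x @ w_gate) * (x @ w_val)

# The bias is 0 on the diagonal and increasingly negative with distance,
# so with all-zero content logits each position attends mostly to itself.
attn = alibi_bias(5, slope=1.0).softmax(dim=-1)

y = reglu(torch.ones(1, 2), torch.ones(2, 2), torch.ones(2, 2))  # → [[4.0, 4.0]]
```

The appeal for a tiny model is that ALiBi needs no learned positional embeddings, only per-head slopes.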
@alexlitz
alexlitz / tiny_adder_submission.py
Created February 25, 2026 10:32
Tiny Adder Submission
#!/usr/bin/env python3
"""
TinyAdder: 36-parameter hand-crafted transformer for 10-digit addition.
Parameter counting:
- Identity mappings (direct copy): 0 params
- Broadcast (1 value to N outputs): 1 param
- Distinct values: count each
"""
import torch
@alexlitz
alexlitz / tiny_adder.py
Created February 25, 2026 10:07
Tiny Adder
#!/usr/bin/env python3
"""
TinyAdder: A hand-crafted 95-parameter transformer that performs 10-digit addition with ~100% accuracy.
Only non-zero parameters are counted; the nominal parameter count is higher, but most entries are zero.
Architecture:
- 2-layer transformer with ALiBi positional encoding
- Layer 0: 5 attention heads
- Layer 1: 1 head uniform attention
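A "1 head uniform attention" layer, as named for Layer 1 above, is the degenerate case where every attention logit is equal, so softmax produces a uniform distribution and the head simply averages the value vectors. A minimal sketch (my own, assuming zeroed query/key weights):

```python
import torch

def uniform_attention(v: torch.Tensor) -> torch.Tensor:
    """With all-equal attention logits, each output row is the mean of v."""
    seq_len = v.shape[0]
    scores = torch.zeros(seq_len, seq_len)  # all-equal logits
    weights = scores.softmax(dim=-1)        # each row sums to 1, all entries 1/seq_len
    return weights @ v                      # mean over positions

v = torch.tensor([[1.0], [3.0], [5.0]])
out = uniform_attention(v)
print(out)  # every row equals the mean, 3.0
```

Uniform attention costs zero attention parameters under the counting rules above, which is presumably why it appears in these minimal submissions.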