
Alexander Litzenberger alexlitz

@alexlitz
alexlitz / tiny_adder_less_submission.py
Created February 26, 2026 18:04
Tiny Adder Less Submission
#!/usr/bin/env python3
"""
TinyAdder: 9-parameter hand-crafted transformer for 10-digit addition.
This is admittedly pushing the rules: the parameters are aggressively deduplicated, and arange is used.
However, once these 9 unique floats are arranged into the weights, the model performs only
standard transformer ops.
Parameter counting (explicit scalars only):
- Count scalar tensors created in __init__ (shared scalars count once).
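The counting rule above (each shared scalar counts once, however many weight entries it is broadcast into) can be sketched as follows. This is my own illustration, not code from the submission; `count_unique_params` is a hypothetical helper:

```python
# Hypothetical illustration of the counting rule: unique scalar values
# count once, even when broadcasting or arange expands them into a
# full weight matrix.
def count_unique_params(weights):
    """Count distinct scalar values across a list of 2-D weight lists."""
    values = set()
    for w in weights:
        for row in w:
            values.update(row)
    return len(values)

w_broadcast = [[0.5] * 4 for _ in range(4)]  # one value broadcast: 1 param
w_ramp = [[0.0, 1.0, 2.0]]                   # arange-style ramp: 3 values

print(count_unique_params([w_broadcast, w_ramp]))  # → 4
```

Under this convention a 16-entry broadcast matrix is as cheap as a single scalar, which is what makes a 9-parameter adder possible.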
@alexlitz
alexlitz / tiny_adder_submission_autoregressive_gen.py
Last active March 3, 2026 11:43
Tiny Adder Autoregressive
#!/usr/bin/env python3
"""
TinyAdder: 36-parameter hand-crafted transformer for 10-digit addition.
Parameter counting:
- Identity mappings (direct copy): 0 params
- Broadcast (1 value to N outputs): 1 param
- Distinct values: count each
"""
import torch
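The three-way counting convention in the docstring (identity = 0, broadcast = 1, distinct values each count) can be made concrete with a small sketch. The function name and structure are my own, not part of the submission:

```python
# Hedged sketch of the stated counting convention; names are
# illustrative, not from the submission.
def param_cost(kind: str, n_distinct: int = 0) -> int:
    if kind == "identity":   # direct copy of the input: no new values
        return 0
    if kind == "broadcast":  # one value repeated across N outputs
        return 1
    if kind == "distinct":   # each distinct value counts individually
        return n_distinct
    raise ValueError(f"unknown mapping kind: {kind}")

# e.g. one identity copy, one broadcast scalar, and three distinct values:
total = param_cost("identity") + param_cost("broadcast") + param_cost("distinct", 3)
print(total)  # → 4
```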
@alexlitz
alexlitz / tiny_adder_36.py
Last active February 27, 2026 17:29
tiny adder 36
#!/usr/bin/env python3
"""
TinyAdder: A 36-parameter hand-crafted transformer for 10-digit addition.
This model adds two 10-digit numbers with 100% accuracy using only 36 unique parameters.
Architecture:
- 2-layer transformer with ALiBi positional encoding
- Layer 0: 5 attention heads (only 2 active), ReGLU FFN
- Layer 1: 1 head uniform attention, V-shaped error FFN
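The two named building blocks, ALiBi positional biases and a ReGLU feed-forward, can be sketched in a few lines. This is a minimal illustration of the general techniques, not the author's exact implementation (the submission's slopes, masking, and weight values will differ):

```python
import torch
import torch.nn.functional as F

def alibi_bias(seq_len: int, slope: float) -> torch.Tensor:
    """ALiBi-style linear bias added to attention logits: -slope * distance.
    (Symmetric variant for illustration; causal ALiBi masks the future.)"""
    pos = torch.arange(seq_len)
    return -slope * (pos[None, :] - pos[:, None]).abs().float()

def reglu(x: torch.Tensor, w_gate: torch.Tensor, w_val: torch.Tensor) -> torch.Tensor:
    """ReGLU feed-forward: relu(x @ w_gate) elementwise-times (x @ w_val)."""
    return F.relu(x @ w_gate) * (x @ w_val)

# The bias is 0 on the diagonal and increasingly negative with distance,
# so with all-zero content logits each position attends mostly to itself.
attn = alibi_bias(5, slope=1.0).softmax(dim=-1)

y = reglu(torch.ones(1, 2), torch.ones(2, 2), torch.ones(2, 2))  # → [[4.0, 4.0]]
```

The appeal for a tiny model is that ALiBi needs no learned positional embeddings, only per-head slopes.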
@alexlitz
alexlitz / tiny_adder_submission.py
Created February 25, 2026 10:32
Tiny Adder Submission
#!/usr/bin/env python3
"""
TinyAdder: 36-parameter hand-crafted transformer for 10-digit addition.
Parameter counting:
- Identity mappings (direct copy): 0 params
- Broadcast (1 value to N outputs): 1 param
- Distinct values: count each
"""
import torch
@alexlitz
alexlitz / tiny_adder.py
Created February 25, 2026 10:07
Tiny Adder
#!/usr/bin/env python3
"""
TinyAdder: A hand-crafted 95-parameter transformer that performs 10-digit addition with ~100% accuracy.
Only non-zero parameters are counted; the nominal parameter count is higher, but most entries are zero.
Architecture:
- 2-layer transformer with ALiBi positional encoding
- Layer 0: 5 attention heads
- Layer 1: 1 head uniform attention
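A "1 head uniform attention" layer, as named for Layer 1 above, is the degenerate case where every attention logit is equal, so softmax produces a uniform distribution and the head simply averages the value vectors. A minimal sketch (my own, assuming zeroed query/key weights):

```python
import torch

def uniform_attention(v: torch.Tensor) -> torch.Tensor:
    """With all-equal attention logits, each output row is the mean of v."""
    seq_len = v.shape[0]
    scores = torch.zeros(seq_len, seq_len)  # all-equal logits
    weights = scores.softmax(dim=-1)        # each row sums to 1, all entries 1/seq_len
    return weights @ v                      # mean over positions

v = torch.tensor([[1.0], [3.0], [5.0]])
out = uniform_attention(v)
print(out)  # every row equals the mean, 3.0
```

Uniform attention costs zero attention parameters under the counting rules above, which is presumably why it appears in these minimal submissions.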