- "Not All Language Model Features Are One-Dimensionally Linear"
- [Applications and Broader Implications (Interpretability, Safety, etc.)](https://gist.github.com/ajcwebdev/545701f96
From the paper "Transformer decoding in fifty lines of pseudocode" by Bob Carpenter at the Flatiron Institute. I copied it by hand, so it almost certainly contains a mistake or two.
DECODE(tok: int<lower=1, upper=T>[N],
       alpha: matrix(T, V),
       betas: { query: matrix(V, K),
                key: matrix(V, K),
                value: matrix(V, V) }[A],
       gammas: { 1: vector(L), 2: matrix(L, V),
                 3: vector(V), 4: matrix(V, L) }[A],