Standard transformer attention computes the similarity between queries and keys as a dot product over high-dimensional vectors, then normalizes the scores with softmax to produce attention weights over values. This work proposes replacing the dot-product similarity with a Gaussian kernel over scalar (one-dimensional) projections of queries and keys, so that each pairwise score is the negative squared difference of two scalars divided by τ:
Attention(Q, K, V) = softmax(-(Q - K^T)^2 / τ) @ V
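The formula can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. It assumes Q and K have already been projected to one scalar per token (shapes (n,) and (m,)), that the square is taken elementwise over the broadcast difference Q - K^T (giving an n-by-m score matrix), that the softmax runs over the key axis, and that τ > 0. The function and argument names are hypothetical.

```python
import numpy as np


def gaussian_scalar_attention(q, k, v, tau=1.0):
    """Sketch of softmax(-(Q - K^T)^2 / tau) @ V for scalar projections.

    q: (n,) scalar query projections (assumed already computed)
    k: (m,) scalar key projections (assumed already computed)
    v: (m, d) value vectors
    tau: positive scale in the denominator (assumed > 0)
    Returns (output of shape (n, d), weights of shape (n, m)).
    """
    q = np.asarray(q, dtype=float).reshape(-1, 1)  # column: (n, 1)
    k = np.asarray(k, dtype=float).reshape(1, -1)  # row:    (1, m)
    v = np.asarray(v, dtype=float)

    # Broadcasting (n, 1) - (1, m) yields all pairwise differences, (n, m).
    scores = -((q - k) ** 2) / tau

    # Row-wise softmax over keys, shifted by the row max for stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights


# Usage: two queries, three keys, two-dimensional values.
q = np.array([0.0, 1.0])
k = np.array([0.0, 1.0, 5.0])
v = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
out, w = gaussian_scalar_attention(q, k, v, tau=0.5)
```

Each query places the most weight on the key whose scalar projection is closest to its own, and a smaller τ makes that preference sharper.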