- The computation of QK^T involves N^2 dot products, where N is the sequence length.
- The softmax(QK^T)V operation requires an N x N attention matrix, making complexity quadratic in N.
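To make the quadratic cost concrete, here is a minimal NumPy sketch of full attention; the N x N `scores` matrix is the term that dominates both time and memory. The function name and shapes are illustrative, not from any specific implementation.

```python
import numpy as np

def full_attention(Q, K, V):
    """Naive attention: materializes the full N x N score matrix."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # N x N: N^2 dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V                                   # (N x N) @ (N x d)

N, d = 1024, 64
Q, K, V = (np.random.randn(N, d) for _ in range(3))
out = full_attention(Q, K, V)  # time and memory both scale as O(N^2)
```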
Repeated theme: use a lighter-weight "augmentation" of attention (e.g. clustering, LSH lookup) to pre-select only the most relevant tokens before running the expensive full attention on them; see the sketch below.
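As a rough illustration of that theme (not any particular paper's method), the sketch below buckets queries and keys with random-hyperplane LSH and lets each query attend only to keys in its own bucket. With roughly balanced buckets, the cost drops from O(N^2) to about O(N^2 / B) for B buckets; the hashing step itself is cheap, O(N * d * n_hash_bits). All names and parameters here are assumptions for illustration.

```python
def lsh_bucket_attention(Q, K, V, n_hash_bits=4, rng=None):
    """Cheap pre-selection via LSH: attend only within matching buckets.

    Illustrative sketch only. Queries whose bucket contains no keys
    receive a zero output here; real methods handle this more carefully
    (e.g. multiple hash rounds or fallback buckets).
    """
    rng = rng or np.random.default_rng(0)
    d = Q.shape[-1]
    planes = rng.standard_normal((d, n_hash_bits))               # shared random hyperplanes
    bit_values = 1 << np.arange(n_hash_bits)
    q_codes = (Q @ planes > 0) @ bit_values                      # bucket id per query
    k_codes = (K @ planes > 0) @ bit_values                      # bucket id per key

    out = np.zeros_like(Q)
    for b in np.unique(q_codes):
        q_idx = np.where(q_codes == b)[0]
        k_idx = np.where(k_codes == b)[0]
        if len(k_idx) == 0:
            continue
        # Full attention, but only over the keys sharing this bucket.
        scores = Q[q_idx] @ K[k_idx].T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[q_idx] = w @ V[k_idx]
    return out

approx_out = lsh_bucket_attention(Q, K, V)  # approximates full_attention(Q, K, V)
```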