jboner · July 30, 2026 15:17 · invisiblethings · Nov 24, 2021 · apimaker001 · Dec 23, 2021
diff --git a/latency.txt b/latency.txt
 Latency Comparison Numbers (~2012)
 ----------------------------------
 L1 cache reference                           0.5 ns
 Branch mispredict                            5   ns
 L2 cache reference                           7   ns                      14x L1 cache
 Mutex lock/unlock                           25   ns
 Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
 Compress 1K bytes with Zippy             3,000   ns        3 us
 Send 1K bytes over 1 Gbps network       10,000   ns       10 us
 Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
 Read 1 MB sequentially from memory     250,000   ns      250 us
 Round trip within same datacenter      500,000   ns      500 us
 Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
 Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
 Local LLM, generate 1 token         15,000,000   ns   15,000 us   15 ms  Small model on consumer GPU (2026)
 Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
 Frontier LLM, generate 1 token      20,000,000   ns   20,000 us   20 ms  Hosted model output (2026)
 Local LLM, time to first token      75,000,000   ns   75,000 us   75 ms  Small model, short prompt (2026)
 Local LLM (CPU), generate 1 token  100,000,000   ns  100,000 us  100 ms  Small model, no GPU (2026)
 Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms
 Fast LLM, time to first token      250,000,000   ns  250,000 us  250 ms  Specialized inference hardware (2026)
 Frontier LLM, time to first token                              1,000 ms  Short prompt, no cache (2026)
 Frontier LLM, short response                                   3,000 ms  ~100 output tokens (2026)
 Frontier LLM, long context prefill                            10,000 ms  ~100K input tokens, no cache (2026)
 Frontier LLM, reasoning response                              30,000 ms  Single call with thinking (2026)

 Notes
 -----
 1 ns = 10^-9 seconds
 1 us = 10^-6 seconds = 1,000 ns
 1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

 Credit
 ------
 By Jeff Dean:               http://research.google.com/people/jeff/
 Originally by Peter Norvig: http://norvig.com/21-days.html#answers

 Contributions
 -------------
 'Humanized' comparison:    https://gist.github.com/hellerbarde/2843375
 Visual comparison chart:   http://i.imgur.com/k0t1e.png
 Interactive Prezi version: https://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/latency.txt
	Latency Comparison Numbers (~2012)
	----------------------------------
	L1 cache reference 0.5 ns
	Branch mispredict 5 ns
	L2 cache reference 7 ns 14x L1 cache
	Mutex lock/unlock 25 ns
	Main memory reference 100 ns 20x L2 cache, 200x L1 cache
	Compress 1K bytes with Zippy 3,000 ns 3 us
	Send 1K bytes over 1 Gbps network 10,000 ns 10 us
	Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD
	Read 1 MB sequentially from memory 250,000 ns 250 us
	Round trip within same datacenter 500,000 ns 500 us
	Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory
	Disk seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip
	Local LLM, generate 1 token 15,000,000 ns 15,000 us 15 ms Small model on consumer GPU (2026)
	Read 1 MB sequentially from disk 20,000,000 ns 20,000 us 20 ms 80x memory, 20X SSD
	Frontier LLM, generate 1 token 20,000,000 ns 20,000 us 20 ms Hosted model output (2026)
	Local LLM, time to first token 75,000,000 ns 75,000 us 75 ms Small model, short prompt (2026)
	Local LLM (CPU), generate 1 token 100,000,000 ns 100,000 us 100 ms Small model, no GPU (2026)
	Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms
	Fast LLM, time to first token 250,000,000 ns 250,000 us 250 ms Specialized inference hardware (2026)
	Frontier LLM, time to first token 1,000 ms Short prompt, no cache (2026)
	Frontier LLM, short response 3,000 ms ~100 output tokens (2026)
	Frontier LLM, long context prefill 10,000 ms ~100K input tokens, no cache (2026)
	Frontier LLM, reasoning response 30,000 ms Single call with thinking (2026)

	Notes
	-----
	1 ns = 10^-9 seconds
	1 us = 10^-6 seconds = 1,000 ns
	1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

	Credit
	------
	By Jeff Dean: http://research.google.com/people/jeff/
	Originally by Peter Norvig: http://norvig.com/21-days.html#answers

	Contributions
	-------------
	'Humanized' comparison: https://gist.github.com/hellerbarde/2843375
	Visual comparison chart: http://i.imgur.com/k0t1e.png
	Interactive Prezi version: https://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/latency.txt
Operation	ns	µs	ms	note
L1 cache reference	0.5 ns
Branch mispredict	5 ns
L2 cache reference	7 ns			14x L1 cache
Mutex lock/unlock	25 ns
Main memory reference	100 ns			20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy	3,000 ns	3 µs
Send 1K bytes over 1 Gbps network	10,000 ns	10 µs
Read 4K randomly from SSD*	150,000 ns	150 µs		~1GB/sec SSD
Read 1 MB sequentially from memory	250,000 ns	250 µs
Round trip within same datacenter	500,000 ns	500 µs
Read 1 MB sequentially from SSD*	1,000,000 ns	1,000 µs	1 ms	~1GB/sec SSD, 4X memory
Disk seek	10,000,000 ns	10,000 µs	10 ms	20x datacenter roundtrip
Read 1 MB sequentially from disk	20,000,000 ns	20,000 µs	20 ms	80x memory, 20X SSD
Send packet CA -> Netherlands -> CA	150,000,000 ns	150,000 µs	150 ms
Operation	Time (ns)	Banana Units
L1 cache reference	0.5 ns	1 banana (one banana)
Branch mispredict	5 ns	10 bananas (ten bananas)
L2 cache reference	7 ns	14 bananas (fourteen bananas)
Mutex lock/unlock	25 ns	50 bananas (fifty bananas)
Main memory reference	100 ns	200 bananas (two hundred bananas)
Compress 1K bytes with Zippy	3,000 ns	6,000 bananas (six thousand bananas)
Send 1K bytes over 1 Gbps network	10,000 ns	20,000 bananas (twenty thousand bananas)
Read 4K randomly from SSD	150,000 ns	300,000 bananas (three hundred thousand bananas)
Read 1 MB sequentially from memory	250,000 ns	500,000 bananas (five hundred thousand bananas)
Round trip within same datacenter	500,000 ns	1,000,000 bananas (one million bananas)
Read 1 MB sequentially from SSD	1,000,000 ns	2,000,000 bananas (two million bananas)
Disk seek	10,000,000 ns	20,000,000 bananas (twenty million bananas)
Read 1 MB sequentially from disk	20,000,000 ns	40,000,000 bananas (forty million bananas)
Send packet CA->Netherlands->CA	150,000,000 ns	300,000,000 bananas (three hundred million bananas)
Operation	Clock Cycles	Time (ns, @4–5.5 GHz)	Notes
L2 Cache Reference	12–17 cycles	~2–4 ns	Zen 5 ≈12–14 cycles; Intel Lion Cove ≈16–17 cycles
Branch Misprediction Penalty	10–20 cycles	~3–5+ ns	Typical 15–18 cycles; can reach dozens in worst cases