Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      14x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4x memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20x SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms
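
The multipliers in the right-hand column are ratios between rows, and the "~1GB/sec" notes are data size divided by time. Below is a minimal Python sketch that recomputes them from the ~2012 figures above. The names `LATENCY_NS`, `ratio` and `throughput_mb_per_s` are illustrative only and are not part of the original list.

```python
# ~2012 latency figures from the table above, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}


def ratio(slower: str, faster: str) -> float:
    """How many times longer the `slower` operation takes than the `faster` one."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


def throughput_mb_per_s(op: str, megabytes: float = 1.0) -> float:
    """Effective throughput of an operation that moves `megabytes` of data."""
    seconds = LATENCY_NS[op] / 1e9  # 1 ns = 10^-9 s
    return megabytes / seconds


if __name__ == "__main__":
    pairs = [
        ("Main memory reference", "L1 cache reference"),
        ("Main memory reference", "L2 cache reference"),
        ("Read 1 MB sequentially from disk", "Read 1 MB sequentially from SSD"),
    ]
    for slower, faster in pairs:
        print(f"{slower} vs {faster}: {ratio(slower, faster):.1f}x")
    for medium in ("memory", "SSD", "disk"):
        op = f"Read 1 MB sequentially from {medium}"
        print(f"{op}: ~{throughput_mb_per_s(op):,.0f} MB/s")
```

With these numbers, a main memory reference is about 14x an L2 reference (100 / 7 ≈ 14.3). Sequential reads come to roughly 4,000 MB/s from memory, 1,000 MB/s from SSD and 50 MB/s from disk.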

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
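
It is easy to be off by a factor of 1,000 when reading the table. The hypothetical helper `format_ns` below is not part of the original. It applies the conversions above and picks the largest unit that keeps the value at or above 1.

```python
# Nanoseconds per unit, following the notes above.
NS_PER_UNIT = [("s", 1_000_000_000), ("ms", 1_000_000), ("us", 1_000), ("ns", 1)]


def format_ns(ns: float) -> str:
    """Render a duration given in nanoseconds with the largest unit
    (s, ms, us or ns) for which the value is still >= 1."""
    for unit, size in NS_PER_UNIT:
        if ns >= size:
            return f"{ns / size:g} {unit}"
    return f"{ns:g} ns"  # sub-nanosecond values such as the 0.5 ns L1 reference


if __name__ == "__main__":
    for ns in (0.5, 3_000, 1_000_000, 150_000_000):
        print(f"{ns} ns -> {format_ns(ns)}")
```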

Credit
------
By Jeff Dean:               http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers

Contributions
-------------
'Humanized' comparison:    https://gist.github.com/hellerbarde/2843375
Visual comparison chart:   http://i.imgur.com/k0t1e.png
Interactive Prezi version: https://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/

Just a note for whoever wants to use this as a reference: as I understand it, these numbers do not take queuing and contention into account.
The numbers will change when physical resources are contended (CPU, memory buffers, NIO, thread pools). Latencies may be slightly higher in the average case, because you usually try to maximize utilization and accept some contention as the trade-off. They can be much higher in the worst case, when that optimization goes wrong or you botched the design with bad bottlenecks.
This is amazing work btw and I'm glad to see how the community has added specs, references and notes on top of it.
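
The contention point above can be made more concrete with the M/M/1 queue, a classic and very simplified model. It assumes Poisson arrivals, exponential service times and a single server. This model is an assumption here and is not something the list claims. With mean service time S and utilization ρ, the mean time in system is S / (1 - ρ). Response time in this model is exponentially distributed, so the p-th percentile is -ln(1 - p) * S / (1 - ρ). Here is a Python sketch; the function names are hypothetical.

```python
import math


def mm1_mean_latency(service_time: float, utilization: float) -> float:
    """Mean time in system (queueing wait + service) for an M/M/1 queue.

    Model assumptions: Poisson arrivals, exponential service times, one
    server, first-come first-served. W = S / (1 - rho), for 0 <= rho < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


def mm1_latency_percentile(service_time: float, utilization: float, p: float) -> float:
    """p-th percentile of time in system; in M/M/1 (FCFS) it is exponential with mean W."""
    return -math.log(1 - p) * mm1_mean_latency(service_time, utilization)


if __name__ == "__main__":
    # Hypothetical: treat the 500 us datacenter round trip as the service time.
    service_us = 500
    for rho in (0.0, 0.5, 0.8, 0.9, 0.99):
        mean = mm1_mean_latency(service_us, rho)
        p99 = mm1_latency_percentile(service_us, rho, 0.99)
        print(f"utilization {rho:4.0%}: mean ~{mean:,.0f} us, p99 ~{p99:,.0f} us")
```

Under this model, the mean doubles at 50% utilization and reaches 10x the service time at 90%. The exponential tail puts p99 at about 4.6x the mean (ln 100 ≈ 4.6). Real systems rarely behave exactly like M/M/1, so read this as intuition rather than a prediction.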
Thanks for the comments and suggestions. This is not my original work; it's a community effort.
One more thing I gotta memorize 😔
Let's use 🍌 for the scale 👉
Operation | Time (ns) | Banana Units |
---|---|---|
L1 cache reference | 0.5 ns | 1 banana (one banana) |
Branch mispredict | 5 ns | 10 bananas (ten bananas) |
L2 cache reference | 7 ns | 14 bananas (fourteen bananas) |
Mutex lock/unlock | 25 ns | 50 bananas (fifty bananas) |
Main memory reference | 100 ns | 200 bananas (two hundred bananas) |
Compress 1K bytes with Zippy | 3,000 ns | 6,000 bananas (six thousand bananas) |
Send 1K bytes over 1 Gbps network | 10,000 ns | 20,000 bananas (twenty thousand bananas) |
Read 4K randomly from SSD | 150,000 ns | 300,000 bananas (three hundred thousand bananas) |
Read 1 MB sequentially from memory | 250,000 ns | 500,000 bananas (five hundred thousand bananas) |
Round trip within same datacenter | 500,000 ns | 1,000,000 bananas (one million bananas) |
Read 1 MB sequentially from SSD | 1,000,000 ns | 2,000,000 bananas (two million bananas) |
Disk seek | 10,000,000 ns | 20,000,000 bananas (twenty million bananas) |
Read 1 MB sequentially from disk | 20,000,000 ns | 40,000,000 bananas (forty million bananas) |
Send packet CA->Netherlands->CA | 150,000,000 ns | 300,000,000 bananas (three hundred million bananas) |
In this table, each operation's latency is expressed in multiples of the smallest unit, a single L1 cache reference (0.5 ns), which counts as 1 banana.
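
The banana column is simply the nanosecond figure divided by 0.5 ns. Below is a minimal Python sketch of that conversion; the names are made up for illustration.

```python
L1_REFERENCE_NS = 0.5  # 1 banana = one L1 cache reference


def to_bananas(latency_ns: float) -> float:
    """Express a latency as a multiple of one L1 cache reference ("bananas")."""
    return latency_ns / L1_REFERENCE_NS


if __name__ == "__main__":
    rows = {
        "Main memory reference": 100,
        "Round trip within same datacenter": 500_000,
        "Send packet CA->Netherlands->CA": 150_000_000,
    }
    for name, ns in rows.items():
        print(f"{name}: {to_bananas(ns):,.0f} bananas")
```

This prints 200, 1,000,000 and 300,000,000 bananas, which matches the corresponding rows of the table.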
While I enjoy the idea of a banana as a base unit of distance, it's not really helpful here. However, you could build a scale of distances starting at the Planck length, in femto-bananas or something.
As an updated point of reference for the first few numbers, Apple gives a table in its Apple Silicon CPU Optimization Guide. As you can see, the figures are extremely similar to the original ones: