1. A `secret` byte you want to read is stored at inaccessible memory location `priv_mem`.
2. The sender triggers an access exception by attempting to read `priv_mem`.
3. Due to CPU optimization (out-of-order execution), the load of `secret` from `priv_mem` and the use of its value in (4) and (5) below may execute before the exception is triggered.
4. Calculate an `offset` into a known array `probe` by multiplying `secret` by the width of a cache line (or whatever block size the CPU typically fetches, like a 4096-byte page). This guarantees each of the 256 possible offsets will cache separately.
5. Load `probe[offset]`, which causes the CPU to cache exactly one chunk of our array, populating one cache line.
6. The exception finally triggers, clearing the modified registers... but cached data is not excised.
7. Iterate over all 256 offsets into `probe` to find out which one loads fast. You've determined the value of `secret`.
- The `probe` array is flushed from cache before this process, so only the `secret`-based offset gets cached.
- The access exception triggers a memory fault, terminating the application, so it is performed in another process (i.e. a fork).
- This could possibly be mitigated in microcode translation by forcing an ordering guarantee on the access check and the subsequent memory loads, but that would cause all memory accesses (including legal ones) to wait for the access check.
- The kernel-level fix (kernel page-table isolation) isolates the kernel's memory pages by unmapping them from the page tables that are active while user code runs, so every transition into the kernel has to switch page tables. This is where the performance impact comes from.
- AMD is reportedly not affected by this exploit, and neither are most ARM cores (ARM lists a few, such as the Cortex-A75, as affected), because they do not allow privileged memory reads to be executed before the access check. Hence, there are no cache effects to observe in step 7.
- AMD (and some ARM, apparently) are vulnerable to the related Spectre attack, which uses branch prediction to trick the CPU into speculatively executing instructions it normally would not, but the vectors for transmitting the results are harder to set up.
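
For anyone who finds code easier to follow than prose, here is a toy Python model of the cache side channel in steps 4 through 7. It is only a sketch under loud assumptions: `ToyCache`, `transient_access`, `recover`, `leak_byte`, and the cost constants are all made up for illustration, nothing here touches real caches or privileged memory, and the fault in step 6 is not modeled at all.

```python
CACHE_LINE = 64      # assumed cache-line size in bytes
STRIDE = 4096        # one page per possible byte value, as in step 4
FAST, SLOW = 1, 100  # made-up access costs for this toy model


class ToyCache:
    """Tracks which cache lines are resident; not a model of real hardware."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        """Evict everything, like flushing `probe` before the attack."""
        self.lines.clear()

    def load(self, address):
        """Return a simulated access cost and leave the line cached."""
        line = address // CACHE_LINE
        cost = FAST if line in self.lines else SLOW
        self.lines.add(line)
        return cost


def transient_access(cache, secret):
    """Steps 4-5: the dependent load that runs before the fault is raised."""
    offset = secret * STRIDE
    cache.load(offset)  # stands in for reading probe[offset]


def recover(cache):
    """Step 7: time every candidate offset and return the fastest one."""
    costs = [cache.load(value * STRIDE) for value in range(256)]
    return min(range(256), key=costs.__getitem__)


def leak_byte(secret):
    """Run flush, transient access, and recovery for one byte value."""
    cache = ToyCache()
    cache.flush()                    # probe starts out uncached
    transient_access(cache, secret)  # leaves exactly one line cached
    return recover(cache)
```

Calling `leak_byte(0x41)` walks through the same sequence as the list above: flush, one secret-dependent load, then a scan of all 256 candidates. Because `STRIDE` is larger than `CACHE_LINE`, every candidate lands on its own line, so exactly one of them is fast during the scan.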
Thanks - this is really clear and concise. A question or two, since this is all really a little out of my league.
When does the `probe` array get flushed? At 5.1?