Created
May 7, 2013 17:18
Instruction dispatch variants
Okay, here are the different op splitting/fusing strategies for different cores, as far as I've been able to discern them:
Pentium: Complex instructions are U-pipe only but execute directly; they don't get split.
Atom (Bonnell/Saltwell): Certain complex instructions don't get split.
Pentium Pro/2/3: All ops get split into, tracked as, and executed as uOps.
Pentium 4: This never happened.
Pentium M/Core:
  All ops get split into uOps.
  Post-split, the core can fuse two types of multi-uOp sequences into a larger fused op used for tracking:
    - For stores, address generation + actual store uOps can get fused.
    - Read-modify (but not read-modify-write) fusion, aka "load-op" fusion.
  This is what Intel calls "micro-op fusion". The fused uOps are what's used in the scheduler/ROB etc.
  The complex ops are split into uOps for the purposes of execution, but the "accounting" in the core is all in terms of fused ops.
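
To make the split-then-fuse accounting concrete, here is a toy model. It is only a sketch: the decode table, the uOp names ("load", "alu", "store_addr", "store_data") and the helper names are all made up for illustration and don't correspond to any real decoder tables.

```python
# Hypothetical decode table: each instruction maps to the uOps it splits into.
SPLIT = {
    "add eax, ebx":   ("alu",),
    "add eax, [mem]": ("load", "alu"),               # read-modify, aka "load-op"
    "mov [mem], eax": ("store_addr", "store_data"),  # address generation + store
}

# The two fusable uOp pairs described above (names are this sketch's own).
FUSABLE_PAIRS = {("load", "alu"), ("store_addr", "store_data")}


def split(instr):
    """Step 1: split an instruction into its uOps."""
    return SPLIT[instr]


def fuse(uops):
    """Step 2: greedily fuse adjacent fusable pairs into one tracked op."""
    fused, i = [], 0
    while i < len(uops):
        pair = tuple(uops[i:i + 2])
        if pair in FUSABLE_PAIRS:
            fused.append(pair)
            i += 2
        else:
            fused.append((uops[i],))
            i += 1
    return fused


def account(program):
    """Return (ops tracked in the scheduler/ROB, uOps actually executed)."""
    tracked = executed = 0
    for instr in program:
        ops = fuse(split(instr))
        tracked += len(ops)
        executed += sum(len(op) for op in ops)
    return tracked, executed
```

With the three instructions in the table, `account` reports 3 tracked ops but 5 executed uOps, which is the gap between "accounting" and execution that fusion creates.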
Core2 and later:
  Like Core, plus "macro-op fusion": Certain arithmetic and branch instructions can get fused into a single arithmetic-then-branch instruction.
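
A rough sketch of what macro-op fusion does to an instruction stream. Which pairs actually qualify differs between core generations; this toy model simply assumes a cmp or test immediately followed by a conditional jump, and the mnemonic sets and function name are illustrative, not an exhaustive or authoritative list.

```python
# Assumption for this sketch: only cmp/test + conditional jump pairs fuse.
FLAG_SETTERS = {"cmp", "test"}
COND_JUMPS = {"je", "jne", "jl", "jge", "jb", "jae"}  # illustrative subset


def macro_fuse(instrs):
    """Merge each eligible arithmetic + branch pair into one fused instruction."""
    out, i = [], 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if (nxt is not None
                and cur.split()[0] in FLAG_SETTERS
                and nxt.split()[0] in COND_JUMPS):
            out.append(f"{cur} ; {nxt}")  # one arithmetic-then-branch op
            i += 2
        else:
            out.append(cur)
            i += 1
    return out
```

For example, `macro_fuse(["mov eax, 1", "cmp eax, ebx", "jne loop", "ret"])` gives three entries, with the cmp and jne merged into one.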
Original Athlon (K7):
  Ops get decoded into "macro-ops", not uOps. Macro-ops can contain references to >2 source registers and memory references.
  Any instruction that generates more than one macro-op is microcoded.
  Macro-ops are then used for scheduling and dependency tracking. These complex ops are split into uOps right before execution, just like in the Pentium M.
  The difference from Pentium M is that AMD decodes directly into macro-ops while Intel first decodes to uOps, then fuses.
  Both Intel and AMD can thus treat the instruction "add eax, [mem]" as a single op for the purpose of scheduling, but they arrive there in different ways.
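
The two routes can be put side by side in a toy model. Everything here is hypothetical (function names, the dict layout, detecting a memory operand by looking for "["); the only point is that both routes end with one tracked op that executes as a load plus an ALU uOp.

```python
def intel_style(instr):
    """Decode to uOps first, then fuse a load + ALU pair into one tracked op."""
    uops = ["load", "alu"] if "[" in instr else ["alu"]  # step 1: decode to uOps
    if uops == ["load", "alu"]:                          # step 2: fuse
        return [{"tracked_as": "fused op", "executes": uops}]
    return [{"tracked_as": "uOp", "executes": [u]} for u in uops]


def amd_style(instr):
    """Decode straight into one macro-op that carries its memory reference."""
    executes = ["load", "alu"] if "[" in instr else ["alu"]
    return [{"tracked_as": "macro-op", "executes": executes}]
```

Calling both on "add eax, [mem]" yields a single scheduler entry each, with identical `executes` lists; only the `tracked_as` label (and how the entry was built) differs.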
Athlon64 (K8) and later:
  Some instructions can now generate two macro-ops without going through the microcode path. Anything above 2 macro-ops is still microcoded. The rest is fairly similar.
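
The K7 vs. K8 decode rule boils down to a small decision on the macro-op count. A minimal sketch, with made-up path labels ("direct", "double", "microcode") and a hypothetical `decode_path` helper; "K8" here stands in for "K8 and later":

```python
def decode_path(macro_ops, core):
    """Classify the decode path from the number of macro-ops an instruction generates."""
    if core not in ("K7", "K8"):
        raise ValueError("this sketch only models K7 and K8")
    if macro_ops < 1:
        raise ValueError("an instruction generates at least one macro-op")
    if macro_ops == 1:
        return "direct"
    if core == "K8" and macro_ops == 2:
        return "double"       # two macro-ops without the microcode path
    return "microcode"        # K7: anything above 1; K8: anything above 2
```

So a two-macro-op instruction is microcoded on K7 (`decode_path(2, "K7")`) but not on K8 (`decode_path(2, "K8")`).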
Atom (Silvermont):
  From a decode/dispatch standpoint, this seems very similar to what the K7 did as far as I can tell.