May 7, 2013 17:18
diff --git a/gistfile1.txt b/gistfile1.txt
 Okay, here are the different op splitting/fusing strategies for different cores, as far as I've been able to discern them:

 Pentium: Complex instructions are U-pipe only but execute directly; they don't get split.
 Atom (Bonnell/Saltwell): Certain complex instructions don't get split.
 Pentium Pro/2/3: All ops get split into, tracked as, and executed as uOps.
 Pentium 4: This never happened.
 Pentium M/Core:
  All ops get split into uOps.
  Post-split, the core can fuse two types of multi-uOp sequences into a larger fused op used for tracking:
  - For stores, address generation + actual store uOps can get fused.
  - Read-modify (but not read-modify-write) fusion, aka "load-op" fusion.
  This is what Intel calls "micro-op fusion". The fused uOps are what's used in the scheduler/ROB etc.
  The complex ops are split into uOps for the purposes of execution, but the "accounting" in the core
  is all in terms of fused ops.
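
A toy model of the split-then-fuse step described above, as a rough sketch only. The four instruction kinds, the uOp names ("load", "alu", "sta", "std") and the functions split/fuse are made up for illustration; they are not Intel's terminology for the internals, and the fusion rules simply follow the two cases listed in this note.

```python
# Hypothetical uOp breakdown per instruction kind (illustrative, not measured).
SPLIT_TABLE = {
    "reg_op":  ("alu",),                       # add eax, ebx
    "load_op": ("load", "alu"),                # add eax, [mem]
    "store":   ("sta", "std"),                 # mov [mem], eax
    "rmw":     ("load", "alu", "sta", "std"),  # add [mem], eax
}

def split(kind):
    """Return the uOps that actually execute for one instruction."""
    return list(SPLIT_TABLE[kind])

def fuse(kind, uops):
    """Return the fused ops the scheduler/ROB would track in this model."""
    fused = []
    i = 0
    while i < len(uops):
        pair = tuple(uops[i:i + 2])
        store_pair = pair == ("sta", "std")
        # Load-op fusion applies to read-modify only, not read-modify-write.
        load_op_pair = pair == ("load", "alu") and kind == "load_op"
        if store_pair or load_op_pair:
            fused.append("+".join(pair))
            i += 2
        else:
            fused.append(uops[i])
            i += 1
    return fused

if __name__ == "__main__":
    for kind in SPLIT_TABLE:
        uops = split(kind)
        print(kind, "executes", uops, "tracked as", fuse(kind, uops))
```

In this model "add eax, [mem]" executes two uOps but takes one tracking slot, while the read-modify-write form executes four uOps and takes three slots.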
 Core2 and later:
  Like Core, plus "macro-op fusion": Certain arithmetic and branch instructions can get fused
  into a single arithmetic-then-branch instruction.
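
Macro-op fusion works on adjacent instructions rather than on the uOps of a single instruction. A minimal sketch, assuming that only cmp/test followed directly by a conditional jump can fuse; the set FUSIBLE_FIRST and the helper names are assumptions for illustration, and the real list of eligible pairs differs between cores.

```python
# Assumed set of instructions that may fuse with a following conditional jump.
FUSIBLE_FIRST = {"cmp", "test"}

def is_cond_branch(mnemonic):
    """Treat any j* mnemonic other than the unconditional jmp as conditional."""
    return mnemonic.startswith("j") and mnemonic != "jmp"

def macro_fuse(stream):
    """Merge eligible adjacent (arithmetic, branch) pairs into one op."""
    out = []
    i = 0
    while i < len(stream):
        has_next = i + 1 < len(stream)
        if has_next and stream[i] in FUSIBLE_FIRST and is_cond_branch(stream[i + 1]):
            out.append(stream[i] + "+" + stream[i + 1])
            i += 2
        else:
            out.append(stream[i])
            i += 1
    return out

if __name__ == "__main__":
    print(macro_fuse(["mov", "cmp", "jne", "add", "jmp"]))
```

Note that the pair has to be adjacent in the instruction stream: a cmp separated from its branch by another instruction stays unfused in this model.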

 Original Athlon (K7):
  Ops get decoded into "macro-ops", not uOps. Macro-ops can contain references to >2 source registers as well as memory references.
  Any instruction that generates more than one macro-op is microcoded.
  Macro-ops are then used for scheduling and dependency tracking. They are split into uOps right before
  execution, just like the fused ops in the Pentium M.
  The difference from the Pentium M is that AMD decodes directly into macro-ops, while Intel first decodes to uOps and then fuses.
  Both Intel and AMD can thus treat the instruction "add eax, [mem]" as a single op for the purpose of scheduling, but
  they arrive there in different ways.
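
The two routes to the same result can be sketched side by side. This is a simplified model, not a description of either decoder: FusedOp, MacroOp and the three functions are hypothetical names, and an instruction is reduced to an operation plus a "has memory source" flag.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FusedOp:
    """Intel-style tracked op: built by fusing already-decoded uOps."""
    uops: tuple

@dataclass(frozen=True)
class MacroOp:
    """AMD-style tracked op: emitted directly by the decoder."""
    operation: str
    mem_source: bool

def intel_decode(operation, mem_source):
    # Step 1: split the instruction into uOps.
    uops = (["load"] if mem_source else []) + [operation]
    # Step 2: fuse them back into a single tracked op.
    return [FusedOp(tuple(uops))]

def amd_decode(operation, mem_source):
    # Single step: the decoder produces the macro-op as-is.
    return [MacroOp(operation, mem_source)]

def execute_uops(op):
    """Expand a tracked op into the uOps issued to the execution units."""
    if isinstance(op, FusedOp):
        return list(op.uops)
    return (["load"] if op.mem_source else []) + [op.operation]

if __name__ == "__main__":
    # "add eax, [mem]" on both paths
    for decode in (intel_decode, amd_decode):
        tracked = decode("add", True)
        print(decode.__name__, len(tracked), execute_uops(tracked[0]))
```

Both paths end with one tracked op that expands to a load and an add at execution time; only the point at which the uOps first exist differs.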

 Athlon64 (K8) and later:
  Some instructions can now generate two macro-ops without going through the microcode path. Anything above 2 macro-ops
  is still microcoded. The rest is fairly similar.

 Atom (Silvermont):
  From a decode/dispatch standpoint, this seems very similar to what the K7 did as far as I can tell.