- Von Neumann Model (Stored program + Sequential instruction) as opposed to dataflow
- Algorithm
- ISA
- Moore's law
- What is comp arch
- Dataflow
- ISA vs microarch
- uarch is a specific impl of the ISA
- uarch is not exposed to the software layer (we don't do that at this time)
- e.g. pipelining: NOT EXPOSED
- e.g. out-of-order execution: NOT EXPOSED
- e.g. memory access scheduling policy: NOT EXPOSED
- e.g. speculative execution: NOT EXPOSED (today)
- e.g. superscalar processing: NOT EXPOSED (mostly, see Spectre & Meltdown)
- and many more....
Opcode "+" ..................................... ISA
Number of gen purpose registers ................ ISA
Number of ports to the register file ........... uarch
Number of cycles to execute the MUL instr. ..... uarch
pipelining ..................................... uarch
REMEMBER: uarch is an impl of the ISA under specific design constraints and goals.
A design point is a set of design constraints and their importance
design point ==> leads to tradeoffs in both ISA and uarch.
This lecture:
- ISA-level tradeoffs
- uarch-level tradeoffs
- system and task level tradeoffs (how to divide labour between HW and SW)
MIPS, ARM, ALPHA are all ISAs.
The following is an LC-3b ADD instr layout:
Layout 1:
15                    0
+----+---+---+-+--+---+
|0001|DR |SR1|0|00|SR2|
+----+---+---+-+--+---+
Layout 2:
15                    0
+----+---+---+-+------+
|0001|DR |SR1|1| imm5 |
+----+---+---+-+------+
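
The two layouts differ only in bit 5, which selects between a second source register and a 5-bit immediate. A minimal Python sketch of packing and unpacking both forms follows. The function names are made up for illustration, and treating `imm5` as a sign-extended two's-complement value is an assumption based on the usual LC-3b convention rather than something stated in these notes.

```python
ADD_OPCODE = 0b0001  # bits 15..12 in both layouts


def encode_add_reg(dr, sr1, sr2):
    """Layout 1: bit 5 = 0, bits 4..3 = 00, SR2 in bits 2..0."""
    return (ADD_OPCODE << 12) | (dr << 9) | (sr1 << 6) | sr2


def encode_add_imm(dr, sr1, imm5):
    """Layout 2: bit 5 = 1, imm5 in bits 4..0 (assumed two's complement)."""
    if not -16 <= imm5 <= 15:
        raise ValueError("imm5 must fit in 5 signed bits")
    return (ADD_OPCODE << 12) | (dr << 9) | (sr1 << 6) | (1 << 5) | (imm5 & 0x1F)


def decode_add(word):
    """Split a 16-bit ADD word back into its fields."""
    if (word >> 12) != ADD_OPCODE:
        raise ValueError("not an ADD instruction")
    dr = (word >> 9) & 0x7
    sr1 = (word >> 6) & 0x7
    if (word >> 5) & 1:  # immediate form
        imm5 = word & 0x1F
        if imm5 & 0x10:  # sign-extend the 5-bit field
            imm5 -= 0x20
        return ("ADD", dr, sr1, ("imm", imm5))
    return ("ADD", dr, sr1, ("reg", word & 0x7))
```

With only 3 bits per register field, this format can name at most 8 registers, which is one concrete way the instruction layout constrains the rest of the ISA.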
- 0-address machine (stack machine)
compile this to stack machine:
(7+5)x8x9 = 864
push 9
push 8
push 5
push 7
add
mul
mul
pop => 864
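
To check the sequence above, here is a small sketch of a 0-address machine interpreter in Python. The textual instruction format and the function name `run_stack_program` are assumptions made for this example; only `push`, `add`, `mul` and `pop` are modelled.

```python
def run_stack_program(program):
    """Execute a list of 0-address instructions and return the popped result."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "push":
            stack.append(int(args[0]))
        elif op == "add":
            # Operands are implicit: the top two stack entries.
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "pop":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {instr}")
    raise ValueError("program ended without a pop")


PROGRAM = ["push 9", "push 8", "push 5", "push 7", "add", "mul", "mul", "pop"]
result = run_stack_program(PROGRAM)  # (7 + 5) * 8 * 9
```

Note that `add` and `mul` carry no operand specifiers at all, which is what makes this a 0-address machine.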
- 1-address machine: accumulator machine
- 2-address machine: x86 and many others
- 3-address machine: MIPS, LC-3b
- E.g. opcode
- E.g. operand specifiers (addressing modes)
E.g. int, float, char, binary, decimal, BCD (binary coded decimal), doubly linked list, queue, str, bit, vec, string (implicit, explicit)
endianness of data is also an aspect of the ISA
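
As a quick illustration of endianness, the sketch below lays out the same 32-bit value in both byte orders using Python's standard `struct` module. The value `0x12345678` is an arbitrary example.

```python
import struct

value = 0x12345678

# "<I" = little-endian unsigned 32-bit, ">I" = big-endian unsigned 32-bit.
little = struct.pack("<I", value)  # least significant byte stored first
big = struct.pack(">I", value)     # most significant byte stored first

# Reading little-endian bytes with the wrong byte order scrambles the value.
misread = struct.unpack(">I", little)[0]
```

Software that exchanges raw binary data between machines has to agree on the byte order, which is why it belongs to the ISA contract and not to the uarch.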
Programming language
+---------------------------+ High
| List / DoublyLinkedList   |
| struct / Queue / stack    | <-- ISA ?
+---------------------------+
| string / float / decimal  | <-- ISA ?
| bigint                    |
+---------------------------+
| int / byte / char         | <-- ISA ?
|                           |
+---------------------------+ Low
Control signals
- Address space
- Addressing granularity
  - byte addressable?
  - 64-bit addressable? <= some supercomputers
  - bit addressable? <= rare
- Support for virtual memory?
- How many?
- How long?
Why registers? Temporal locality of data => reuse of data
- arithmetic / logical
- fetch / compute / store
- implicit sequential ctrl flow
- MV data between memory and register
- PC++
- JMP
- L/S: operate only on registers, must load/store to interact with memory.
- M2M: can operate directly on mem, can also load/store.
L/S: MIPS, ARM, other RISCs. M2M: x86, VAX, other CISCs.
- Absolute: use immediate value as the address (`LW rt, 10000`)
- Register indirect: reg as pointer (`LW rt, r`)
- Displacement: reg as pointer + offset (`LW rt, r[offset]`)
- Indexed: `LW rt, r, index`, where `r` and `index` are gen purpose registers
- Mem indirect: `reg -> mem[ptr] -> mem[data]`
- Auto inc/dec
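
The modes listed above differ only in how the effective address is computed. The following Python sketch makes that explicit. The function name, the dict-based register file and memory, the 4-byte word size, and the choice of post-increment for the auto-increment mode are all assumptions made for illustration.

```python
WORD_SIZE = 4  # assumed word size in bytes


def effective_address(mode, regs, mem, *, base=None, index=None, imm=0):
    """Return the memory address a load would access under the given mode."""
    if mode == "absolute":           # address is the immediate itself
        return imm
    if mode == "register_indirect":  # register holds the pointer
        return regs[base]
    if mode == "displacement":       # pointer register + constant offset
        return regs[base] + imm
    if mode == "indexed":            # pointer register + index register
        return regs[base] + regs[index]
    if mode == "memory_indirect":    # register -> mem[ptr] -> mem[data]
        return mem[regs[base]]
    if mode == "auto_increment":     # use the pointer, then advance it
        addr = regs[base]
        regs[base] += WORD_SIZE
        return addr
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"r1": 100, "r2": 8}
mem = {100: 200, 104: 9, 108: 7, 200: 42}
addr = effective_address("memory_indirect", regs, mem, base="r1")
value = mem[addr]  # the pointer stored at address 100 leads to address 200
```

Memory indirect needs two memory accesses per load, and auto-increment has a side effect on the register file, which hints at why richer modes cost more in the uarch.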
Why more mem addr modes? This is a programmer-uarch tradeoff.
pro:
- better mapping of high-level instr' to machine code
- reduced # of instr' and code size (thus less mem bus bandwidth requirement)
  - e.g. auto increment is good for memory traversal
  - e.g. double indirect is good for `**` and linked lists etc.
  - e.g. sparse matrix access
- better support for complex data structures
con:
- compiler needs more reasoning to pick the right addr mode
- uarch is more painful to impl
An orthogonal ISA allows all addressing modes to be used on all instr. types.
e.g. VAX: ~13 addr modes x >300 opcodes x 2 formats (int/float) = >7800 actual addressing impls for uarch
pro:
- flexible
- easy to write asm
- compiler can pick whatever it likes
con:
- uarch hard to impl
- Interface with IO devices
- mem mapped IO
- special IO instructions (`IN`, `OUT` in x86)
Tradeoffs?
- Privilege modes
- user vs superuser
- who can exe what instr.
- Exception & Interrupt handling
vectored vs. non-vectored interrupts
- vectored = knows who interrupted
- non-vectored = only knows it's interrupted
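
The difference shows up in how the handler is found. In the sketch below, the vectored path indexes a table directly by the interrupt source, while the non-vectored path has to poll every device to discover who raised the interrupt. The table, the device records and both function names are hypothetical and only model the dispatch logic, not real hardware.

```python
def handle_vectored(vector_table, vector):
    """The interrupt arrives with a vector number, so one lookup suffices."""
    return vector_table[vector]()


def handle_non_vectored(devices):
    """Only 'an interrupt happened' is known, so ask each device in turn."""
    for device in devices:
        if device["pending"]:
            device["pending"] = False
            return device["handler"]()
    return None  # spurious interrupt: nobody claims it


vector_table = {0: lambda: "timer", 1: lambda: "keyboard"}
devices = [
    {"pending": False, "handler": lambda: "timer"},
    {"pending": True, "handler": lambda: "keyboard"},
]
```

Calling `handle_vectored(vector_table, 1)` and `handle_non_vectored(devices)` reaches the same handler, but the second pays for a scan whose cost grows with the number of devices.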
- Virtual Memory
- Access Protection (Segfault?)
and more....
+---------------------------+ HLL
|      |     Compiler       |
|      V                    |
+---------------------------+--- CISC ISA
|      |     uarch          |
|      |                    |
|      |                    |
|      V                    |
+---------------------------+ Control Signals

+---------------------------+ HLL
|      |     Compiler       |
|      |                    |
|      |                    |
|      V                    |
+---------------------------+--- RISC ISA
|      |     uarch          |
|      V                    |
+---------------------------+ Control Signals
CISC: VAX INDEX instr. can index 5D array with bounds check with one instr.
- Compiler simplicity: CISC wins [1]
- Hardware simplicity: RISC wins
- Less burden of backwards compatibility: RISC wins
- Fixed length
- Variable length
- Uniform decode
- Non-uniform decode
Usually:
RISC
- Simple instr
- Fixed length
- Uniform decode
- Few addr modes
CISC
- Complex instr
- Variable length
- Non-uniform decode
- Many addr modes
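
One consequence of the fixed vs. variable length choice is how instruction boundaries are found. The sketch below contrasts the two. The variable-length encoding here, where the first byte of each instruction holds its total length, is purely hypothetical and is not how x86 or VAX actually encode length.

```python
def split_fixed(code, width=4):
    """Fixed length: every boundary is known up front, independent of content."""
    return [code[i:i + width] for i in range(0, len(code), width)]


def split_variable(code):
    """Variable length: instruction i+1 cannot be located until
    instruction i has been (at least partly) decoded."""
    instrs = []
    pc = 0
    while pc < len(code):
        length = code[pc]  # hypothetical: first byte = total length in bytes
        if length == 0 or pc + length > len(code):
            raise ValueError(f"malformed instruction at offset {pc}")
        instrs.append(code[pc:pc + length])
        pc += length
    return instrs
```

The fixed-length splitter could process all slices in parallel, whereas the variable-length one is inherently sequential, which is one reason fixed-length formats simplify fetching and decoding several instructions per cycle.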
Footnotes
1. Compiler has more options to choose from to perform the same job, so implementing a correct compiler is easier. But the compiler has to weigh all the choices to see which one best fits the program, so building an optimal compiler is not necessarily easier.