Motivations

simplify cycle counting so it's easier to understand
make the wall-clock time it takes to emulate 100k cycles more consistent

Theory

Memory access currently has inconsistent costs. Fetching an extra word while decoding an instruction costs a cycle, but operands that dereference memory, like [A], cost nothing. For both a "real" DCPU and the emulator (assuming typical cache behavior), memory accesses are more expensive than register accesses.

Rules

every memory access costs 1 cycle: the opcode fetch, each extra instruction word, and each operand read or write
some operations take extra cycles:
- +1: MUL, MLI, STI, STD, IF*
- +2: DIV, DVI, MOD, MDI
- ... (HW*, interrupts)

Examples

SET A, B
- 1 cycle -- read opcode
SET [A], [B]
- 3 cycles -- read opcode, read a ([B]), write b ([A])
MUL A, 2
- 2 cycles -- read opcode, extra execute
MUL A, 200
- 3 cycles -- read opcode, read a (next word, 200), extra execute
MUL [A], 200
- 5 cycles -- read opcode, read a (next word, 200), read b ([A]), extra execute, write b ([A])
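
The counts above can be reproduced with a small calculator. This Python sketch only illustrates the rules as stated here. The function and table names are hypothetical, operand parsing is simplified (registers, `[...]` memory operands, and integer literals only), and stack operands, HW*, and interrupts are left out. It assumes that a literal `a` in the range -1..30 is packed into the opcode word, and that SET, STI, and STD write `b` without reading it first, which is what the SET [A], [B] example implies.

```python
# Illustrative cycle calculator for the proposed rules (not emulator code).

# Extra execute cycles per opcode; IF* is handled separately below.
EXTRA_CYCLES = {
    **dict.fromkeys(["MUL", "MLI", "STI", "STD"], 1),
    **dict.fromkeys(["DIV", "DVI", "MOD", "MDI"], 2),
}

# Assumption: these opcodes write b without reading it first.
WRITE_ONLY = {"SET", "STI", "STD"}

REGISTERS = {"A", "B", "C", "X", "Y", "Z", "I", "J", "SP", "PC", "EX"}


def is_memory(operand):
    """True for operands that dereference memory, such as [A] or [A+2]."""
    return operand.startswith("[")


def needs_next_word(operand, is_a):
    """True if the operand needs an extra instruction word."""
    for term in operand.strip("[]").split("+"):
        term = term.strip()
        if term.upper() in REGISTERS:
            continue
        value = int(term, 0)
        # Assumption: only a bare literal in the a slot can be packed
        # into the opcode word, and only in the range -1..30.
        if is_a and not is_memory(operand) and -1 <= value <= 30:
            continue
        return True
    return False


def cycles(op, b, a):
    """Cycle count for `op b, a` under the proposed rules."""
    op = op.upper()
    is_branch = op.startswith("IF")
    total = 1  # read the opcode word
    for operand, is_a in ((a, True), (b, False)):
        if needs_next_word(operand, is_a):
            total += 1  # read the extra instruction word
    if is_memory(a):
        total += 1  # read a from memory
    if is_memory(b):
        if op not in WRITE_ONLY:
            total += 1  # read b from memory
        if not is_branch:
            total += 1  # write the result back to b
    total += 1 if is_branch else EXTRA_CYCLES.get(op, 0)
    return total


if __name__ == "__main__":
    for op, b, a in [("SET", "A", "B"), ("SET", "[A]", "[B]"),
                     ("MUL", "A", "2"), ("MUL", "A", "200"),
                     ("MUL", "[A]", "200")]:
        print(f"{op} {b}, {a}: {cycles(op, b, a)} cycles")
```

Under the same assumptions, `cycles("ADD", "A", "B")` comes out at 1 cycle, since ADD has no entry in the extra-cycle table.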

Extras

ADD/SUB should preferably cost 1 cycle -- they're no more expensive than shifting.
Instructions generally cost +1 cycle if there's any branching or extra complexity in the emulator executing them.

Downsides

Makes the DCPU16 more realistic
Discourages weird optimizations based on DCPU weirdness

rmmh/DCPU16_cycle_count.md