Let's make writing a disassembler easier.
Templates are simple by design. Let's keep it that way.
mov $reg, $reg
li $reg, $imm
ld $reg, $mem
Each $-prefixed word is treated as a template variable. Variables are processed left to right, and each result is appended to the operand list (insn.ops).
So where are $reg, $imm, and $mem defined? Keep going!
They live in a Python module named arch.<name>.defs. It looks like this:
from disassembler.models import Immediate, Register, Memory, ints
# some helpers
byte = lambda: (lambda x: x.code[x.addr])
dword = lambda: (lambda x: ints(x, 4, 'little'))
#
# template variable
reg = Register(byte())
imm = Immediate(dword())
mem = Memory(base=Register(byte()))
#
# Required information. There is only one item.
regNames = ['r0', 'r1']
#
# Below is optional (used in IDA module)
jumps = ['j', 'jz'] # Jump instructions
stops = ['int3', 'ret'] # End of a function
calls = ['call'] # Call instructions
# That's all. For convenience, I put the opcode table in defs.py as well.
table = {
    0: 'mov $reg, $reg',
    1: 'li $reg, $imm',
    2: 'ld $reg, $mem'
}
Each handler, such as reg or byte, is a callable.
The template parser calls the corresponding handler (for $reg, that is defs.reg) with one argument: state.
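The dispatch described above can be sketched as follows. This is only an illustration, not the project's parser: parse_sketch, the handlers dictionary, and the stand-in reg and imm functions are hypothetical, and the sketch assumes each handler advances state.addr itself.

```python
import re
from types import SimpleNamespace

# Hypothetical stand-ins for defs.reg and defs.imm.
# Assumption: each handler reads from state.code and advances state.addr.
def reg(state):
    value = state.code[state.addr]
    state.addr += 1
    return f"r{value}"

def imm(state):
    raw = state.code[state.addr:state.addr + 4]
    state.addr += 4
    return int.from_bytes(raw, "little")

handlers = {"reg": reg, "imm": imm}

def parse_sketch(state, template):
    """Split off the mnemonic, then call one handler per $variable, left to right."""
    mnemonic, _, rest = template.partition(" ")
    ops = [handlers[name](state) for name in re.findall(r"\$(\w+)", rest)]
    return SimpleNamespace(mnemonic=mnemonic, ops=ops)

# One register byte followed by a little-endian 32-bit immediate.
state = SimpleNamespace(code=bytes([1, 0x34, 0x12, 0x00, 0x00]), addr=0)
insn = parse_sketch(state, "li $reg, $imm")
print(insn.mnemonic, insn.ops)
```

Because the handlers share one state object, the order of the variables in the template decides which bytes each operand consumes.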
State is a dictionary-like object (with attribute access: .name instead of ['name']) that holds the disassembler's context (current address, code, etc.).
Why not a plain dictionary? Because attribute access is shorter to write.
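A minimal sketch of the "dictionary with attribute access" idea is shown below. AttrState is a hypothetical name and says nothing about how the project's real State class is implemented.

```python
class AttrState(dict):
    """A dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

state = AttrState(addr=0, code=b"\x00\x01\x02", defs=None)
state.addr += 1                      # shorter than state['addr'] += 1
print(state.addr, state["addr"])
```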
The last part is the analyze function, and it is used extensively. It works on an instance of the State class, which carries some required fields: addr, code, and defs.
import disassembler.arch.myvm.defs as defs

def analyze(state):
    # Look up the template for the opcode at the current address.
    op = state.code[state.addr]
    expr = defs.table[op]
    # parse(state, template) -> Instruction(mnemonic='...', ops=[...])
    insn = parse(state, expr)
    state.addr += 1
    return insn
Then add your architecture (myvm here) to arch/__init__.py.
from disassembler.arch import mips, ndh, clemency, myvm
supported_arch = dict(mips=mips, ndh=ndh, clemency=clemency, myvm=myvm)
The disassembler itself looks like this:
from disassembler.context import State
from disassembler.arch import supported_arch

class Disassembler:
    def __init__(self, arch):
        current_arch = supported_arch.get(arch)
        self.analyze = current_arch.archoperator.analyze
        self.defs = current_arch.defs

    def disasm(self, code, base):
        state = State(defs=self.defs, code=code)  # state.defs=defs, ..
        while state.addr < len(code):
            prev = state.addr
            try:
                insn = self.analyze(state)
            except Exception:
                # Decoding failed: fall back to a raw data byte.
                insn = make_byte(state)  # .byte 0xCC
            insn.code = code[prev:state.addr]
            insn.addr = prev
            yield insn
For details, see disasm.py and the examples (clemency, mips, ndh, myvm). myvm is the most minimal, followed by ndh, mips, and clemency.
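The fallback in the loop above can be tried in isolation with a toy sketch. Everything here is hypothetical: the one-byte TABLE, this make_byte, and the rewind of state.addr before falling back are assumptions made for illustration, not the behaviour of disasm.py.

```python
from types import SimpleNamespace

# Hypothetical opcode table with operand-free instructions only.
TABLE = {0: "nop", 1: "ret"}

def analyze(state):
    op = state.code[state.addr]
    mnemonic = TABLE[op]              # raises KeyError for an unknown opcode
    state.addr += 1
    return SimpleNamespace(text=mnemonic)

def make_byte(state):
    # Assumed fallback: emit the raw byte as data and step past it.
    value = state.code[state.addr]
    state.addr += 1
    return SimpleNamespace(text=f".byte 0x{value:02X}")

def disasm(code):
    state = SimpleNamespace(code=code, addr=0)
    while state.addr < len(code):
        prev = state.addr
        try:
            insn = analyze(state)
        except (KeyError, IndexError):
            state.addr = prev         # undo any partial decode first
            insn = make_byte(state)
        insn.addr = prev
        insn.code = code[prev:state.addr]
        yield insn

listing = [(insn.addr, insn.text) for insn in disasm(bytes([0, 0xCC, 1]))]
for addr, text in listing:
    print(f"{addr:04x}  {text}")
```

Since the fallback always consumes exactly one byte, the loop makes progress even on input it cannot decode.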
NDH example