Let's make writing a disassembler easier.
Instructions are described by templates, so everything stays simple:
mov $reg, $reg
li $reg, $imm
ld $reg, $mem
Each $-prefixed word is treated as a template variable. Variables are processed left to right, and each result is appended to the instruction's operand list (insn.ops).
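The left-to-right rule can be sketched with a regular expression. This is only an illustration, not the project's actual parser: `split_template` is a hypothetical helper name, and it only extracts variable names without calling any handlers.

```python
import re

# Matches a `$`-prefixed template variable such as `$reg` or `$imm`.
VAR_PATTERN = re.compile(r"\$(\w+)")

def split_template(template):
    """Return (mnemonic, variable names in left-to-right order)."""
    mnemonic, _, operands = template.partition(" ")
    return mnemonic, VAR_PATTERN.findall(operands)

print(split_template("mov $reg, $reg"))  # ('mov', ['reg', 'reg'])
print(split_template("li $reg, $imm"))   # ('li', ['reg', 'imm'])
```

Because `findall` scans the string from the start, the names come back in the same order in which the operands would be added to insn.ops.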
So where are $reg, $imm, and $mem defined? Keep going!
They live in a Python module named arch.<name>.defs. It looks like this:
from disassembler.models import Immediate, Register, Memory, ints
# some helpers
byte = lambda: (lambda x: x.code[x.addr])
dword = lambda: (lambda x: ints(x, 4, 'little'))
#
# template variable
reg = Register(byte())
imm = Immediate(dword())
mem = Memory(base=Register(byte()))
#
# Required information. Just one item.
regNames = ['r0', 'r1']
#
# Everything below is optional (used by the IDA module)
jumps = ['j', 'jz'] # Jump instructions
stops = ['int3', 'ret'] # End of a function
calls = ['call'] # Call instructions
# That's all. For convenience, the opcode table also lives in defs.py
table = {
    0: 'mov $reg, $reg',
    1: 'li $reg, $imm',
    2: 'ld $reg, $mem',
}
Each handler, such as reg or the function returned by byte(), is a callable.
The template parser looks up the handler that matches a variable ($reg maps to defs.reg) and calls it with one argument: state.
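The helper pattern in defs.py, a zero-argument function that returns a one-argument reader, can be tried in isolation. The sketch below is a stand-in built on assumptions: `FakeState` and `read_int` are hypothetical names, and `read_int` only guesses at what `ints(x, 4, 'little')` does (decode 4 bytes at the current address as a little-endian integer).

```python
from types import SimpleNamespace

def read_int(state, size, byteorder):
    """Hypothetical stand-in for `ints`: decode `size` bytes at state.addr."""
    chunk = state.code[state.addr:state.addr + size]
    return int.from_bytes(chunk, byteorder)

# Each helper is a factory: calling it returns the handler that takes `state`.
byte = lambda: (lambda x: x.code[x.addr])
dword = lambda: (lambda x: read_int(x, 4, 'little'))

# FakeState is a hypothetical placeholder for the real State object.
FakeState = SimpleNamespace
state = FakeState(code=bytes([0x01, 0x78, 0x56, 0x34, 0x12]), addr=0)

read_byte = byte()          # build the handler once
print(read_byte(state))     # 1

state.addr = 1
print(hex(dword()(state)))  # 0x12345678
```

The extra level of indirection lets a definition such as Register(byte()) capture how to read its value without reading anything until a state is available.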
State is a dictionary-like object that belongs to the disassembler. It holds the current address, the code, and so on, and its fields are accessed as .name instead of ['name'].
Why not a plain dictionary? Because attribute access is shorter to write.
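One common way to get a dictionary-like object with `.name` access is a small `dict` subclass. The sketch below is an assumption about how such an object could be built, not the project's State class; `AttrDict` is a hypothetical name.

```python
class AttrDict(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value


state = AttrDict(addr=0, code=b"\x00\x01")
state.addr += 1                   # shorter than state['addr'] += 1
print(state.addr, state["addr"])  # 1 1
```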
The last part is the analyze function, which is used extensively. It works on a State() instance, an object with some required fields: addr, code, and defs.
import disassembler.arch.myvm.defs as defs

def analyze(state):
    # The opcode byte selects the template.
    op = state.code[state.addr]
    expr = defs.table[op]
    # parse(state, template) -> Instruction(mnemonic='...', ops=[...])
    insn = parse(state, expr)
    state.addr += 1
    return insn
Then add your architecture to arch/__init__.py:
# myvm is the newly added entry in both lines
from disassembler.arch import mips, ndh, clemency, myvm
supported_arch = dict(mips=mips, ndh=ndh, clemency=clemency, myvm=myvm)
The disassembler itself looks like this:
from disassembler.context import State
from disassembler.arch import supported_arch

class Disassembler:
    def __init__(self, arch):
        current_arch = supported_arch.get(arch)
        self.analyze = current_arch.archoperator.analyze
        self.defs = current_arch.defs

    def disasm(self, code, base):
        state = State(defs=self.defs, code=code)  # state.defs=defs, ..
        while state.addr < len(code):
            prev = state.addr
            try:
                insn = self.analyze(state)
            except Exception:
                # Undecodable input falls back to a raw byte.
                insn = make_byte(state)  # .byte 0xCC
            insn.code = code[prev:state.addr]
            insn.addr = prev
            yield insn
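The fallback loop above only terminates if the fallback itself consumes input. A minimal self-contained sketch of that pattern follows. `toy_analyze`, `toy_make_byte`, `TABLE`, and the tuple-based results are hypothetical stand-ins, and the sketch assumes that `make_byte` advances `state.addr` by one byte, which the source does not show.

```python
from types import SimpleNamespace

TABLE = {0: "nop", 1: "ret"}  # hypothetical one-byte opcodes

def toy_analyze(state):
    """Decode one opcode; raises KeyError on unknown bytes."""
    mnemonic = TABLE[state.code[state.addr]]
    state.addr += 1
    return mnemonic

def toy_make_byte(state):
    """Assumed fallback: emit the raw byte and skip past it."""
    text = ".byte 0x%02X" % state.code[state.addr]
    state.addr += 1  # without this the loop would never advance
    return text

def disasm(code):
    state = SimpleNamespace(code=code, addr=0)
    while state.addr < len(code):
        prev = state.addr
        try:
            text = toy_analyze(state)
        except Exception:
            state.addr = prev  # assumption: discard any partial progress
            text = toy_make_byte(state)
        yield prev, code[prev:state.addr], text

for addr, raw, text in disasm(bytes([0x00, 0xCC, 0x01])):
    print(addr, raw.hex(), text)
```

Resetting `state.addr` to `prev` before the fallback is a choice made for this sketch, so that a decoder that fails halfway through an instruction cannot skip bytes silently.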
For details, see disasm.py and the examples (clemency, mips, ndh, myvm). MyVM is the most minimal, followed by ndh, mips, and clemency.
The IDA processor module is developed independently but stays compatible with the disassembler.