@Jinmo
Created August 7, 2017 01:02
Working on it

Disassembler

Let's make writing a disassembler easier.

1. Template parser

Templates are deliberately simple. For example:

mov $reg, $reg
li $reg, $imm
ld $reg, $mem

Each $-prefixed word is treated as a template variable. Variables are processed left to right, and each result is appended to the operand list (insn.ops).
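
As a rough illustration of that left-to-right scan, here is a minimal sketch of how a template could be split into a mnemonic and its variable names. The function name split_template and the regular expression are assumptions made for this example, not the parser that ships with the project.

```python
import re

# Hypothetical helper (not part of the project): matches "$name" variables.
VAR_PATTERN = re.compile(r"\$(\w+)")

def split_template(template):
    """Return (mnemonic, [variable names]) in left-to-right order."""
    mnemonic, _, rest = template.partition(" ")
    return mnemonic, VAR_PATTERN.findall(rest)

print(split_template("mov $reg, $reg"))
print(split_template("ld $reg, $mem"))
```

A template with no operands, such as "ret", yields an empty variable list under this sketch.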

So where are $reg, $imm, and $mem defined? Keep going!

2. Definitions

They live in a Python module named arch.<name>.defs. It looks like this:

from disassembler.models import Immediate, Register, Memory, ints
# some helpers
byte = lambda: (lambda x: x.code[x.addr])
dword = lambda: (lambda x: ints(x, 4, 'little'))
#
# template variable
reg = Register(byte())
imm = Immediate(dword())
mem = Memory(base=Register(byte()))
#
# Required information (only this one).
regNames = ['r0', 'r1']
#
# The following are optional (used by the IDA module).
jumps = ['j', 'jz'] # Jump instructions
stops = ['int3', 'ret'] # End of a function
calls = ['call'] # Call instructions
# That's all. For convenience, I put the table in defs.py.
table = {
    0: 'mov $reg, $reg',
    1: 'li $reg, $imm',
    2: 'ld $reg, $mem'
}

What the heck are reg and byte?

Each handler, such as reg or byte, is a callable. The template parser calls the handler that matches each variable ($reg maps to defs.reg) with one argument: state. The state is a dictionary-like object that holds the disassembler's current address, code, and so on, except that fields are accessed as .name instead of ['name'].
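
To make the calling convention concrete, here is a small sketch of a handler being invoked with a state. The Register class below is a hypothetical stand-in for disassembler.models.Register (whose real implementation is not shown here), and returning a register name string is an assumption of this example.

```python
from types import SimpleNamespace

class Register:
    """Hypothetical stand-in for disassembler.models.Register."""
    def __init__(self, reader):
        self.reader = reader  # callable: state -> register index

    def __call__(self, state):
        index = self.reader(state)
        return state.defs.regNames[index]

# Same shape as the helper in defs.py: a factory that returns a state reader.
byte = lambda: (lambda state: state.code[state.addr])

defs = SimpleNamespace(regNames=["r0", "r1"])
state = SimpleNamespace(code=bytes([1, 0]), addr=0, defs=defs)

reg = Register(byte())
print(reg(state))  # the handler receives the state and returns an operand
```

The outer lambda in byte exists so that each template variable gets its own reader, while the inner one does the actual read from the state.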

Why not a plain dictionary? Because attribute access is shorter to type.
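
One common way to get this kind of object is a dict subclass that exposes its keys as attributes. The class below is only a sketch under that assumption; the project's actual State class may be built differently.

```python
class State(dict):
    """Hypothetical sketch: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

state = State(code=b"\x00\x01", addr=0)
state.addr += 1          # shorter than state['addr'] += 1
print(state.addr, state["addr"])
```

Raising AttributeError for missing keys keeps hasattr() and similar tools behaving as expected.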

3. Reader

This is the last part, and it is used extensively. It works with the State class, an object with a few required fields: addr, code, and defs.

import disassembler.arch.myvm.defs as defs

def analyze(state):
    op = state.code[state.addr]
    expr = defs.table[op]
    # parse(state, template) -> Instruction(mnemonic='...', ops=[...])
    insn = parse(state, expr)
    state.addr += 1
    return insn
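
The whole flow (look up the opcode, pick a template, run the handlers) can be sketched end to end with toy pieces. Everything below is hypothetical: next_byte, the handlers dict, and this parse are stand-ins written for the example. Unlike the snippet above, this sketch consumes the opcode before parsing and lets each handler advance state.addr itself, which is an assumption about how the cursor is managed.

```python
import re
from types import SimpleNamespace

def next_byte(state):
    """Read one byte and advance the cursor (an assumption of this sketch)."""
    value = state.code[state.addr]
    state.addr += 1
    return value

# Toy handlers keyed by template variable name.
handlers = {
    "reg": lambda s: "r%d" % next_byte(s),
    "imm": lambda s: int.from_bytes(bytes(next_byte(s) for _ in range(4)), "little"),
}
table = {0: "mov $reg, $reg", 1: "li $reg, $imm"}

def parse(state, template):
    mnemonic, _, rest = template.partition(" ")
    # Run the handlers left to right, collecting operands.
    ops = [handlers[name](state) for name in re.findall(r"\$(\w+)", rest)]
    return SimpleNamespace(mnemonic=mnemonic, ops=ops)

def analyze(state):
    opcode = next_byte(state)  # consume the opcode first
    return parse(state, table[opcode])

# "li r0, 0x1234" followed by "mov r1, r0"
state = SimpleNamespace(code=bytes([1, 0, 0x34, 0x12, 0, 0, 0, 1, 0]), addr=0)
first = analyze(state)
second = analyze(state)
print(first.mnemonic, first.ops)
print(second.mnemonic, second.ops)
```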

Then add your architecture to arch/__init__.py.

from disassembler.arch import mips, ndh, clemency, myvm

supported_arch = dict(mips=mips, ndh=ndh, clemency=clemency, myvm=myvm)

All in one

The disassembler looks like this:

from disassembler.context import State
from disassembler.arch import supported_arch

class Disassembler:
    def __init__(self, arch):
        current_arch = supported_arch.get(arch)
        self.analyze = current_arch.archoperator.analyze
        self.defs = current_arch.defs
    
    def disasm(self, code, base):
        state = State(defs=self.defs, code=code) # state.defs=defs, ..
        while state.addr < len(code):
            prev = state.addr
            try:
                insn = self.analyze(state)
            except Exception:  # a bare except would also swallow KeyboardInterrupt
                insn = make_byte(state) # .byte 0xCC
            insn.code = code[prev:state.addr]
            insn.addr = prev
            yield insn
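
The fallback path is what keeps the loop moving when a byte cannot be decoded. The sketch below shows one way that could work with a toy decoder. Here make_byte, the 0x90 opcode, and the SimpleNamespace instruction objects are all assumptions for illustration, and resetting state.addr before the fallback is a design choice of this sketch rather than something the original code does.

```python
from types import SimpleNamespace

def make_byte(state):
    """Hypothetical fallback: emit the current byte as data and step past it."""
    value = state.code[state.addr]
    state.addr += 1
    return SimpleNamespace(mnemonic=".byte", ops=[hex(value)])

def analyze(state):
    """Toy decoder that only knows opcode 0x90 (an invented example)."""
    if state.code[state.addr] != 0x90:
        raise ValueError("unknown opcode")
    state.addr += 1
    return SimpleNamespace(mnemonic="nop", ops=[])

def disasm(code):
    state = SimpleNamespace(code=code, addr=0)
    while state.addr < len(code):
        prev = state.addr
        try:
            insn = analyze(state)
        except Exception:
            state.addr = prev  # undo any partial decode before falling back
            insn = make_byte(state)
        insn.addr = prev
        insn.code = code[prev:state.addr]
        yield insn

listing = [(i.addr, i.mnemonic, i.ops) for i in disasm(bytes([0x90, 0xCC, 0x90]))]
print(listing)
```

Because the fallback always consumes exactly one byte, the loop terminates even on input that the decoder does not understand.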

For details, see disasm.py and the examples (clemency, mips, ndh, myvm). MyVM is the most minimal, followed by ndh, mips, and clemency.

Jinmo commented Aug 7, 2017

image

The IDA processor module is developed independently but stays compatible with the disassembler.

Jinmo commented Aug 7, 2017

image

NDH example

Jinmo commented Aug 7, 2017

image
... and the clemency one.
