@alexander-hanel
Last active July 13, 2023 20:56

Disassembler (aka Task 1)

Notes on RE1.

  1. Use a language of your choice to decode the base64-encoded data, disassemble the binary data using the Capstone engine, and save the text to a file named disassemble.txt.
import base64

from capstone import Cs, CS_ARCH_X86, CS_MODE_32

# Base64-encoded 32-bit x86 code provided for the task
base64_encoded = 'Mclki0Ewi0AMi0AUiwQIiwQIi1gQiV38aGvQK8r/dfzoFgAAAGoAagBqAGjQMEwAagBqAP/Q6VkAAABgi2wkJItFPItUBXgB6otKGItaIAHr4zRJizSLAe4x/zHA/KyEwHQHwc8NAcfr9Dt8JCh14YtaJAHrZosMS4taHAHriwSLAeiJRCQcYcM='
CODE = base64.b64decode(base64_encoded)

output = ""

# Linear-sweep disassembly, using 0x1000 as the address of the first byte
md = Cs(CS_ARCH_X86, CS_MODE_32)
for i in md.disasm(CODE, 0x1000):
    output += "0x%x:\t%s\t%s\n" % (i.address, i.mnemonic, i.op_str)

with open("disassemble.txt", "w") as out_file:
    out_file.write(output)

Output

0x1000:	xor	ecx, ecx
0x1002:	mov	eax, dword ptr fs:[ecx + 0x30]
0x1006:	mov	eax, dword ptr [eax + 0xc]
0x1009:	mov	eax, dword ptr [eax + 0x14]
0x100c:	mov	eax, dword ptr [eax + ecx]
0x100f:	mov	eax, dword ptr [eax + ecx]
0x1012:	mov	ebx, dword ptr [eax + 0x10]
0x1015:	mov	dword ptr [ebp - 4], ebx
0x1018:	push	0xca2bd06b
0x101d:	push	dword ptr [ebp - 4]
0x1020:	call	0x103b
0x1025:	push	0
0x1027:	push	0
0x1029:	push	0
0x102b:	push	0x4c30d0
0x1030:	push	0
0x1032:	push	0
0x1034:	call	eax
0x1036:	jmp	0x1094
0x103b:	pushal
0x103c:	mov	ebp, dword ptr [esp + 0x24]
0x1040:	mov	eax, dword ptr [ebp + 0x3c]
0x1043:	mov	edx, dword ptr [ebp + eax + 0x78]
0x1047:	add	edx, ebp
0x1049:	mov	ecx, dword ptr [edx + 0x18]
0x104c:	mov	ebx, dword ptr [edx + 0x20]
0x104f:	add	ebx, ebp
0x1051:	jecxz	0x1087
0x1053:	dec	ecx
0x1054:	mov	esi, dword ptr [ebx + ecx*4]
0x1057:	add	esi, ebp
0x1059:	xor	edi, edi
0x105b:	xor	eax, eax
0x105d:	cld
0x105e:	lodsb	al, byte ptr [esi]
0x105f:	test	al, al
0x1061:	je	0x106a
0x1063:	ror	edi, 0xd
0x1066:	add	edi, eax
0x1068:	jmp	0x105e
0x106a:	cmp	edi, dword ptr [esp + 0x28]
0x106e:	jne	0x1051
0x1070:	mov	ebx, dword ptr [edx + 0x24]
0x1073:	add	ebx, ebp
0x1075:	mov	cx, word ptr [ebx + ecx*2]
0x1079:	mov	ebx, dword ptr [edx + 0x1c]
0x107c:	add	ebx, ebp
0x107e:	mov	eax, dword ptr [ebx + ecx*4]
0x1081:	add	eax, ebp
0x1083:	mov	dword ptr [esp + 0x1c], eax
0x1087:	popal
0x1088:	ret

Disassembler Notes

The algorithm used to disassemble the instructions is called linear sweep. It works top-down. It decodes an instruction at the first byte, moves to the byte immediately after that instruction, and repeats. It assumes that every byte is part of an instruction. objdump uses this algorithm. Notice that the following Python code loops and prints each decoded instruction. It does not follow jumps or check whether the bytes are actually code.

for i in md.disasm(CODE, 0x1000):
    output += "0x%x:\t%s\t%s\n" %(i.address, i.mnemonic, i.op_str)
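The loop above hides the sweep inside capstone. The following stripped-down sketch writes the loop out by hand. It assumes we only need to decode the handful of opcodes that occur in the Example 2 bytes below. `insn_length` is a hypothetical helper and not a real x86 decoder. The sweep starts at the first byte and always continues at the byte right after the current instruction, even when that instruction is a call.

```python
# Bytes from Example 2 below: a call, the string "ntdll.dll\0", then FF 15 D8 20 40 00
CODE = bytes.fromhex("E80A0000006E74646C6C2E646C6C00FF15D8204000")
BASE = 0x1000
PREFIXES = {0x2E, 0x64}      # cs: and fs: segment-override prefixes


def insn_length(code, off):
    """Length of the instruction at `off`, for a tiny hand-picked x86-32 subset.

    Hypothetical helper: it only knows the opcodes that occur in CODE and returns
    None for anything else (a real decoder such as capstone knows them all).
    """
    start = off
    while off < len(code) and code[off] in PREFIXES:
        off += 1
    if off >= len(code):
        return None
    op = code[off]
    modrm = code[off + 1] if off + 1 < len(code) else None
    if op in (0x6C, 0x6E):                  # insb / outsb
        size = 1
    elif op == 0x74:                        # je rel8
        size = 2
    elif op in (0xE8, 0x15):                # call rel32 / adc eax, imm32
        size = 5
    elif op == 0x00 and modrm is not None and modrm >> 6 == 0b11:
        size = 2                            # add r/m8, r8 with a register operand
    elif op == 0xFF and modrm == 0x15:
        size = 6                            # call dword ptr [disp32]
    else:
        return None
    end = off + size
    return end - start if end <= len(code) else None


def linear_sweep(code, base):
    """Decode back to back from the first byte; never look at jump targets."""
    addresses, off = [], 0
    while off < len(code):
        size = insn_length(code, off)
        if size is None:                    # stop at bytes we cannot decode
            break
        addresses.append(base + off)
        off += size                         # next instruction starts right after this one
    return addresses


print([hex(a) for a in linear_sweep(CODE, BASE)])
```

The addresses it visits match the capstone listing in Example 2. The call target 0x100f is never one of them, because the sweep treats the string bytes as instructions and stays out of step from there on.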

Linear sweep works for simple code, but it is easily broken when the code becomes complex or when data is mixed in with the instruction bytes. In the following examples, don't worry about understanding the assembly language. Instead, take note of the control flow (jumps).

Example 1

The following is an example of an instruction jumping into another instruction. At offset 0x1013, if the registers ax and bx are not equal, jne jumps to offset 0x100e, which is in the middle of the instruction at 0x100c.

>>> from capstone import *
>>> md = Cs(CS_ARCH_X86, CS_MODE_32)
>>> cc = "B8CCCCCCCCF3AB6066B8654B66BB664B6639D875F9618BF4"
>>> dd = bytearray.fromhex(cc)
>>> for i in md.disasm(dd, 0x1000):
...     print("0x%x:\t%s\t%s" %(i.address, i.mnemonic, i.op_str))
...
0x1000:	mov	eax, 0xcccccccc
0x1005:	rep stosd	dword ptr es:[edi], eax
0x1007:	pushal
0x1008:	mov	ax, 0x4b65
0x100c:	mov	bx, 0x4b66
0x1010:	cmp	ax, bx ; ZF = 1 if ax - bx == 0
0x1013:	jne	0x100e ; jump to 0x100e if ZF == 0 (ax != bx)
0x1015:	popal
0x1016:	mov	esi, esp
>>>

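Here the jump is taken once. Because ax (0x4b65) and bx (0x4b66) differ, ZF is cleared and jne transfers control to 0x100e. Decoded from 0x100e, the bytes 66 4B are the hidden instruction dec bx, which makes bx equal to ax. The comparison at 0x1010 then sets ZF, and execution falls through to popal. Linear sweep never shows dec bx because it consumed those bytes as the immediate of mov bx, 0x4b66. Because the outcome of each comparison is fixed in advance, this type of obfuscation could be referred to as an opaque predicate.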
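The following sketch replays Example 1 by hand. It is not a general emulator, and the `trace` helper is hypothetical. It first slices out the two bytes at the jne target, then steps through the cmp/jne loop to show that the jump is taken exactly once.

```python
CODE = bytes.fromhex("B8CCCCCCCCF3AB6066B8654B66BB664B6639D875F9618BF4")
BASE = 0x1000
JNE_TARGET = 0x100E

# The jne target lands on the last two bytes of "mov bx, 0x4b66" (66 BB 66 4B)
hidden = CODE[JNE_TARGET - BASE:JNE_TARGET - BASE + 2]


def trace():
    """Hand-written replay of 0x1008-0x1015; returns the visited addresses and final bx."""
    ax = 0x4B65                        # 0x1008: mov ax, 0x4b65
    bx = 0x4B66                        # 0x100c: mov bx, 0x4b66
    path = []
    while True:
        path.append(0x1010)            # cmp ax, bx
        zf = ((ax - bx) & 0xFFFF) == 0
        if zf:                         # 0x1013: jne falls through when ZF == 1
            path.append(0x1015)        # popal
            return path, bx
        path.append(JNE_TARGET)        # jne taken: run the hidden "dec bx" (66 4B)
        bx = (bx - 1) & 0xFFFF


print(hidden.hex())
path, bx = trace()
print([hex(a) for a in path], hex(bx))
```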

Example 2

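In the following example, the assembly that follows the call at 0x1000 looks like junk code. The bytes after the call instruction are actually the string ntdll.dll, which was erroneously disassembled as instructions. The call pushes the address of the next byte (0x1005) as its return address, so the return address doubles as a pointer to the string. The call target, 0x100f, never appears in the listing. Linear sweep consumed its first byte (FF) as part of add bh, bh at 0x100e, which hides the real instruction call dword ptr [0x4020d8] (FF 15 D8 20 40 00).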

Note: I understand that, as a beginner, the term "junk" might be confusing. After a while you will start to recognize common assembly instructions and patterns, and you will notice when things look off.

>>> cc = "E80A0000006E74646C6C2E646C6C00FF15D8204000"
>>> dd = bytearray.fromhex(cc)
>>> for i in md.disasm(dd, 0x1000):
...     print("0x%x:\t%s\t%s" %(i.address, i.mnemonic, i.op_str))
...
0x1000:	call	0x100f
0x1005:	outsb	dx, byte ptr [esi]
0x1006:	je	0x106c
0x1008:	insb	byte ptr es:[edi], dx
0x1009:	insb	byte ptr es:[edi], dx
0x100a:	insb	byte ptr es:[edi], dx
0x100d:	insb	byte ptr es:[edi], dx
0x100e:	add	bh, bh
0x1010:	adc	eax, 0x4020d8
>>> import hexdump
>>> hexdump.hexdump(dd)
00000000: E8 0A 00 00 00 6E 74 64  6C 6C 2E 64 6C 6C 00 FF  .....ntdll.dll..
00000010: 15 D8 20 40 00                                    .. @.
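The call-over-string trick can be checked directly from the bytes, without a disassembler. The following sketch decodes the E8 rel32 displacement by hand and reads the NUL-terminated string at the return address. It then reads the disp32 of the FF 15 instruction at the call target. That disp32 is the address of a function pointer, possibly an import table slot, although the bytes alone do not confirm which function it points to.

```python
import struct

CODE = bytes.fromhex("E80A0000006E74646C6C2E646C6C00FF15D8204000")
BASE = 0x1000

# E8 rel32: target = address of the next instruction + signed 32-bit displacement
(rel,) = struct.unpack_from("<i", CODE, 1)
return_address = BASE + 5                  # what the call pushes on the stack
call_target = return_address + rel

# The "return address" is really a pointer to a NUL-terminated string
start = return_address - BASE
string = CODE[start:CODE.index(b"\x00", start)]

# FF 15 disp32 at the call target decodes as: call dword ptr [disp32]
target_off = call_target - BASE
assert CODE[target_off:target_off + 2] == b"\xff\x15"
(pointer_slot,) = struct.unpack_from("<I", CODE, target_off + 2)

print(hex(call_target), string, hex(pointer_slot))
```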

As demonstrated, linear sweep has some issues. Another approach is recursive descent. Instead of decoding every byte in order, it follows the control flow (jumps and calls) from known entry points. Most established disassemblers (IDA, Ghidra, etc.) use a combination of linear sweep and recursive descent. A great read on disassembly algorithms is "Disassembly of Executable Code Revisited".
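Recursive descent can be sketched as a worklist algorithm. The sketch below makes several assumptions. The `Insn` record and the `decode` callback are hypothetical, and `EXAMPLE1` is a hand-written decode table for the Example 1 bytes that stands in for a real decoder. A real `decode` could, for example, wrap capstone's `md.disasm(code, address, count=1)`, but that wiring is not shown here.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Insn(NamedTuple):
    size: int
    text: str
    targets: List[int]       # addresses this instruction may branch or call to
    falls_through: bool      # False for unconditional jmp and ret


def recursive_descent(entry: int, decode: Callable[[int], Optional[Insn]]) -> Dict[int, Insn]:
    """Disassemble only what is reachable from `entry` by following control flow."""
    listing: Dict[int, Insn] = {}
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        while addr not in listing:
            insn = decode(addr)
            if insn is None:               # unknown bytes: abandon this path
                break
            listing[addr] = insn
            worklist.extend(insn.targets)  # queue jump/call targets for later
            if not insn.falls_through:
                break
            addr += insn.size
    return listing


# Hypothetical decode table for the Example 1 bytes, written by hand from the
# listing above plus the hidden instruction at 0x100e.
EXAMPLE1 = {
    0x1000: Insn(5, "mov eax, 0xcccccccc", [], True),
    0x1005: Insn(2, "rep stosd dword ptr es:[edi], eax", [], True),
    0x1007: Insn(1, "pushal", [], True),
    0x1008: Insn(4, "mov ax, 0x4b65", [], True),
    0x100C: Insn(4, "mov bx, 0x4b66", [], True),
    0x100E: Insn(2, "dec bx", [], True),          # only reachable through the jne
    0x1010: Insn(3, "cmp ax, bx", [], True),
    0x1013: Insn(2, "jne 0x100e", [0x100E], True),
    0x1015: Insn(1, "popal", [], True),
    0x1016: Insn(2, "mov esi, esp", [], True),
}

listing = recursive_descent(0x1000, EXAMPLE1.get)
for addr in sorted(listing):
    print(hex(addr), listing[addr].text)
```

Unlike linear sweep, the worklist reaches 0x100e through the jne target. As a result, both mov bx at 0x100c and the overlapping dec bx at 0x100e end up in the listing. A real disassembler then has to decide how to display the overlap.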

Instruction Review (aka Task 2)

  1. Add a detailed comment for each line of instructions in disassemble.txt. Links have been provided for instructions that have not yet been observed.
| Address | Instruction | Description |
|---|---|---|
| 0x1000 | xor ecx, ecx | set ecx to 0 |
| 0x1002 | mov eax, dword ptr fs:[ecx + 0x30] | get a pointer to the PEB (TEB offset 0x30). See Notes Address 0x1002 |
| 0x1006 | mov eax, dword ptr [eax + 0xc] | get a pointer to PEB.Ldr (PEB_LDR_DATA). See Notes Address 0x1006 |
| 0x1009 | mov eax, dword ptr [eax + 0x14] | get Ldr.InMemoryOrderModuleList.Flink, a pointer to the first entry in the list. See Notes Address 0x1009 |
| 0x100c | mov eax, dword ptr [eax + ecx] | follow Flink to the second entry in Ldr.InMemoryOrderModuleList (ECX is 0). See Notes Address 0x100c & 0x100f |
| 0x100f | mov eax, dword ptr [eax + ecx] | follow Flink to the third entry in Ldr.InMemoryOrderModuleList (KERNEL32.DLL). See Notes Address 0x100c & 0x100f |
| 0x1012 | mov ebx, dword ptr [eax + 0x10] | get the base address (DllBase) of KERNEL32.dll. See Notes Address 0x1015 |
| 0x1015 | mov dword ptr [ebp - 4], ebx | save the base address of KERNEL32.dll in a local variable |
| 0x1018 | push 0xca2bd06b | push the hash of the wanted export (second argument); 0xca2bd06b is the hash of CreateThread |
| 0x101d | push dword ptr [ebp - 4] | push the KERNEL32.dll base address (first argument) |
| 0x1020 | call 0x103b | call the export-resolving function at 0x103b |
| 0x1025 | push 0 | push NULL (CreateThread lpThreadId) |
| 0x1027 | push 0 | push 0 (dwCreationFlags) |
| 0x1029 | push 0 | push NULL (lpParameter) |
| 0x102b | push 0x4c30d0 | push 0x4c30d0 (lpStartAddress) |
| 0x1030 | push 0 | push 0 (dwStackSize) |
| 0x1032 | push 0 | push NULL (lpThreadAttributes) |
| 0x1034 | call eax | call the function resolved by 0x103b (returned in EAX), i.e. CreateThread |
| 0x1036 | jmp 0x1094 | jump to 0x1094, which lies past the end of the provided bytes (unknown data) |
| 0x103b | pushal | start of the called function; push/save all general-purpose registers on the stack |
| 0x103c | mov ebp, dword ptr [esp + 0x24] | move the first argument (the module base address) into ebp; pushal saved 0x20 bytes and the return address sits at [esp + 0x20] |
| 0x1040 | mov eax, dword ptr [ebp + 0x3c] | eax = _IMAGE_DOS_HEADER.e_lfanew. See Notes Address 0x1040 |
| 0x1043 | mov edx, dword ptr [ebp + eax + 0x78] | edx = RVA of the export directory (DataDirectory[0] of the optional header) |
| 0x1047 | add edx, ebp | edx = base address + export directory RVA, the absolute address of IMAGE_EXPORT_DIRECTORY |
| 0x1049 | mov ecx, dword ptr [edx + 0x18] | ecx = IMAGE_EXPORT_DIRECTORY.NumberOfNames. See Notes on Address 0x1049 |
| 0x104c | mov ebx, dword ptr [edx + 0x20] | ebx = IMAGE_EXPORT_DIRECTORY.AddressOfNames (RVA). See IMAGE_EXPORT_DIRECTORY Structure |
| 0x104f | add ebx, ebp | ebx = absolute address of the AddressOfNames array |
| 0x1051📌 | jecxz 0x1087 | start of loop 📌; exit if ecx is 0 (no names left to check) |
| 0x1053 | dec ecx | decrement ecx, the index into the name array (starts at IMAGE_EXPORT_DIRECTORY.NumberOfNames) |
| 0x1054 | mov esi, dword ptr [ebx + ecx*4] | esi = RVA of the exported function name |
| 0x1057 | add esi, ebp | esi = absolute address of the function name |
| 0x1059 | xor edi, edi | set edi (the hash) to 0 |
| 0x105b | xor eax, eax | set eax to 0 so that add edi, eax adds only the byte in al |
| 0x105d | cld | clear the direction flag (DF) so lodsb increments esi |
| 0x105e🛳️ | lodsb al, byte ptr [esi] | read a single byte of the function name into al |
| 0x105f | test al, al | test if the byte is \x00 (end of the name) |
| 0x1061 | je 0x106a🏵️ | jump if the byte in al is \x00 |
| 0x1063 | ror edi, 0xd | rotate the hash in edi right by 13 bits |
| 0x1066 | add edi, eax | add the byte to the rotated hash |
| 0x1068 | jmp 0x105e🛳️ | loop through each byte |
| 0x106a🏵️ | cmp edi, dword ptr [esp + 0x28] | compare the ROR-13 hash against the second argument (0xca2bd06b) |
| 0x106e | jne 0x1051 📌 | jump back to the start of the loop if the hash and 0xca2bd06b are not equal |
| 0x1070 | mov ebx, dword ptr [edx + 0x24] | ebx = IMAGE_EXPORT_DIRECTORY.AddressOfNameOrdinals (RVA) |
| 0x1073 | add ebx, ebp | ebx = absolute address of the ordinal array |
| 0x1075 | mov cx, word ptr [ebx + ecx*2] | cx = ordinal of the matching name, read from the AddressOfNameOrdinals array |
| 0x1079 | mov ebx, dword ptr [edx + 0x1c] | ebx = IMAGE_EXPORT_DIRECTORY.AddressOfFunctions (RVA) |
| 0x107c | add ebx, ebp | ebx = absolute address of the AddressOfFunctions array |
| 0x107e | mov eax, dword ptr [ebx + ecx*4] | eax = RVA of the matching exported function, indexed by the ordinal |
| 0x1081 | add eax, ebp | eax = absolute virtual address of the function |
| 0x1083 | mov dword ptr [esp + 0x1c], eax | overwrite the EAX value saved by pushal with the resolved address |
| 0x1087 | popal | restore the registers; EAX now holds the resolved address (or its original value if no name matched) |
| 0x1088 | ret | return to the instruction following 0x1020 (0x1025) |
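The offsets 0x1c, 0x24 and 0x28 in the function at 0x103b follow from the stack layout after pushal. The following sketch models that layout. `frame_after_pushad` is a hypothetical helper. The only facts it relies on are the PUSHAD push order (EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI) and the two pushes made before call 0x103b.

```python
# PUSHAD/PUSHAL pushes these registers in this order (EAX first, EDI last)
PUSHAD_ORDER = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


def frame_after_pushad(args):
    """Map [esp + offset] to its contents right after the pushal at 0x103b.

    `args` lists the caller's arguments from the last push to the first push.
    """
    layout = {}
    for i, reg in enumerate(reversed(PUSHAD_ORDER)):   # EDI is pushed last -> [esp + 0]
        layout[4 * i] = "saved " + reg
    layout[0x20] = "return address"                    # pushed by call 0x103b (0x1025)
    for i, arg in enumerate(args):                     # last push sits at the lowest address
        layout[0x24 + 4 * i] = arg
    return layout


frame = frame_after_pushad(["KERNEL32.dll base (push dword ptr [ebp - 4])",
                            "hash 0xca2bd06b (push 0xca2bd06b)"])
for off in (0x1C, 0x20, 0x24, 0x28):
    print(hex(off), frame[off])
```

This is why 0x103c reads the base from [esp + 0x24] and 0x106a compares against [esp + 0x28]. It is also why writing to [esp + 0x1c] at 0x1083 makes popal load the resolved address into EAX.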

Reimplement the Hash Algorithm (aka Task 3)

  1. Reimplement the hash algorithm in a language of your choice. As a test case, the string TraceLoggingRegister should hash to 0x6F90986.
def ror32(val, amt):
    """Rotate a 32-bit value right by amt bits (1..31), like x86 ROR."""
    return ((val >> amt) | (val << (32 - amt))) & 0xffffffff


def add32(val, amt):
    """Add and wrap to 32 bits, like x86 ADD on a DWORD register."""
    return (val + amt) & 0xffffffff


def z0mbie_hash(name):
    """ROR-13 additive hash computed by the loop at 0x105e-0x1068."""
    h = 0
    for char in name:
        h = add32(ror32(h, 13), ord(char) & 0xff)
    return h


print(hex(z0mbie_hash("TraceLoggingRegister")))
print(hex(z0mbie_hash("CreateThread")))

Output

0x6f90986
0xca2bd06b

When porting assembly to Python or another language, the size of the registers must be accounted for. Python integers have no fixed width. The recurring AND 0xFFFFFFFF throughout the code therefore keeps every intermediate value within a DWORD, just as a 32-bit register would. If the code operated on 64-bit registers, the mask would be 0xFFFFFFFFFFFFFFFF. If you are writing the code in Python, ctypes can also be used to specify the size.
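The following sketch shows the ctypes route, letting `ctypes.c_uint32` enforce the width instead of explicit masks. The name `z0mbie_hash_ctypes` is hypothetical. The approach relies on ctypes integer types doing no overflow checking: a value that does not fit is silently truncated to its low 32 bits when assigned.

```python
import ctypes


def z0mbie_hash_ctypes(name):
    """Same ROR-13 hash as above, with ctypes.c_uint32 enforcing the 32-bit width."""
    h = ctypes.c_uint32(0)
    for char in name:
        # ror edi, 0xd: bits shifted past bit 31 are dropped when stored in c_uint32
        h.value = (h.value >> 13) | (h.value << 19)
        # add edi, eax: the sum wraps modulo 2**32 on assignment
        h.value = h.value + (ord(char) & 0xFF)
    return h.value


print(hex(z0mbie_hash_ctypes("TraceLoggingRegister")))
print(hex(z0mbie_hash_ctypes("CreateThread")))
```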

Structures Used (aka Task 4)

TEB, PEB, Portable Executable, LDR, etc.

As a beginner, you may find these structures difficult to figure out. They were included because once you understand them, you never have to reanalyze them. For example, the next time you see the offset 0x3c followed by 0x78, you will recognize that the code is traversing the Portable Executable file format (e_lfanew, then the export directory). From there you can likely infer what the code is doing, or at least pivot quickly. These structures are used by a lot of position-independent code and packers.

NOTES

LDR NOTES

Notes Address 0x1002

mov eax, dword ptr fs:[ecx + 0x30] reads a DWORD from the segment that FS points to. On x86 Windows, the segment register FS points to the thread information block (TIB), also known as the thread environment block (TEB). The TEB is a structure that can be displayed using WinDbg. A great visual representation of the structure can be found on Terminus. Notice that offset 0x30 holds a pointer to the Process Environment Block (PEB).

0:000> dt ntdll!_TEB
   +0x000 NtTib            : _NT_TIB
   +0x01c EnvironmentPointer : Ptr32 Void
   +0x020 ClientId         : _CLIENT_ID
   +0x028 ActiveRpcHandle  : Ptr32 Void
   +0x02c ThreadLocalStoragePointer : Ptr32 Void
   +0x030 ProcessEnvironmentBlock : Ptr32 _PEB
   +0x034 LastErrorValue   : Uint4B
   +0x038 CountOfOwnedCriticalSections : Uint4B

graph TD

    FS --> C{thread information block}
    C -->|0x000| D[NtTib]
    C -->|0x01c| E[EnvironmentPointer]
    C -->|0x020| F[ClientId]
    C -->|0x028| G[ActiveRpcHandle]
    C -->|0x02c| H[ThreadLocalStoragePointer]
    C -->|0x030|I[ProcessEnvironmentBlock]

mov eax, dword ptr fs:[ecx + 0x30] retrieves a pointer to the PEB via the TEB. ECX was set to 0 at 0x1000, so the effective offset is 0x30.

Notes Address 0x1006

mov eax, dword ptr [eax + 0xc]: before this instruction, EAX holds the start address of the PEB. Within the PEB structure, offset 0xC holds Ldr, a pointer to the PEB_LDR_DATA structure. After the instruction, EAX points to PEB_LDR_DATA.

0:000> dt ntdll!_PEB
   +0x000 InheritedAddressSpace : UChar
   +0x001 ReadImageFileExecOptions : UChar
   +0x002 BeingDebugged    : UChar
   +0x003 BitField         : UChar
   +0x003 ImageUsesLargePages : Pos 0, 1 Bit
   +0x003 IsProtectedProcess : Pos 1, 1 Bit
   +0x003 IsImageDynamicallyRelocated : Pos 2, 1 Bit
   +0x003 SkipPatchingUser32Forwarders : Pos 3, 1 Bit
   +0x003 IsPackagedProcess : Pos 4, 1 Bit
   +0x003 IsAppContainer   : Pos 5, 1 Bit
   +0x003 IsProtectedProcessLight : Pos 6, 1 Bit
   +0x003 IsLongPathAwareProcess : Pos 7, 1 Bit
   +0x004 Mutant           : Ptr32 Void
   +0x008 ImageBaseAddress : Ptr32 Void
   +0x00c Ldr              : Ptr32 _PEB_LDR_DATA   <- EAX
   +0x010 ProcessParameters : Ptr32 _RTL_USER_PROCESS_PARAMETERS

Notes Address 0x1009

mov eax, dword ptr [eax + 0x14]

0:000> dt _PEB_LDR_DATA
ntdll!_PEB_LDR_DATA
   +0x000 Length           : Uint4B
   +0x004 Initialized      : UChar
   +0x008 SsHandle         : Ptr32 Void
   +0x00c InLoadOrderModuleList : _LIST_ENTRY
   +0x014 InMemoryOrderModuleList : _LIST_ENTRY    <- EAX
   +0x01c InInitializationOrderModuleList : _LIST_ENTRY
   +0x024 EntryInProgress  : Ptr32 Void
   +0x028 ShutdownInProgress : UChar
   +0x02c ShutdownThreadId : Ptr32 Void

After the load, EAX holds InMemoryOrderModuleList.Flink, a pointer to the first entry in the InMemoryOrderModuleList list.

0:000> dt _LIST_ENTRY
ntdll!_LIST_ENTRY
   +0x000 Flink            : Ptr32 _LIST_ENTRY  <- EAX
   +0x004 Blink            : Ptr32 _LIST_ENTRY

Notes Address 0x100c & 0x100f

After 0x1009 (mov eax, dword ptr [eax + 0x14]), EAX points to the first entry in the list. Dumping the PEB with !peb gives a readable view of the linked list. See the Ldr.InMemoryOrderModuleList line (010533f0 . 0105d710) in the following output.

0:000> !peb
PEB at 00df5000
    InheritedAddressSpace:    No
    ReadImageFileExecOptions: No
    BeingDebugged:            Yes
    ImageBaseAddress:         00e10000
    Ldr                       773d5d80
    Ldr.Initialized:          Yes
    Ldr.InInitializationOrderModuleList: 010532f0 . 010548d8
    Ldr.InLoadOrderModuleList:           010533e8 . 0105d708
    Ldr.InMemoryOrderModuleList:         010533f0 . 0105d710
            Base TimeStamp                     Module
          e10000 6139342d Sep 08 16:07:41 2021 C:\Users\this\Documents\projects\antidis\Debug\antidis.exe ; <- 0x1009	mov eax, dword ptr [eax + 0x14]
        772b0000 221456c9 Feb 13 06:44:41 1988 C:\WINDOWS\SYSTEM32\ntdll.dll    ; <- 0x100c mov eax, dword ptr [eax + ecx]
        75310000 5628abf5 Oct 22 03:27:17 2015 C:\WINDOWS\System32\KERNEL32.DLL  ; <- 0x100f mov eax, dword ptr [eax + ecx]
        768f0000 270baf18 Oct 04 15:52:24 1990 C:\WINDOWS\System32\KERNELBASE.dll
        74b00000 1a613b1c Jan 10 03:49:00 1984 C:\WINDOWS\SYSTEM32\apphelp.dll
        66320000 4ba1dbd4 Mar 18 01:52:52 2010 C:\WINDOWS\SYSTEM32\MSVCR100D.dll
    SubSystemData:     00000000
    ProcessHeap:       01050000

KERNEL32.DLL is the third entry of InMemoryOrderModuleList in the above output. EAX currently points to the first entry (the executable itself). Because ECX is 0, each mov eax, dword ptr [eax + ecx] follows one Flink. The first moves EAX to the second entry (ntdll.dll), and the second moves it to the third entry (KERNEL32.DLL).

Notes on the different Lists

  • InLoadOrderModuleList: order in which the modules were loaded by the Windows loader
  • InMemoryOrderModuleList: order in which the modules are placed in memory
  • InInitializationOrderModuleList: order in which the DLLs were initialized

Source

Notes Address 0x1015

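mov ebx, dword ptr [eax + 0x10]: EAX is currently pointing to InMemoryOrderLinks, which is at offset 0x8 of _LDR_DATA_TABLE_ENTRY, not at the start of the entry. The base address (DllBase) is stored at offset 0x18, so from EAX it is 0x18 - 0x8 = 0x10 bytes away.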

0:000> dt _LDR_DATA_TABLE_ENTRY
ntdll!_LDR_DATA_TABLE_ENTRY
   +0x000 InLoadOrderLinks : _LIST_ENTRY
   +0x008 InMemoryOrderLinks : _LIST_ENTRY  <- EAX is pointing to a linked list
   +0x010 InInitializationOrderLinks : _LIST_ENTRY
   +0x018 DllBase          : Ptr32 Void
   +0x01c EntryPoint       : Ptr32 Void
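The whole chain from 0x1002 to 0x1012 can be replayed against any 32-bit memory reader. In the following sketch, `third_module_base` and the `MEMORY` dictionary are hypothetical. The structure addresses in `MEMORY` are made up, while the DllBase values are copied from the !peb output above. A real reader would read the memory of a live process instead.

```python
# Offsets from the dt output above (32-bit Windows)
TEB_PROCESS_ENVIRONMENT_BLOCK = 0x30
PEB_LDR = 0x0C
LDR_IN_MEMORY_ORDER_MODULE_LIST = 0x14
# EAX points at InMemoryOrderLinks (+0x08) and DllBase is at +0x18: 0x18 - 0x08 = 0x10
IN_MEMORY_ORDER_LINKS_TO_DLLBASE = 0x18 - 0x08


def third_module_base(read_dword, teb):
    """Replay 0x1002-0x1012 using `read_dword(address) -> int`."""
    peb = read_dword(teb + TEB_PROCESS_ENVIRONMENT_BLOCK)      # 0x1002
    ldr = read_dword(peb + PEB_LDR)                            # 0x1006
    entry = read_dword(ldr + LDR_IN_MEMORY_ORDER_MODULE_LIST)  # 0x1009: 1st entry
    entry = read_dword(entry)                                  # 0x100c: Flink -> 2nd entry
    entry = read_dword(entry)                                  # 0x100f: Flink -> 3rd entry
    return read_dword(entry + IN_MEMORY_ORDER_LINKS_TO_DLLBASE)  # 0x1012: DllBase


# Hypothetical memory: made-up structure addresses, DllBase values from !peb above
MEMORY = {
    0x00100030: 0x00200000,   # TEB + 0x30 -> PEB
    0x0020000C: 0x00300000,   # PEB + 0x0c -> PEB_LDR_DATA
    0x00300014: 0x00400008,   # InMemoryOrderModuleList.Flink -> antidis.exe links
    0x00400008: 0x00500008,   # antidis.exe Flink -> ntdll.dll links
    0x00500008: 0x00600008,   # ntdll.dll Flink -> KERNEL32.DLL links
    0x00400018: 0x00E10000,   # antidis.exe DllBase
    0x00500018: 0x772B0000,   # ntdll.dll DllBase
    0x00600018: 0x75310000,   # KERNEL32.DLL DllBase
}

print(hex(third_module_base(MEMORY.__getitem__, 0x00100000)))
```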

Portable Executable Structure NOTES

Notes Address 0x1040

ntdll!_IMAGE_DOS_HEADER
   +0x000 e_magic          : Uint2B
   +0x002 e_cblp           : Uint2B
   +0x004 e_cp             : Uint2B
   +0x006 e_crlc           : Uint2B
   +0x008 e_cparhdr        : Uint2B
   +0x00a e_minalloc       : Uint2B
   +0x00c e_maxalloc       : Uint2B
   +0x00e e_ss             : Uint2B
   +0x010 e_sp             : Uint2B
   +0x012 e_csum           : Uint2B
   +0x014 e_ip             : Uint2B
   +0x016 e_cs             : Uint2B
   +0x018 e_lfarlc         : Uint2B
   +0x01a e_ovno           : Uint2B
   +0x01c e_res            : [4] Uint2B
   +0x024 e_oemid          : Uint2B
   +0x026 e_oeminfo        : Uint2B
   +0x028 e_res2           : [10] Uint2B
   +0x03c e_lfanew         : Int4B

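Hex dump of the header of an executable. The first two bytes (4D 5A, "MZ") are e_magic. The DWORD at offset 0x3C (E8 00 00 00) is e_lfanew = 0xE8, the offset of the PE signature ("PE\0\0"), which appears at 0xE8 in the last row.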

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00  MZ..........ÿÿ..
00000010  B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00  ¸.......@.......
00000020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000030  00 00 00 00 00 00 00 00 00 00 00 00 E8 00 00 00  ............è...   <- E8 is 0x03c e_lfanew  
00000040  0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68  ..º..´.Í!¸.LÍ!Th
00000050  69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F  is program canno
00000060  74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20  t be run in DOS
00000070  6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00  mode....$.......
00000080  88 BF 4D C8 CC DE 23 9B CC DE 23 9B CC DE 23 9B  ˆ¿MÈÌÞ#›ÌÞ#›ÌÞ#›
00000090  A3 A8 BF 9B CF DE 23 9B A3 A8 88 9B CB DE 23 9B  £¨¿›ÏÞ#›£¨ˆ›ËÞ#›
000000A0  C5 A6 B0 9B CE DE 23 9B CC DE 22 9B FB DE 23 9B  Ŧ°›ÎÞ#›ÌÞ"›ûÞ#›
000000B0  A3 A8 89 9B DF DE 23 9B A3 A8 B9 9B CD DE 23 9B  £¨‰›ßÞ#›£¨¹›ÍÞ#›
000000C0  A3 A8 BE 9B CD DE 23 9B 52 69 63 68 CC DE 23 9B  £¨¾›ÍÞ#›RichÌÞ#›
000000D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000000E0  00 00 00 00 00 00 00 00 50 45 00 00 4C 01 07 00  ........PE..L...

IMAGE_EXPORT_DIRECTORY Structure

WinDbg did not contain the structure, so a Visual Basic declaration of it is shown instead.

Notes on Address 0x1049

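Private Type IMAGE_EXPORT_DIRECTORY
 Characteristics As Long          ' +0x00
 TimeDateStamp As Long            ' +0x04
 MajorVersion As Integer          ' +0x08
 MinorVersion As Integer          ' +0x0a
 lpName As Long                   ' +0x0c
 Base As Long                     ' +0x10
 NumberOfFunctions As Long        ' +0x14
 NumberOfNames As Long            ' +0x18  <- read at 0x1049
 lpAddressOfFunctions As Long     ' +0x1c  <- read at 0x1079
 lpAddressOfNames As Long         ' +0x20  <- read at 0x104c
 lpAddressOfNameOrdinals As Long  ' +0x24  <- read at 0x1070
End Type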
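The following sketch ports the whole function at 0x103b to Python. It walks e_lfanew, the export directory, the name array, the ordinal array and the function array, and compares ROR-13 hashes. It assumes `image` is a 32-bit module as mapped in memory, so every RVA is a plain offset into the buffer. `resolve_by_hash` and `build_demo_image` are hypothetical helpers. The demo image is a synthetic buffer that fills in only the fields the walk reads, and its function RVAs (0x1111, 0x2222) are made up.

```python
import struct


def ror13_hash(name: bytes) -> int:
    """ROR-13 additive hash of a byte string (same loop as 0x105e-0x1068)."""
    h = 0
    for b in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF   # ror edi, 0xd
        h = (h + b) & 0xFFFFFFFF                   # add edi, eax
    return h


def dword(image: bytes, off: int) -> int:
    return struct.unpack_from("<I", image, off)[0]


def resolve_by_hash(image: bytes, wanted: int):
    """Return the RVA of the export whose name hashes to `wanted`, or None."""
    e_lfanew = dword(image, 0x3C)                        # 0x1040
    exports = dword(image, e_lfanew + 0x78)              # 0x1043: DataDirectory[0]
    number_of_names = dword(image, exports + 0x18)       # 0x1049
    address_of_functions = dword(image, exports + 0x1C)  # 0x1079
    address_of_names = dword(image, exports + 0x20)      # 0x104c
    address_of_ordinals = dword(image, exports + 0x24)   # 0x1070
    for i in reversed(range(number_of_names)):           # the shellcode counts ECX down
        name_rva = dword(image, address_of_names + 4 * i)
        name = image[name_rva:image.index(b"\x00", name_rva)]
        if ror13_hash(name) == wanted:
            (ordinal,) = struct.unpack_from("<H", image, address_of_ordinals + 2 * i)
            return dword(image, address_of_functions + 4 * ordinal)
    return None


def build_demo_image(exports):
    """Hypothetical minimal image that fills in only the fields read above."""
    image = bytearray(0x400)
    e_lfanew, export_dir = 0x80, 0x200
    names, ordinals, functions, strings = 0x240, 0x280, 0x2C0, 0x300
    struct.pack_into("<I", image, 0x3C, e_lfanew)
    struct.pack_into("<I", image, e_lfanew + 0x78, export_dir)
    # NumberOfNames, AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals (+0x18..+0x24)
    struct.pack_into("<IIII", image, export_dir + 0x18, len(exports), functions, names, ordinals)
    for i, (name, func_rva) in enumerate(exports):
        struct.pack_into("<I", image, names + 4 * i, strings)
        struct.pack_into("<H", image, ordinals + 2 * i, i)
        struct.pack_into("<I", image, functions + 4 * i, func_rva)
        image[strings:strings + len(name)] = name
        strings += len(name) + 1                         # leave a NUL terminator
    return bytes(image)


demo = build_demo_image([(b"CreateThread", 0x2222), (b"TraceLoggingRegister", 0x1111)])
print(hex(resolve_by_hash(demo, 0xCA2BD06B)))
```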
