Notes on RE1.
- Use a language of your choice to decode the base64 encoded data, disassemble the binary data using the capstone engine and save the text to a file named disassemble.txt
import base64
from capstone import *
from capstone.x86 import *
base64_encoded = 'Mclki0Ewi0AMi0AUiwQIiwQIi1gQiV38aGvQK8r/dfzoFgAAAGoAagBqAGjQMEwAagBqAP/Q6VkAAABgi2wkJItFPItUBXgB6otKGItaIAHr4zRJizSLAe4x/zHA/KyEwHQHwc8NAcfr9Dt8JCh14YtaJAHrZosMS4taHAHriwSLAeiJRCQcYcM='
base64_img_bytes = base64_encoded.encode('utf-8')
CODE = base64.b64decode(base64_img_bytes)
output = ""
md = Cs(CS_ARCH_X86, CS_MODE_32)
for i in md.disasm(CODE, 0x1000):
    output += "0x%x:\t%s\t%s\n" % (i.address, i.mnemonic, i.op_str)
# Write the disassembly listing; the with-block closes the file automatically.
with open("disassemble.txt", "w") as out_file:
    out_file.write(output)
0x1000: xor ecx, ecx
0x1002: mov eax, dword ptr fs:[ecx + 0x30]
0x1006: mov eax, dword ptr [eax + 0xc]
0x1009: mov eax, dword ptr [eax + 0x14]
0x100c: mov eax, dword ptr [eax + ecx]
0x100f: mov eax, dword ptr [eax + ecx]
0x1012: mov ebx, dword ptr [eax + 0x10]
0x1015: mov dword ptr [ebp - 4], ebx
0x1018: push 0xca2bd06b
0x101d: push dword ptr [ebp - 4]
0x1020: call 0x103b
0x1025: push 0
0x1027: push 0
0x1029: push 0
0x102b: push 0x4c30d0
0x1030: push 0
0x1032: push 0
0x1034: call eax
0x1036: jmp 0x1094
0x103b: pushal
0x103c: mov ebp, dword ptr [esp + 0x24]
0x1040: mov eax, dword ptr [ebp + 0x3c]
0x1043: mov edx, dword ptr [ebp + eax + 0x78]
0x1047: add edx, ebp
0x1049: mov ecx, dword ptr [edx + 0x18]
0x104c: mov ebx, dword ptr [edx + 0x20]
0x104f: add ebx, ebp
0x1051: jecxz 0x1087
0x1053: dec ecx
0x1054: mov esi, dword ptr [ebx + ecx*4]
0x1057: add esi, ebp
0x1059: xor edi, edi
0x105b: xor eax, eax
0x105d: cld
0x105e: lodsb al, byte ptr [esi]
0x105f: test al, al
0x1061: je 0x106a
0x1063: ror edi, 0xd
0x1066: add edi, eax
0x1068: jmp 0x105e
0x106a: cmp edi, dword ptr [esp + 0x28]
0x106e: jne 0x1051
0x1070: mov ebx, dword ptr [edx + 0x24]
0x1073: add ebx, ebp
0x1075: mov cx, word ptr [ebx + ecx*2]
0x1079: mov ebx, dword ptr [edx + 0x1c]
0x107c: add ebx, ebp
0x107e: mov eax, dword ptr [ebx + ecx*4]
0x1081: add eax, ebp
0x1083: mov dword ptr [esp + 0x1c], eax
0x1087: popal
0x1088: ret
The algorithm used to disassemble the instructions is called linear sweep. It takes a top-down approach:
it decodes the first instruction, then continues decoding at the byte immediately after it, and so on. It assumes that all the bytes are instructions. objdump
uses this algorithm to disassemble instructions. Notice that the following Python code loops over and prints each instruction without checking whether the decoded bytes are really code.
for i in md.disasm(CODE, 0x1000):
    output += "0x%x:\t%s\t%s\n" % (i.address, i.mnemonic, i.op_str)
Linear sweep works for simple instructions but can be easily broken if the code becomes complex or if there is data mixed in with the instruction bytes. In the following examples don't worry about understanding the assembly language but take note of the control flow (jumps).
The following is an example of an instruction jumping into the middle of another instruction. At offset 0x1013, if the registers ax and bx are not equal, execution jumps to offset 0x100e, which is in the middle of the instruction at 0x100c.
>>> from capstone import *
>>> md = Cs(CS_ARCH_X86, CS_MODE_32)
>>> cc = "B8CCCCCCCCF3AB6066B8654B66BB664B6639D875F9618BF4"
>>> dd = bytearray.fromhex(cc)
>>> for i in md.disasm(dd, 0x1000):
... print("0x%x:\t%s\t%s" %(i.address, i.mnemonic, i.op_str))
...
0x1000: mov eax, 0xcccccccc
0x1005: rep stosd dword ptr es:[edi], eax
0x1007: pushal
0x1008: mov ax, 0x4b65
0x100c: mov bx, 0x4b66
0x1010: cmp ax, bx ; ZF = 1 if ax == bx, otherwise ZF = 0
0x1013: jne 0x100e ; if ZF == 0 (ax != bx): jump to 0x100e
0x1015: popal
0x1016: mov esi, esp
>>>
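This example is simple because the outcome of the branch is fixed: on the first pass ax (0x4b65) and bx (0x4b66) differ, so the jump is taken into the middle of the instruction at 0x100c, and the bytes at 0x100e execute as an instruction that the linear sweep listing never shows. After that the registers are equal and execution falls through. This type of obfuscation could be referred to as an opaque predicate.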
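The branch target can also be checked by hand. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the original exercise. It decodes the rel8 displacement of the jne at 0x1013 and shows the bytes that execution lands on.

```python
# Same bytes as the capstone example above.
code = bytes.fromhex("B8CCCCCCCCF3AB6066B8654B66BB664B6639D875F9618BF4")
base = 0x1000

jne_addr = 0x1013
offset = jne_addr - base
opcode = code[offset]  # 0x75 is the short form of jne/jnz
# The operand is a signed 8-bit displacement.
rel8 = int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)
# A short jump is relative to the address of the next instruction.
target = jne_addr + 2 + rel8

# Bytes from the jump target up to the next instruction boundary (0x1010).
hidden = code[target - base:0x1010 - base]

print(hex(opcode), rel8, hex(target), hidden.hex())
```

The two bytes at the target, 66 4B, decode as dec bx in 32-bit mode. Linear sweep consumed them as the immediate of mov bx, 0x4b66, so they never appear as an instruction in the listing.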
In the following example, the assembly after the call at 0x1000 looks like junk code. The bytes after the call instruction are the string ntdll.dll, which was erroneously disassembled as instructions.
Note: I do understand as a beginner the term "junk" might be confusing. After a while you will start to recognize common assembly instructions or patterns and note when things look off.
>>> cc = "E80A0000006E74646C6C2E646C6C00FF15D8204000"
>>> dd = bytearray.fromhex(cc)
>>> for i in md.disasm(dd, 0x1000):
... print("0x%x:\t%s\t%s" %(i.address, i.mnemonic, i.op_str))
...
0x1000: call 0x100f
0x1005: outsb dx, byte ptr [esi]
0x1006: je 0x106c
0x1008: insb byte ptr es:[edi], dx
0x1009: insb byte ptr es:[edi], dx
0x100a: insb byte ptr es:[edi], dx
0x100d: insb byte ptr es:[edi], dx
0x100e: add bh, bh
0x1010: adc eax, 0x4020d8
>>> import hexdump
>>> hexdump.hexdump(dd)
00000000: E8 0A 00 00 00 6E 74 64 6C 6C 2E 64 6C 6C 00 FF .....ntdll.dll..
00000010: 15 D8 20 40 00 .. @.
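To see why the bytes after the call are data, the call target can be computed directly. This is a small sketch using only the standard library, with illustrative variable names; it assumes the first instruction is the 5-byte E8 call rel32 shown in the listing.

```python
import struct

code = bytes.fromhex("E80A0000006E74646C6C2E646C6C00FF15D8204000")
base = 0x1000

# E8 is call rel32; the displacement is relative to the next instruction.
rel32 = struct.unpack_from("<i", code, 1)[0]
call_target = base + 5 + rel32

# Bytes between the end of the call and its target are skipped by execution.
embedded = code[5:call_target - base]
# Bytes starting at the real call target.
real_code = code[call_target - base:]

print(hex(call_target))
print(embedded)
print(real_code.hex())
```

The skipped bytes are the null-terminated string ntdll.dll, and the bytes at the target start with FF 15, an indirect call through 0x4020d8. Because call pushes the address of the next instruction, 0x1005 (the address of the string) is left on the stack, which is likely how the string is passed to the function being called.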
As previously demonstrated, linear sweep does have some issues. Another approach for disassembling data is recursive descent. A combination of linear and recursive is used by most established disassemblers (IDA, Ghidra, etc). A great read on disassembling algorithms is Disassembly of Executable Code Revisited.
- Add a detailed comment for each line of instructions in disassemble.txt. Links have been provided for instructions that have not yet been observed.
Address | Instruction | Description |
---|---|---|
0x1000 | xor ecx, ecx | set ecx to 0; [ref] |
0x1002 | mov eax, dword ptr fs:[ecx + 0x30] | Get pointer to the PEB. See Notes Address 0x1002 |
0x1006 | mov eax, dword ptr [eax + 0xc] | Get pointer to PEB.Ldr. See Notes Address 0x1006 |
0x1009 | mov eax, dword ptr [eax + 0x14] | Read the Flink of LDR.InMemoryOrderModuleList, a pointer to the first entry. See Notes Address 0x1009. |
0x100c | mov eax, dword ptr [eax + ecx] | Follow Flink (ecx is 0) to the second entry in LDR.InMemoryOrderModuleList. See Notes Address 0x100c & 0x100f |
0x100f | mov eax, dword ptr [eax + ecx] | Follow Flink (ecx is 0) to the third entry in LDR.InMemoryOrderModuleList. See Notes Address 0x100c & 0x100f |
0x1012 | mov ebx, dword ptr [eax + 0x10] | Get base address of KERNEL32.dll; See Notes Address 0x1012 |
0x1015 | mov dword ptr [ebp - 4], ebx | save base address of KERNEL32.dll |
0x1018 | push 0xca2bd06b | push the hash of the function name to resolve (second argument); 0xca2bd06b is the hash of CreateThread |
0x101d | push dword ptr [ebp - 4] | push the saved base address of KERNEL32.dll (first argument) |
0x1020 | call 0x103b | call function 0x103b |
0x1025 | push 0 | push NULL |
0x1027 | push 0 | push NULL |
0x1029 | push 0 | push NULL |
0x102b | push 0x4c30d0 | push offset. |
0x1030 | push 0 | push NULL |
0x1032 | push 0 | push NULL |
0x1034 | call eax | EAX is equal to the return of the call to 0x103b |
0x1036 | jmp 0x1094 | JMP to address with unknown data. |
0x103b | pushal | start of called function. push/save all registers on to the stack |
0x103c | mov ebp, dword ptr [esp + 0x24] | move base address into ebp |
0x1040 | mov eax, dword ptr [ebp + 0x3c] | eax is equal to _IMAGE_DOS_HEADER.e_lfanew . See Notes Address 0x1040 |
0x1043 | mov edx, dword ptr [ebp + eax + 0x78] | edx is equal to RVA of export table |
0x1047 | add edx, ebp | EDX is equal to the base address + the RVA of the export table. EDX is the absolute address |
0x1049 | mov ecx, dword ptr [edx + 0x18] | ecx is IMAGE_EXPORT_DIRECTORY.NumberOfNames . See Notes on Address 0x1049 |
0x104c | mov ebx, dword ptr [edx + 0x20] | ebx is equal to IMAGE_EXPORT_DIRECTORY.AddressOfNames . See IMAGE_EXPORT_DIRECTORY Structure |
0x104f | add ebx, ebp | ebx is equal to the absolute address of the address of Export names |
0x1051📌 | jecxz 0x1087 | start of loop 📌 |
0x1053 | dec ecx | decrement ecx which contains the number of exported functions IMAGE_EXPORT_DIRECTORY.NumberOfNames |
0x1054 | mov esi, dword ptr [ebx + ecx*4] | esi equal to RVA of exported function name |
0x1057 | add esi, ebp | esi is equal to absolute address of function name |
0x1059 | xor edi, edi | set edi to 0 |
0x105b | xor eax, eax | set eax to 0 |
0x105d | cld | clear the direction flag (DF) |
0x105e🛳️ | lodsb al, byte ptr [esi] | read single byte from the function name |
0x105f | test al, al | test if byte is \x00 '' |
0x1061 | je 0x106a🏵️ | jump if read byte in al is equal to \x00 |
0x1063 | ror edi, 0xd | rotate the running hash in edi right by 13 bits |
0x1066 | add edi, eax | add read byte with ROR data |
0x1068 | jmp 0x105e🛳️ | loop through each byte |
0x106a🏵️ | cmp edi, dword ptr [esp + 0x28] | compare ROR hashed bytes against 0xca2bd06b |
0x106e | jne 0x1051 📌 | jump if ROR hash and 0xca2bd06b are not equal |
0x1070 | mov ebx, dword ptr [edx + 0x24] | ebx is equal to the RVA in IMAGE_EXPORT_DIRECTORY.AddressOfNameOrdinals; reached once a name matches the hash |
0x1073 | add ebx, ebp | ebx is equal to the absolute address of the ordinal table |
0x1075 | mov cx, word ptr [ebx + ecx*2] | cx is equal to the ordinal from the array at IMAGE_EXPORT_DIRECTORY.AddressOfNameOrdinals |
0x1079 | mov ebx, dword ptr [edx + 0x1c] | ebx is equal to the RVA in IMAGE_EXPORT_DIRECTORY.AddressOfFunctions |
0x107c | add ebx, ebp | ebx is the absolute address of the function address table |
0x107e | mov eax, dword ptr [ebx + ecx*4] | eax is equal to RVA of hashed exported function based on ordinal |
0x1081 | add eax, ebp | eax is the absolute virtual address |
0x1083 | mov dword ptr [esp + 0x1c], eax | overwrite the eax slot saved by pushal on the stack with the address of the function that matched the hash |
0x1087 | popal | restore all registers; eax is loaded from the overwritten slot |
0x1088 | ret | return to instruction following 0x1020 |
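The lookup performed by the function at 0x103b can be sketched in Python. This is only an illustration under stated assumptions: the three lists and the base address are hypothetical stand-ins for the AddressOfNames, AddressOfNameOrdinals and AddressOfFunctions tables, and the function names resolve_by_hash and ror13_hash are mine.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name):
    """Mirror of the loop at 0x105e-0x1068: ror edi, 0xd then add edi, eax."""
    h = 0
    for byte in name.encode("ascii"):
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h


def resolve_by_hash(base, names, name_ordinals, functions, wanted):
    """Walk the name table from the last entry down, like the loop at 0x1051."""
    ecx = len(names)                          # NumberOfNames
    while ecx != 0:                           # jecxz 0x1087
        ecx -= 1                              # dec ecx
        if ror13_hash(names[ecx]) == wanted:  # cmp edi, [esp + 0x28]
            ordinal = name_ordinals[ecx]      # mov cx, word ptr [ebx + ecx*2]
            return base + functions[ordinal]  # RVA from the table plus base
    return None  # the shellcode instead leaves the saved eax untouched


# Hypothetical export data; a real module stores RVAs in its export directory.
BASE = 0x75310000
NAMES = ["CloseHandle", "CreateThread", "ExitProcess"]
NAME_ORDINALS = [2, 0, 1]
FUNCTIONS = [0x1111, 0x2222, 0x3333]  # indexed by ordinal, not by name index

address = resolve_by_hash(BASE, NAMES, NAME_ORDINALS, FUNCTIONS, 0xCA2BD06B)
print(hex(address))
```

The extra hop through the ordinal array is the part that is easy to miss: the index of the matching name is not the index into the function table.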
- Reimplement the hash algorithm in a language of your choice. As a test case, the string TraceLoggingRegister should hash to 0x6F90986.
def ror32(val, amt):
    # rotate a 32-bit value right by amt bits
    return ((val >> amt) & 0xffffffff) | ((val << (32 - amt)) & 0xffffffff)
def add32(val, amt):
    # 32-bit addition that wraps like a DWORD register
    return (val + amt) & 0xffffffff
def z0mbie_hash(name):
    hash = 0
    for char in name:
        hash = add32(ror32(hash, 13), ord(char) & 0xff)
    return hash
print(hex(z0mbie_hash("TraceLoggingRegister")))
print(hex(z0mbie_hash("CreateThread")))
Output
0x6f90986
0xca2bd06b
When porting assembly to Python or another language, the size of the registers must be accounted for. The recurring AND 0xFFFFFFFF throughout the code ensures that the value is never larger than a DWORD. If the code were 64-bit, the AND value would be 0xFFFFFFFFFFFFFFFF. For individuals writing code in Python, ctypes can be used to specify the size.
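As a rough illustration of the ctypes approach, the helpers can let ctypes.c_uint32 do the truncation instead of an explicit AND. This is a sketch, not the original solution; it relies on ctypes integer types performing no overflow checking.

```python
import ctypes


def ror32(value, count):
    """Rotate right within 32 bits; c_uint32 truncates to a DWORD."""
    value = ctypes.c_uint32(value).value
    rotated = (value >> count) | (value << (32 - count))
    return ctypes.c_uint32(rotated).value


def add32(a, b):
    """32-bit addition that wraps around like a register."""
    return ctypes.c_uint32(a + b).value


print(hex(add32(0xFFFFFFFF, 1)))
print(hex(ror32(1, 13)))
```

The first call wraps around to zero, and the second moves bit 0 to bit 19, which is what ror with a count of 13 does on a 32-bit register.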
TEB, PEB, Portable Executable, LDR, etc .....
As a beginner, these structures can be difficult to figure out. They were included because once you understand them, they never have to be reanalyzed again. For example, the next time you see the offset 0x3c followed by 0x78, you will recognize that the code is traversing the Portable Executable file format. From there you can likely infer what the code is doing, or at least pivot quickly. These structures are used by lots of position independent code and packers.
mov eax, dword ptr fs:[]
reads a DWORD from memory addressed relative to the FS segment into eax. On x86 Windows the segment register FS points to the thread information block (TIB), also known as the thread environment block (TEB). The TIB is a structure that can be displayed using WinDbg. A great visual representation of the structure can be observed using terminus. Notice that at offset 0x30 is a pointer to the Process Environment Block (PEB).
0:000> dt ntdll!_TEB
+0x000 NtTib : _NT_TIB
+0x01c EnvironmentPointer : Ptr32 Void
+0x020 ClientId : _CLIENT_ID
+0x028 ActiveRpcHandle : Ptr32 Void
+0x02c ThreadLocalStoragePointer : Ptr32 Void
+0x030 ProcessEnvironmentBlock : Ptr32 _PEB
+0x034 LastErrorValue : Uint4B
+0x038 CountOfOwnedCriticalSections : Uint4B
graph TD
FS --> C{thread information block}
C -->|0x000| D[NtTib]
C -->|0x01c| E[EnvironmentPointer]
C -->|0x020| F[ClientId]
C -->|0x028| G[ActiveRpcHandle]
C -->|0x02c| H[ThreadLocalStoragePointer]
C -->|0x030|I[ProcessEnvironmentBlock]
mov eax, dword ptr fs:[ecx + 0x30]
retrieves a pointer to the PEB via the TEB. The following graph is
mov eax, dword ptr [eax + 0xc]
eax is the start address of the PEB. Within the PEB structure 0xC is the Ldr (PEB_LDR_DATA structure). EAX is a pointer to Ldr.
0:000> dt ntdll!_PEB
+0x000 InheritedAddressSpace : UChar
+0x001 ReadImageFileExecOptions : UChar
+0x002 BeingDebugged : UChar
+0x003 BitField : UChar
+0x003 ImageUsesLargePages : Pos 0, 1 Bit
+0x003 IsProtectedProcess : Pos 1, 1 Bit
+0x003 IsImageDynamicallyRelocated : Pos 2, 1 Bit
+0x003 SkipPatchingUser32Forwarders : Pos 3, 1 Bit
+0x003 IsPackagedProcess : Pos 4, 1 Bit
+0x003 IsAppContainer : Pos 5, 1 Bit
+0x003 IsProtectedProcessLight : Pos 6, 1 Bit
+0x003 IsLongPathAwareProcess : Pos 7, 1 Bit
+0x004 Mutant : Ptr32 Void
+0x008 ImageBaseAddress : Ptr32 Void
+0x00c Ldr : Ptr32 _PEB_LDR_DATA <- EAX
+0x010 ProcessParameters : Ptr32 _RTL_USER_PROCESS_PARAMETERS
mov eax, dword ptr [eax + 0x14]
0:000> dt _PEB_LDR_DATA
ntdll!_PEB_LDR_DATA
+0x000 Length : Uint4B
+0x004 Initialized : UChar
+0x008 SsHandle : Ptr32 Void
+0x00c InLoadOrderModuleList : _LIST_ENTRY
+0x014 InMemoryOrderModuleList : _LIST_ENTRY <- EAX
+0x01c InInitializationOrderModuleList : _LIST_ENTRY
+0x024 EntryInProgress : Ptr32 Void
+0x028 ShutdownInProgress : UChar
+0x02c ShutdownThreadId : Ptr32 Void
EAX is pointing to the first Flink in the InMemoryOrderModuleList list
0:000> dt _LIST_ENTRY
ntdll!_LIST_ENTRY
+0x000 Flink : Ptr32 _LIST_ENTRY <- EAX
+0x004 Blink : Ptr32 _LIST_ENTRY
0x1009 mov eax, dword ptr [eax + 0x14]
EAX is pointing to the first entry in the list. Dumping the PEB with !peb gives a readable view of the linked list; see the line Ldr.InMemoryOrderModuleList: 010533f0 . 0105d710 in the following output.
0:000> !peb
PEB at 00df5000
InheritedAddressSpace: No
ReadImageFileExecOptions: No
BeingDebugged: Yes
ImageBaseAddress: 00e10000
Ldr 773d5d80
Ldr.Initialized: Yes
Ldr.InInitializationOrderModuleList: 010532f0 . 010548d8
Ldr.InLoadOrderModuleList: 010533e8 . 0105d708
Ldr.InMemoryOrderModuleList: 010533f0 . 0105d710
Base TimeStamp Module
e10000 6139342d Sep 08 16:07:41 2021 C:\Users\this\Documents\projects\antidis\Debug\antidis.exe ; <- 0x1009 mov eax, dword ptr [eax + 0x14]
772b0000 221456c9 Feb 13 06:44:41 1988 C:\WINDOWS\SYSTEM32\ntdll.dll ; <- 0x100c mov eax, dword ptr [eax + ecx]
75310000 5628abf5 Oct 22 03:27:17 2015 C:\WINDOWS\System32\KERNEL32.DLL ; <- 0x100f mov eax, dword ptr [eax + ecx]
768f0000 270baf18 Oct 04 15:52:24 1990 C:\WINDOWS\System32\KERNELBASE.dll
74b00000 1a613b1c Jan 10 03:49:00 1984 C:\WINDOWS\SYSTEM32\apphelp.dll
66320000 4ba1dbd4 Mar 18 01:52:52 2010 C:\WINDOWS\SYSTEM32\MSVCR100D.dll
SubSystemData: 00000000
ProcessHeap: 01050000
KERNEL32.DLL is the third entry in InMemoryOrderModuleList in the above output. The pointer is currently at the first entry, so executing mov eax, dword ptr [eax + ecx] twice leaves EAX pointing to the third entry in the Ldr.InMemoryOrderModuleList.
Notes on the different Lists
- InLoadOrderModuleList: Order DLLs were loaded into memory by the Windows Loader
- InMemoryOrderModuleList: Order DLLs exist in memory layout
- InInitializationOrderModuleList: Order DLLs were initialized
mov ebx, dword ptr [eax + 0x10]
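EAX is currently pointing to InMemoryOrderLinks, which sits at offset 0x8 of the entry. The base address (DllBase) is stored at offset 0x18 of the entry. Relative to EAX that is 0x18 - 0x8 = 0x10, which is why the instruction reads [eax + 0x10].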
0:000> dt _LDR_DATA_TABLE_ENTRY
ntdll!_LDR_DATA_TABLE_ENTRY
+0x000 InLoadOrderLinks : _LIST_ENTRY
+0x008 InMemoryOrderLinks : _LIST_ENTRY <- EAX is pointing to a linked list
+0x010 InInitializationOrderLinks : _LIST_ENTRY
+0x018 DllBase : Ptr32 Void
+0x01c EntryPoint : Ptr32 Void
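The pointer chain from FS to the KERNEL32.DLL base can be replayed against a simulated memory map. This is only a sketch: the PEB, Ldr, first Flink and DllBase values are taken from the !peb output above, while the TEB address and the addresses of the ntdll and KERNEL32 list entries are hypothetical placeholders.

```python
TEB = 0x00DF8000          # hypothetical
PEB = 0x00DF5000          # from !peb
LDR = 0x773D5D80          # from !peb
EXE_LINKS = 0x010533F0    # first Flink of InMemoryOrderModuleList, from !peb
NTDLL_LINKS = 0x010532E8  # hypothetical
K32_LINKS = 0x010537D8    # hypothetical

# address -> DWORD stored at that address
memory = {
    TEB + 0x30: PEB,             # TEB.ProcessEnvironmentBlock
    PEB + 0x0C: LDR,             # PEB.Ldr
    LDR + 0x14: EXE_LINKS,       # InMemoryOrderModuleList.Flink
    EXE_LINKS: NTDLL_LINKS,      # antidis.exe entry, Flink
    NTDLL_LINKS: K32_LINKS,      # ntdll.dll entry, Flink
    EXE_LINKS + 0x10: 0x00E10000,    # DllBase is 0x10 past InMemoryOrderLinks
    NTDLL_LINKS + 0x10: 0x772B0000,
    K32_LINKS + 0x10: 0x75310000,
}


def read_dword(address):
    return memory[address]


fs_base, ecx = TEB, 0                    # xor ecx, ecx
eax = read_dword(fs_base + ecx + 0x30)   # 0x1002: PEB
eax = read_dword(eax + 0x0C)             # 0x1006: Ldr
eax = read_dword(eax + 0x14)             # 0x1009: first entry
eax = read_dword(eax + ecx)              # 0x100c: second entry
eax = read_dword(eax + ecx)              # 0x100f: third entry
ebx = read_dword(eax + 0x10)             # 0x1012: DllBase
print(hex(ebx))
```

Each read follows one instruction from the listing, so the final value in ebx is the DllBase of the third module in memory order.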
ntdll!_IMAGE_DOS_HEADER
+0x000 e_magic : Uint2B
+0x002 e_cblp : Uint2B
+0x004 e_cp : Uint2B
+0x006 e_crlc : Uint2B
+0x008 e_cparhdr : Uint2B
+0x00a e_minalloc : Uint2B
+0x00c e_maxalloc : Uint2B
+0x00e e_ss : Uint2B
+0x010 e_sp : Uint2B
+0x012 e_csum : Uint2B
+0x014 e_ip : Uint2B
+0x016 e_cs : Uint2B
+0x018 e_lfarlc : Uint2B
+0x01a e_ovno : Uint2B
+0x01c e_res : [4] Uint2B
+0x024 e_oemid : Uint2B
+0x026 e_oeminfo : Uint2B
+0x028 e_res2 : [10] Uint2B
+0x03c e_lfanew : Int4B
Hex dump of the header of an executable. The first two bytes (4D 5A, "MZ") are e_magic.
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00 MZ..........ÿÿ..
00000010 B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 ¸.......@.......
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030 00 00 00 00 00 00 00 00 00 00 00 00 E8 00 00 00 ............è... <- E8 is 0x03c e_lfanew
00000040 0E 1F BA 0E 00 B4 09 CD 21 B8 01 4C CD 21 54 68 ..º..´.Í!¸.LÍ!Th
00000050 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F is program canno
00000060 74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20 t be run in DOS
00000070 6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00 mode....$.......
00000080 88 BF 4D C8 CC DE 23 9B CC DE 23 9B CC DE 23 9B ˆ¿MÈÌÞ#›ÌÞ#›ÌÞ#›
00000090 A3 A8 BF 9B CF DE 23 9B A3 A8 88 9B CB DE 23 9B £¨¿›ÏÞ#›£¨ˆ›ËÞ#›
000000A0 C5 A6 B0 9B CE DE 23 9B CC DE 22 9B FB DE 23 9B Ŧ°›ÎÞ#›ÌÞ"›ûÞ#›
000000B0 A3 A8 89 9B DF DE 23 9B A3 A8 B9 9B CD DE 23 9B £¨‰›ßÞ#›£¨¹›ÍÞ#›
000000C0 A3 A8 BE 9B CD DE 23 9B 52 69 63 68 CC DE 23 9B £¨¾›ÍÞ#›RichÌÞ#›
000000D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000000E0 00 00 00 00 00 00 00 00 50 45 00 00 4C 01 07 00 ........PE..L...
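The same fields can be read with struct. The sketch below rebuilds only the bytes that matter from the dump above (everything else is left as zero, which is an assumption made to keep the example short) and shows where the 0x3c and 0x78 offsets used at 0x1040 and 0x1043 come from.

```python
import struct

# Sparse copy of the header: only e_magic, e_lfanew and the bytes at 0xE8.
header = bytearray(0xF0)
header[0x00:0x02] = b"MZ"
header[0x3C:0x40] = bytes.fromhex("E8000000")
header[0xE8:0xF0] = bytes.fromhex("504500004C010700")

e_magic = bytes(header[0:2])
e_lfanew = struct.unpack_from("<i", header, 0x3C)[0]
signature = bytes(header[e_lfanew:e_lfanew + 4])
machine, number_of_sections = struct.unpack_from("<HH", header, e_lfanew + 4)

# 4 bytes of signature + 20 bytes of IMAGE_FILE_HEADER + 0x60 bytes into the
# 32-bit optional header is DataDirectory[0], the export table RVA: 0x78.
export_rva_offset = e_lfanew + 4 + 20 + 0x60

print(e_magic, hex(e_lfanew), signature)
print(hex(machine), number_of_sections, hex(export_rva_offset))
```

e_lfanew points at the PE signature, and the export table RVA sits 0x78 bytes after it for a 32-bit image, which matches mov edx, dword ptr [ebp + eax + 0x78].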
WinDbg did not contain the IMAGE_EXPORT_DIRECTORY structure, so its fields are listed here as a Visual Basic type declaration.
Private Type IMAGE_EXPORT_DIRECTORY
Characteristics As Long
TimeDateStamp As Long
MajorVersion As Integer
MinorVersion As Integer
lpName As Long
Base As Long
NumberOfFunctions As Long
NumberOfNames As Long
lpAddressOfFunctions As Long
lpAddressOfNames As Long
lpAddressOfNameOrdinals As Long
End Type
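The offsets used in the disassembly (0x18, 0x1c, 0x20 and 0x24) follow from the field sizes in this declaration, where Long is 4 bytes and Integer is 2 bytes. A hedged way to double-check them is to describe the same layout with ctypes; the field names below follow the Windows header naming rather than the lp-prefixed names above.

```python
import ctypes


class IMAGE_EXPORT_DIRECTORY(ctypes.Structure):
    _fields_ = [
        ("Characteristics", ctypes.c_uint32),
        ("TimeDateStamp", ctypes.c_uint32),
        ("MajorVersion", ctypes.c_uint16),
        ("MinorVersion", ctypes.c_uint16),
        ("Name", ctypes.c_uint32),
        ("Base", ctypes.c_uint32),
        ("NumberOfFunctions", ctypes.c_uint32),
        ("NumberOfNames", ctypes.c_uint32),
        ("AddressOfFunctions", ctypes.c_uint32),
        ("AddressOfNames", ctypes.c_uint32),
        ("AddressOfNameOrdinals", ctypes.c_uint32),
    ]


# Print each field with its byte offset inside the structure.
for field_name, _ in IMAGE_EXPORT_DIRECTORY._fields_:
    print(hex(getattr(IMAGE_EXPORT_DIRECTORY, field_name).offset), field_name)
```

NumberOfNames lands at 0x18, AddressOfFunctions at 0x1c, AddressOfNames at 0x20 and AddressOfNameOrdinals at 0x24, matching the reads at 0x1049, 0x1079, 0x104c and 0x1070.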
- http://terminus.rewolf.pl/terminus/
- https://www2.cs.arizona.edu/~debray/Publications/disasm.pdf
- https://docs.google.com/presentation/d/17Vlv5JD8fGeeNMQqDuwDQXN3d9U6Yxmfb1aebfbMM98/view#slide=id.g586bbaeb3c_0_130
- https://www.ired.team/offensive-security/code-injection-process-injection/finding-kernel32-base-and-function-addresses-in-shellcode
- https://xen0vas.github.io/Win32-Reverse-Shell-Shellcode-part-2-Locate-the-Export-Directory-Table/#