@h4k1m0u
Last active May 20, 2024 17:57
Notes about Assembly language with NASM

Run with NASM on Linux

32 bits

$ nasm -f elf -o script.o script.asm
$ ld -m elf_i386 -o script script.o

-f elf makes nasm produce a 32-bit ELF object file (by default it outputs a flat binary file, which cannot be linked).

64 bits

$ nasm -f elf64 -o script.o script.asm
$ ld -o script script.o

No -m option is needed here: on a 64-bit host, ld produces a 64-bit executable by default, which matches the input object file.

Registers

The 32-bit registers are the 16-bit ones with an E prefix: EAX, EBX, ECX, ...

  • Accumulator: EAX.
  • Base: EBX.
  • Counter: ECX.
  • Data: EDX.
  • Stack pointer: ESP.
  • Stack Base Pointer: EBP.
  • Source: ESI.
  • Destination: EDI.
  • Segment registers: SS (stack), CS (Code), DS (data).
  • EFLAGS: Status register (carry, parity, zero, sign flags).

For further explanations see this page.

Addressing modes

  • Immediate: MOV AX, 1Ch, with 1C in hex.
  • Register: MOV AX, BX.
  • Direct: MOV AX, [my_var].
  • Direct offset: MOV AX, [my_var + 2] (the form my_var[2] is MASM syntax and is not accepted by NASM).
  • Indirect: MOV AX, [DI].

Declaring data

  • Single byte: <var> db 0x55.
  • Successive bytes: <var> db 0x55, 0x56, 0x57.
  • String: <var> db 'abc'.

GCC

  • To generate assembly code from a C source file (the output is in AT&T syntax):
$ gcc -S script.c
  • To output assembly code in Intel syntax (the operand order NASM uses, although the output still contains GAS directives and cannot be assembled by NASM as-is):
$ gcc -masm=intel -S script.c
  • GCC uses GAS (command as) to assemble the assembly code into object files (the same way NASM does).

X86

Notes below were summarized from The holy book of X86. For more details, see the Intel Software Developer Manuals.

Assembly data types

  • Byte (8 bits).
  • Word (16 bits).
  • DWORD (32 bits).
  • QuadWord (64 bits).

Negative numbers

  • Represented using two's complement notation.
  • Flip all bits, then add 1 (the same rule converts a number to its negative and a negative number back to its positive value).
  • The sign is the most significant bit (0: positive number, 1: negative number).
  • On one byte:
    • [ 0x00 - 0x7f ]: [ 0, +127 ]
    • [ 0x80 - 0xff ]: [ -128, -1 ]
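
The flip-and-add-one rule can be checked with a short Python sketch. The helper names twos_complement and to_signed are made up for this illustration, and the width is fixed at 8 bits to match the one-byte ranges above.

```python
BITS = 8                    # one byte, as in the ranges above
MASK = (1 << BITS) - 1      # 0xff

def twos_complement(value):
    """Flip all bits, then add 1, keeping only the low 8 bits."""
    return ((value ^ MASK) + 1) & MASK

def to_signed(byte):
    """Interpret an 8-bit pattern as a signed integer."""
    return byte - (1 << BITS) if byte & 0x80 else byte

print(hex(twos_complement(1)))            # 0xff, the encoding of -1
print(hex(twos_complement(0xff)))         # 0x1, applying the rule again gives back 1
print(to_signed(0x80), to_signed(0x7f))   # -128 127
```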

Endianness

  • Intel x86 architecture is little-endian.
  • Little-endian means a multi-byte value is stored in memory with its least significant byte at the lowest address (this applies to memory only, not to registers).
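
As a quick illustration, Python's struct module can show the byte order a 32-bit value would have in memory. The value 0x12345678 is an arbitrary example, not taken from the notes.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)   # "<" = little-endian, "I" = unsigned 32-bit
big = struct.pack(">I", value)      # ">" = big-endian, shown for comparison

print(little.hex())   # 78563412: least significant byte at the lowest address
print(big.hex())      # 12345678
```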

Registers

  • 8 general purpose 32-bit registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP.
  • 16 general purpose 64-bit registers: RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, R8-R15.
  • EIP (RIP for 64-bit): Instruction Pointer.
  • EFLAGS:
    • 32-bit register where each bit is a flag.
    • Used to indicate status of computations.
    • Used also to control certain operations.

General purpose registers commonly used as follows:

  • EAX (RAX): used when a function returns a value.
  • EBX (RBX): Base pointer for data section.
  • EDX (RDX): I/O pointer.
  • ECX (RCX): Loop counter.
  • ESI (RSI) & EDI (RDI): Source and destination index resp. (e.g. copying a string).
  • ESP (RSP): Stack Pointer points to the top of the stack.
  • EBP (RBP): Base Pointer points to the base of the current stack frame.

EFLAGS

Most important flags are:

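  • Zero Flag (ZF): Set to 1 if the result of an operation is zero (e.g. the compared values are equal).
  • Sign Flag (SF): Copy of the most significant bit of the result (1 if the result is negative when read as a signed integer).
  • Carry Flag (CF): Set to 1 if an unsigned operation carries out of, or borrows into, the most significant bit.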

CMP sets these flags by performing a subtraction and discarding the result (see the conditional jumps below).
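
A rough Python model of how CMP derives these flags is sketched below. The function name cmp_flags is hypothetical, and only ZF, SF and CF are modelled.

```python
def cmp_flags(dst, src, bits=32):
    """Return the ZF, SF and CF that `CMP dst, src` would produce."""
    mask = (1 << bits) - 1
    dst &= mask
    src &= mask
    result = (dst - src) & mask              # CMP subtracts but discards the result
    return {
        "ZF": int(result == 0),              # result is zero: operands are equal
        "SF": (result >> (bits - 1)) & 1,    # copy of the result's top bit
        "CF": int(dst < src),                # unsigned borrow
    }

print(cmp_flags(5, 5))   # {'ZF': 1, 'SF': 0, 'CF': 0}
print(cmp_flags(3, 5))   # {'ZF': 0, 'SF': 1, 'CF': 1}
print(cmp_flags(5, 3))   # {'ZF': 0, 'SF': 0, 'CF': 0}
```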

Stack

  • Holds a function's local variables.
  • Grows down from higher to lower memory addresses.

Process addresses

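  • Every process has a private virtual address space (on 32-bit architectures the user-mode range depends on the OS, e.g. [ 0x00000000, 0x7fffffff ] when user and kernel space each get 2 GB).
  • The address spaces of different processes do not overlap in physical memory thanks to paging.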
  • Paging maps process' virtual addresses to physical memory.

Stack frame

  • Area where functions store their local variables.
  • Each function has its own stack frame.

Function calls

  • Function prologue: runs at the start of the called function.
PUSH EBP ; Save the caller's frame pointer
MOV EBP, ESP ; Start a new stack frame at the current top of the stack
  • Function epilogue: runs at the end of the function, just before it returns.
POP EBP ; Restore the caller's frame pointer
RET ; Pop top of stack & assign it to EIP (usually address after CALL)

Operand types

According to the instruction:

  • Register: XOR EAX, EAX ; zeroes register eax
  • Immediate: MOV EAX, 4 ; 4 is in decimal
  • Memory: MOV EAX, [0x12345678]
  • Register indirect: MOV EAX, [EBX] ; Load the value whose address is in EBX
  • Register indirect with displacement: MOV EAX, [EBX + 4]

Instruction set

NOP

Does nothing.

XCHG

Swap its operands.

PUSH

Decrements ESP, then stores the operand at the address ESP now points to (the new top of the stack).

POP

Copies the value pointed at by ESP into the operand, then increments ESP so that it points at the previous value on the stack.
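
The effect of PUSH and POP on ESP can be modelled with a toy 32-bit stack in Python. The class name Stack32 and the start address 0x1000 are arbitrary choices for this sketch.

```python
class Stack32:
    """Toy model of a 32-bit stack: memory is a dict from address to value."""

    def __init__(self, esp=0x1000):       # 0x1000 is an arbitrary start address
        self.esp = esp
        self.memory = {}

    def push(self, value):
        self.esp -= 4                     # the stack grows toward lower addresses
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory[self.esp]     # read the value ESP points at
        self.esp += 4                     # ESP now points at the previous entry
        return value

stack = Stack32()
stack.push(0x11)
stack.push(0x22)
print(hex(stack.esp))     # 0xff8
print(hex(stack.pop()))   # 0x22
print(hex(stack.esp))     # 0xffc
```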

RET

Pop top of the stack & put it in register EIP.

CALL

Calls a function. CDECL is the most common calling convention for 32-bit C (and some C++) code, and works as follows:

  • The caller pushes the function parameters onto the stack from right to left.
  • On CALL:
    • Push address of next instruction after CALL onto stack.
    • Set EIP to the address of the function's first instruction.
  • The called function sets up a new stack frame (see the function prologue).
  • Function return value put in EAX for primitive data types.
  • After the function returns, the caller removes the parameters from the stack.
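
To make the resulting stack layout concrete, here is a small Python simulation of a CDECL call followed by the usual prologue. The function name, the addresses and the argument values are all invented for the illustration.

```python
def cdecl_call_layout(args, return_address, caller_ebp, esp=0x1000):
    """Simulate pushing args, CALL, PUSH EBP and MOV EBP, ESP; return (ebp, memory)."""
    memory = {}

    def push(value):
        nonlocal esp
        esp -= 4                  # stack grows down
        memory[esp] = value

    for arg in reversed(args):    # parameters are pushed right to left
        push(arg)
    push(return_address)          # CALL pushes the address of the next instruction
    push(caller_ebp)              # prologue: PUSH EBP
    ebp = esp                     # prologue: MOV EBP, ESP
    return ebp, memory

ebp, memory = cdecl_call_layout([10, 20, 30], return_address=0x401000, caller_ebp=0x2000)
print(hex(memory[ebp]))       # 0x2000, saved EBP of the caller
print(hex(memory[ebp + 4]))   # 0x401000, return address
print(memory[ebp + 8])        # 10, first parameter
print(memory[ebp + 12])       # 20, second parameter
```

With this layout the first parameter always sits at [EBP + 8], which is why that operand shows up so often in 32-bit code.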

MOV

Copies from source to destination, either:

  • Register to register.
  • Memory to register or vice versa.
  • Immediate to register or to memory.

ADD/SUB

Addition and subtraction

ADD EAX, 0x10 ; EAX += 0x10 
SUB EAX, 0x10 ; EAX -= 0x10

SHL/SHR

Shift bits to left/right.

AND/OR/XOR

Logical operations.

LEA

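Load effective address: computes the address described by the bracketed operand and stores that address in the destination, without reading memory (unlike a register indirect operand in MOV).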

LEA EAX, [EBP + 8] ; Take address in EBP, add 8 to it & put the resulting address in EAX
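
The difference between LEA and MOV with the same bracketed operand can be sketched in Python with a toy memory. The addresses and the stored value are hypothetical.

```python
memory = {0x2008: 0xCAFE}    # hypothetical memory: one value stored at address 0x2008
ebp = 0x2000

lea_eax = ebp + 8            # LEA EAX, [EBP + 8]: the computed address itself
mov_eax = memory[ebp + 8]    # MOV EAX, [EBP + 8]: the value stored at that address

print(hex(lea_eax))   # 0x2008
print(hex(mov_eax))   # 0xcafe
```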

CMP

Compares its two operands by subtracting the second from the first, discards the result, and sets the appropriate bits in the EFLAGS register (e.g. the Zero Flag is set to 1 if the two operands are equal).

JMP

Sets EIP to the given address (unconditional jump).

JNE

Jumps if Zero Flag (ZF) not set.

JLE

Jumps if less than or equal, for a signed comparison (ZF set, or SF different from the Overflow Flag OF).
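
The conditions tested by JNE and JLE after a CMP can be modelled as follows. This is a sketch only: the helper names are invented, and the Overflow Flag (OF) is computed here because the signed comparison needs it.

```python
def signed_cmp_flags(dst, src, bits=32):
    """Return (ZF, SF, OF) as booleans for `CMP dst, src`."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    dst &= mask
    src &= mask
    result = (dst - src) & mask
    zf = result == 0
    sf = bool(result & sign)
    # Signed overflow: the operands differ in sign and the result's sign differs from dst's
    of = bool((dst ^ src) & (dst ^ result) & sign)
    return zf, sf, of

def jne_taken(dst, src):
    zf, _, _ = signed_cmp_flags(dst, src)
    return not zf                    # JNE: jump if ZF is not set

def jle_taken(dst, src):
    zf, sf, of = signed_cmp_flags(dst, src)
    return zf or (sf != of)          # JLE: jump if ZF is set or SF != OF

print(jne_taken(3, 5), jne_taken(5, 5))     # True False
print(jle_taken(-1, 1), jle_taken(1, -1))   # True False
```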

IMUL

Signed multiply instruction.

DIV

Unsigned division.

INC/DEC

Increment/Decrement operand by 1.

Program sections

  • .TEXT: where all the code goes (called also code section).
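  • .DATA: contains initialized data (constants and variables that have an initial value).
  • .BSS: contains uninitialized variables (space is reserved but no initial value is stored in the file).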

Syscall

Process of handing execution to kernel.

INT 0x80 ; 32-bit
SYSCALL  ; 64-bit

See the example at the very end of The holy book of X86.

Constants

They are declared as follows in NASM:

message db "Hello",0 ; define bytes (db) for a string with a null terminator
length equ $ - message ; equ defines a constant: $ (current address) minus message (address of the string) = size in bytes, terminator included

helloworld.asm

; Write message to stdout with sys_write system call
; http://asmtutor.com/#lesson1
;
; How to run:
; 1. Assemble: nasm -f elf64 helloworld.asm
; 2. Link: ld helloworld.o -o helloworld
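global _start
SECTION .data ; no trailing colon: ".data:" would name a different, non-standard section
msg db 'Hello World!', 0Ah ; 0Ah is the ASCII code for newline
SECTION .text ; no trailing colon here either, so the code lands in the standard executable section
_start: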
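; sys_write msg variable to stdout
mov edx, 13 ; number of bytes to write (12 characters + newline)
mov ecx, msg ; address of the buffer
mov ebx, 1 ; file descriptor 1 (stdout)
mov eax, 4 ; system call number of sys_write
int 80h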
; sys_exit (exit properly & avoid segmentation fault)
mov ebx, 0
mov eax, 1
int 80h