$ nasm -f elf -o script.o script.asm
$ ld -m elf_i386 -o script script.o
-f elf is used in nasm to produce a 32-bit object file (by default nasm outputs a flat binary file, which is difficult to link).
$ nasm -f elf64 -o script.o script.asm
$ ld -o script script.o
No -m option is needed in ld because the output executable is 64-bit by default (same as the input object file).
The registers are the same as on a 16-bit architecture, with an E prefix for the 32-bit ones: EAX, EBX, ECX, and so on.
- Accumulator: EAX.
- Base: EBX.
- Counter: ECX.
- Data: EDX.
- Stack pointer: ESP.
- Stack Base Pointer: EBP.
- Source: ESI.
- Destination: EDI.
- Segment registers: SS (stack), CS (Code), DS (data).
- EFLAGS: Status register (carry, parity, zero, sign flags).
For further explanations see this page.
- Immediate: MOV AX, 1Ch (1C in hex).
- Register: MOV AX, BX.
- Direct: MOV AX, [my_var].
- Direct offset: MOV AX, [my_var + 2] (written my_var[2] in MASM-style syntax, which NASM does not accept).
- Indirect: MOV AX, [DI].
- Single byte: <var> db 0x55.
- Successive bytes: <var> db 0x55, 0x56, 0x57.
- String: <var> dw 'abc' (dw pads the string with a zero byte to a whole number of words; db 'abc' stores exactly three bytes).
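As a rough illustration of the data definitions and addressing modes above, the following sketch reads bytes from a small table using direct, direct-offset, and indirect operands. The label `bytes` and the file name in the build comment are hypothetical, and the program assumes 32-bit Linux (it exits through `int 0x80`).

```nasm
; Build (32-bit Linux): nasm -f elf -o modes.o modes.asm && ld -m elf_i386 -o modes modes.o
section .data
bytes db 0x55, 0x56, 0x57       ; three successive bytes

section .text
global _start
_start:
    mov   esi, bytes            ; ESI = address of the table
    movzx eax, byte [bytes]     ; direct:        EAX = 0x55
    movzx ebx, byte [bytes + 2] ; direct offset: EBX = 0x57
    movzx ecx, byte [esi + 1]   ; indirect with displacement: ECX = 0x56
    sub   ebx, eax              ; EBX = 0x57 - 0x55 = 2
    mov   eax, 1                ; sys_exit on 32-bit Linux
    int   0x80                  ; exit status = 2
```

Running the program and then `echo $?` in the shell should show the exit status.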
- To generate assembly code from a C script (the assembly code will be in AT&T syntax):
$ gcc -S script.c
- To output assembly code in Intel syntax (compatible with NASM):
$ gcc -masm=intel -S script.c
- GCC uses GAS (command as) to assemble the assembly code into object files (the same way NASM does).
Notes below were summarized from The holy book of X86. For more details, see the Intel Software Developer Manuals.
- Byte (8 bits).
- Word (16 bits).
- Doubleword, DWORD (32 bits).
- Quadword, QWORD (64 bits).
- Negative numbers are represented using two's complement notation.
- To negate a number, flip all bits, then add 1 (the same operation converts a positive number to its negative and back).
- The sign is the most significant bit (0: positive number, 1: negative number).
- On one byte:
  - [ 0x00 - 0x7f ]: [ 0, +127 ]
  - [ 0x80 - 0xff ]: [ -128, -1 ]
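The "flip all bits, then add 1" rule can be checked with a minimal sketch that negates 5 by hand and compares the result with the NEG instruction. It assumes 32-bit Linux for the exit system call; the file name in the build comment is hypothetical.

```nasm
; Build (32-bit Linux): nasm -f elf -o neg.o neg.asm && ld -m elf_i386 -o neg neg.o
section .text
global _start
_start:
    mov al, 5          ; AL = 0x05 (+5)
    not al             ; flip all bits: AL = 0xFA
    inc al             ; add 1:         AL = 0xFB, which is -5 on one byte
    mov cl, 5
    neg cl             ; NEG performs the same two steps: CL = 0xFB
    xor ebx, ebx       ; exit status 0 (both results agree)
    cmp al, cl
    je  .done
    mov ebx, 1         ; exit status 1 (would indicate a mismatch)
.done:
    mov eax, 1         ; sys_exit on 32-bit Linux
    int 0x80
```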
- Intel x86 architecture is little-endian.
- Little-endian means a multi-byte value is stored in memory with its least significant byte at the lowest address (this applies to RAM only, not to registers).
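A small sketch of the byte order in memory: a 32-bit value is stored, then only the byte at its lowest address is read back. The label `value` is hypothetical, and the program assumes 32-bit Linux.

```nasm
; Build (32-bit Linux): nasm -f elf -o endian.o endian.asm && ld -m elf_i386 -o endian endian.o
section .data
value dd 0x11223344          ; laid out in memory as 44 33 22 11

section .text
global _start
_start:
    movzx ebx, byte [value]  ; the lowest address holds the lowest byte: EBX = 0x44
    mov   eax, 1             ; sys_exit on 32-bit Linux
    int   0x80               ; exit status = 0x44 (68 in decimal)
```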
- 8 general purpose 32-bit registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP.
- 16 general purpose 64-bit registers: RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, R8-R15.
- EIP (RIP for 64-bit): Instruction Pointer.
- EFLAGS:
  - 32-bit register where each bit is a flag.
  - Used to indicate the status of computations.
  - Also used to control certain operations.
General purpose registers are commonly used as follows:
- EAX (RAX): Holds the value returned by a function.
- EBX (RBX): Base pointer for the data section.
- EDX (RDX): I/O pointer.
- ECX (RCX): Loop counter.
- ESI (RSI) & EDI (RDI): Source and destination index, respectively (e.g. when copying a string).
- ESP (RSP): Stack Pointer, points to the top of the stack.
- EBP (RBP): Base Pointer, points to the base of the current stack frame.
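The most important flags are:
- Zero Flag (ZF): Set to 1 if the result of an operation is zero (e.g. when compared values are equal).
- Sign Flag (SF): Copy of the most significant bit of the result (1 if the result is negative when read as a signed integer).
- Carry Flag (CF): Set to 1 when an unsigned operation overflows (a carry out of, or a borrow into, the most significant bit).
ZF, SF, and CF are all set by CMP, as a subtraction is performed (see conditional jumps below).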
- Holds a function's local variables.
- Grows down from higher to lower memory addresses.
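- Every process has a private virtual address space (on 32-bit architectures, user space typically spans [ 0x00000000, 0x7fffffff ]; the exact upper bound depends on the operating system).
- Addresses of different processes do not collide in physical memory thanks to paging.
- Paging maps a process's virtual addresses to physical memory.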
- Area where functions store their local variables.
- Each function has its own stack frame.
- Function prologue: runs at the start of the called function, right after CALL.
PUSH EBP ; Save previous stack frame
MOV EBP, ESP ; Create its own stack frame
- Function epilogue: runs at the end of the function, just before returning to the caller.
POP EBP ; Reset EBP to previous stack frame
RET ; Pop top of stack & assign it to EIP (usually address after CALL)
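To see the prologue and epilogue in context, here is a minimal sketch of a function that keeps one local variable in its stack frame. The function name `seven` is hypothetical, and the `sub esp` / `mov esp, ebp` pair is an addition not shown in the notes above: it is one common way to reserve and release space for locals. The program assumes 32-bit Linux.

```nasm
; Build (32-bit Linux): nasm -f elf -o frame.o frame.asm && ld -m elf_i386 -o frame frame.o
section .text
global _start

seven:                        ; hypothetical function returning 7
    push ebp                  ; prologue: save the caller's frame pointer
    mov  ebp, esp             ; prologue: start this function's stack frame
    sub  esp, 4               ; reserve 4 bytes for one local variable
    mov  dword [ebp - 4], 7   ; local variable = 7
    mov  eax, [ebp - 4]       ; return value goes in EAX
    mov  esp, ebp             ; epilogue: release the local variable
    pop  ebp                  ; epilogue: restore the caller's frame pointer
    ret                       ; pop the return address into EIP

_start:
    call seven
    mov  ebx, eax             ; exit status = 7
    mov  eax, 1               ; sys_exit on 32-bit Linux
    int  0x80
```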
According to the instruction:
- Register:
XOR EAX, EAX ; zeroes register eax
- Immediate:
MOV EAX, 4 ; 4 is in decimal
- Memory:
MOV EAX, [0x12345678]
- Register indirect:
MOV EAX, [EBX] ; Assign the value whose address is in EBX
- Register indirect with displacement:
MOV EAX, [EBX + 4]
NOP: does nothing.
XCHG: swaps its two operands.
PUSH: pushes a value onto the stack and makes register ESP point at the new value.
POP: pops the value pointed at by ESP into the register operand and makes ESP point at the previous value on the stack.
RET: pops the top of the stack and puts it in register EIP.
CALL: calls a function. The most commonly used calling convention in C (and some C++) is CDECL, which works as follows:
- The called function sets up a new stack frame.
- Function parameters are pushed onto the stack from right to left.
- When CALL executes:
  - The address of the next instruction after CALL is pushed onto the stack.
  - EIP is set to the address of the first instruction of the function.
- The function return value is put in EAX for primitive data types.
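The CDECL steps above can be sketched as follows. The function `sub2` is hypothetical, and the `add esp, 8` after the call reflects the CDECL rule that the caller removes the parameters from the stack. The program assumes 32-bit Linux.

```nasm
; Build (32-bit Linux): nasm -f elf -o cdecl.o cdecl.asm && ld -m elf_i386 -o cdecl cdecl.o
section .text
global _start

sub2:                       ; hypothetical function: int sub2(int a, int b) returns a - b
    push ebp
    mov  ebp, esp
    mov  eax, [ebp + 8]     ; a: pushed last, so it sits just above the return address
    sub  eax, [ebp + 12]    ; b: pushed first
    pop  ebp
    ret                     ; the result stays in EAX

_start:
    push 3                  ; b (parameters are pushed right to left)
    push 10                 ; a
    call sub2               ; pushes the return address, then jumps to sub2
    add  esp, 8             ; the caller removes the two parameters
    mov  ebx, eax           ; exit status = 10 - 3 = 7
    mov  eax, 1             ; sys_exit on 32-bit Linux
    int  0x80
```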
MOV: copies from source to destination, either:
- Register to register.
- Memory to register or vice versa.
- Immediate to register or to memory.
ADD/SUB: addition and subtraction.
ADD EAX, 0x10 ; EAX += 0x10
SUB EAX, 0x10 ; EAX -= 0x10
SHL/SHR: shift bits to the left/right.
AND/OR/XOR/NOT: logical (bitwise) operations.
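A short sketch of the shift and logical instructions, with the register value tracked in the comments. The starting value is arbitrary, and the program assumes 32-bit Linux.

```nasm
; Build (32-bit Linux): nasm -f elf -o bits.o bits.asm && ld -m elf_i386 -o bits bits.o
section .text
global _start
_start:
    mov ebx, 3         ; EBX = 0b0011
    shl ebx, 2         ; shift left by 2 (multiply by 4):       EBX = 12 (0b1100)
    shr ebx, 1         ; shift right by 1 (unsigned divide by 2): EBX = 6 (0b0110)
    and ebx, 0x3       ; keep only the two low bits:            EBX = 2  (0b0010)
    or  ebx, 0x8       ; set bit 3:                             EBX = 10 (0b1010)
    xor ebx, 0xF       ; flip the four low bits:                EBX = 5  (0b0101)
    mov eax, 1         ; sys_exit on 32-bit Linux
    int 0x80           ; exit status = 5
```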
LEA: loads the effective address computed from its operand, without reading memory (unlike a register indirect operand in MOV).
LEA EAX, [EBP + 8] ; Take address in EBP, add 8 to it & put the resulting address in EAX
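The difference between LEA and MOV with the same bracketed operand can be illustrated with a small table in memory. The label `table` is hypothetical, and the program assumes 32-bit Linux.

```nasm
; Build (32-bit Linux): nasm -f elf -o lea.o lea.asm && ld -m elf_i386 -o lea lea.o
section .data
table dd 10, 20, 30

section .text
global _start
_start:
    mov ebx, table       ; EBX = address of the table
    lea ecx, [ebx + 8]   ; ECX = address of the third element (no memory access)
    mov edx, [ebx + 8]   ; EDX = 30, the value stored at that address
    mov ebx, [ecx]       ; EBX = 30, read through the address computed by LEA
    sub ebx, edx         ; EBX = 0 because both reads return the same value
    mov eax, 1           ; sys_exit on 32-bit Linux
    int 0x80             ; exit status = 0
```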
CMP: compares source and destination by subtracting them, and sets the appropriate bits in the EFLAGS register (e.g. the Zero Flag is set to 1 if the two operands are equal).
JMP: changes the value of EIP to the given address (unconditional jump).
JNE/JNZ: jumps if the Zero Flag (ZF) is not set.
JLE: jumps if less than or equal (signed comparison).
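As an illustration of CMP combined with conditional and unconditional jumps, the sketch below compares -2 with 1 and picks an exit status accordingly. The label names are hypothetical, and the program assumes 32-bit Linux.

```nasm
; Build (32-bit Linux): nasm -f elf -o jumps.o jumps.asm && ld -m elf_i386 -o jumps jumps.o
section .text
global _start
_start:
    mov ecx, -2
    cmp ecx, 1          ; computes ECX - 1 and sets the flags; ECX is unchanged
    jne .not_equal      ; ZF = 0 because -2 != 1, so this jump is taken
    mov ebx, 0          ; (skipped) the operands were equal
    jmp .exit
.not_equal:
    cmp ecx, 1
    jle .less_or_equal  ; signed comparison: -2 <= 1, so this jump is taken
    mov ebx, 1          ; (skipped) the first operand was greater
    jmp .exit
.less_or_equal:
    mov ebx, 2
.exit:
    mov eax, 1          ; sys_exit on 32-bit Linux
    int 0x80            ; exit status = 2
```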
IMUL: signed multiplication.
DIV: unsigned division.
INC/DEC: increment/decrement the operand by 1.
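A minimal sketch of the arithmetic instructions above, with the results tracked in the comments. Note that the 32-bit form of DIV divides the 64-bit value in EDX:EAX, so EDX is cleared first. The operand values are arbitrary, and the program assumes 32-bit Linux.

```nasm
; Build (32-bit Linux): nasm -f elf -o arith.o arith.asm && ld -m elf_i386 -o arith arith.o
section .text
global _start
_start:
    mov  eax, -6
    mov  ecx, 7
    imul eax, ecx      ; signed multiply: EAX = -42
    neg  eax           ; EAX = 42
    xor  edx, edx      ; clear the high half of the dividend EDX:EAX
    mov  ecx, 5
    div  ecx           ; unsigned divide: EAX = 8 (quotient), EDX = 2 (remainder)
    inc  eax           ; EAX = 9
    dec  edx           ; EDX = 1
    mov  ebx, eax
    add  ebx, edx      ; EBX = 10
    mov  eax, 1        ; sys_exit on 32-bit Linux
    int  0x80          ; exit status = 10
```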
- .text: where all the code goes (also called the code section).
- .data: contains initialized data (constants and initialized variables).
- .bss: where uninitialized variables reside (space reserved with resb, resw, resd).
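The three sections can be sketched together in one small program. The labels `initial` and `buffer` are hypothetical, and the program assumes 32-bit Linux.

```nasm
; Build (32-bit Linux): nasm -f elf -o sections.o sections.asm && ld -m elf_i386 -o sections sections.o
section .data
initial dd 40            ; initialized data

section .bss
buffer resd 1            ; reserves 4 bytes without an initial value

section .text
global _start
_start:
    mov eax, [initial]   ; EAX = 40
    add eax, 2           ; EAX = 42
    mov [buffer], eax    ; store the result in the reserved space
    mov ebx, [buffer]    ; EBX = 42
    mov eax, 1           ; sys_exit on 32-bit Linux
    int 0x80             ; exit status = 42
```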
A system call is the process of handing execution over to the kernel.
INT 0x80 ; 32-bit
SYSCALL ; 64-bit
See the example at the very end of The holy book of X86.
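For a quick look at the 64-bit form, here is a minimal sketch that only exits with a chosen status. It assumes x86-64 Linux, where the system call number goes in RAX (60 is exit) and the first argument in RDI; the file name in the build comment is hypothetical.

```nasm
; Build (64-bit Linux): nasm -f elf64 -o exit64.o exit64.asm && ld -o exit64 exit64.o
section .text
global _start
_start:
    mov rax, 60        ; sys_exit on x86-64 Linux
    mov rdi, 3         ; exit status
    syscall            ; hand execution to the kernel; exit status = 3
```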
Strings are declared as follows in NASM:
message db "Hello",0 ; define bytes (db) for a string with a null terminator
length equ $ - message ; equ declares the string length as a constant: $ (current address) minus message (address of the string), including the terminator
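Putting the string declaration and a system call together, a minimal sketch that prints the string might look like this. It assumes 32-bit Linux, where system call 4 is write and 1 is exit. Here the null terminator is replaced by a newline (byte 10), since write takes an explicit length.

```nasm
; Build (32-bit Linux): nasm -f elf -o hello.o hello.asm && ld -m elf_i386 -o hello hello.o
section .data
message db "Hello", 10      ; the string followed by a newline
length  equ $ - message     ; 6 bytes

section .text
global _start
_start:
    mov eax, 4              ; sys_write on 32-bit Linux
    mov ebx, 1              ; file descriptor 1 (stdout)
    mov ecx, message        ; address of the string
    mov edx, length         ; number of bytes to write
    int 0x80
    mov eax, 1              ; sys_exit
    xor ebx, ebx            ; exit status = 0
    int 0x80
```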