@decagondev
Last active July 25, 2025 22:03

CPU Registers: The CPU's Working Memory

What Are Registers?

Registers are the fastest storage locations in a computer, located directly within the CPU. Think of them as the CPU's immediate workspace - like having tools and materials right on your workbench rather than having to walk to a storage room every time you need something.

Key Characteristics:

  • Ultra-fast access: Operands are read inside the core, without the latency of a cache or RAM access
  • Limited quantity: Only a small number available
  • Temporary storage: Hold data during active processing
  • Direct CPU access: No bus communication required
graph TD
    A[CPU] --> B[Registers<br/>Fastest Access<br/>~1 cycle]
    A --> C[L1 Cache<br/>Very Fast<br/>~3-4 cycles]
    A --> D[L2 Cache<br/>Fast<br/>~10-20 cycles]
    A --> E[RAM<br/>Slow<br/>~100-300 cycles]
    A --> F[Storage<br/>Very Slow<br/>Millions of cycles]
    
    style B fill:#ff9999,color:#000
    style C fill:#ffcc99,color:#000
    style D fill:#ffff99,color:#000
    style E fill:#ccffcc,color:#000
    style F fill:#ccccff,color:#000

x64 Architecture Overview

The x64 (also called x86-64 or AMD64) architecture extends the original 32-bit x86 design to handle 64-bit data. This means:

  • 64-bit registers: Can hold larger numbers and memory addresses
  • More registers: Additional R8-R15 registers added
  • Backward compatibility: Can still run 32-bit and 16-bit code
  • Larger address space: Can access more than 4GB of memory
timeline
    title Evolution of x86 Architecture
    
    1978 : 8086 (16-bit)
         : 8 registers
         : 1MB memory limit
    
    1985 : 80386 (32-bit)
         : Extended to 32-bit
         : 4GB memory limit
    
    2003 : x64 (64-bit)
         : 64-bit extensions
         : 16 general-purpose registers
         : Massive memory space

General Purpose Registers

These registers hold integers and memory addresses and are used for most operations. Each has a traditional role, but most can be repurposed as needed (RSP is the practical exception, since it must track the stack).

graph LR
    subgraph "General Purpose Registers (64-bit)"
        RAX["RAX<br/>Accumulator<br/>Return values<br/>Arithmetic"]
        RBX["RBX<br/>Base<br/>General purpose<br/>Preserved"]
        RCX["RCX<br/>Counter<br/>Loop counters<br/>1st parameter"]
        RDX["RDX<br/>Data<br/>I/O operations<br/>2nd parameter"]
        RSI["RSI<br/>Source Index<br/>String operations<br/>Source pointer"]
        RDI["RDI<br/>Destination Index<br/>String operations<br/>Dest pointer"]
        RBP["RBP<br/>Base Pointer<br/>Stack frame base"]
        RSP["RSP<br/>Stack Pointer<br/>Stack top"]
    end
    
    subgraph "Additional x64 Registers"
        R8["R8<br/>General purpose"]
        R9["R9<br/>General purpose"]
        R10["R10<br/>General purpose"]
        R11["R11<br/>General purpose"]
        R12["R12<br/>General purpose"]
        R13["R13<br/>General purpose"]
        R14["R14<br/>General purpose"]
        R15["R15<br/>General purpose"]
    end
    
    style RAX fill:#ffcccc,color:#000
    style RCX fill:#ccffcc,color:#000
    style RDX fill:#ccccff,color:#000
    style RSP fill:#ffffcc,color:#000
    style RBP fill:#ffccff,color:#000

Detailed Register Functions

RAX (Accumulator)

  • Primary register for arithmetic operations
  • Stores function return values
  • Used in multiplication and division operations
  • Often the target of calculations

RBX (Base)

  • General-purpose storage
  • Preserved across function calls (callee-saved)
  • Often used as a base pointer for data structures
  • Good for holding important values that need to persist

RCX (Counter)

  • Traditional loop counter register
  • First parameter in function calls (Windows calling convention)
  • Used in string operations for repetition counts
  • Automatically decremented by the LOOP instruction
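
As a minimal sketch of the counter role, the program below sums 10 + 9 + ... + 1 with the LOOP instruction, which decrements RCX and jumps while it is non-zero. It assumes the same FASM and Windows setup as the full listing later in this document. The label `next_term` and the use of the process exit code to report the result are illustrative choices, not part of the original.

```asm
format PE64 console
include 'win64a.inc'
entry start

section '.code' code readable executable
start:
    sub rsp, 40             ; shadow space + stack alignment for the API call

    mov ecx, 10             ; RCX = loop counter (32-bit write zero-extends)
    xor eax, eax            ; EAX = running total
next_term:                  ; hypothetical label
    add eax, ecx            ; add the current counter value
    loop next_term          ; RCX = RCX - 1, jump while RCX != 0

    mov ecx, eax            ; exit code = 55
    call [ExitProcess]

section '.idata' import data readable writeable
    library kernel32,'KERNEL32.DLL'
    import kernel32, ExitProcess,'ExitProcess'
```

Compilers generally emit a `dec` followed by `jnz` instead of LOOP, so treat this as a demonstration of RCX's traditional role rather than a tuning recommendation.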

RDX (Data)

  • I/O port operations
  • Second parameter in function calls (Windows calling convention)
  • Upper half of the 128-bit product in multiplication; upper half of the dividend and the remainder in division
  • General data manipulation
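
The RAX/RDX pairing in multiplication and division can be sketched as follows. This is a small illustration under the same FASM and Windows assumptions as the full listing later in this document. The operand values and the choice to return the remainder as the exit code are arbitrary.

```asm
format PE64 console
include 'win64a.inc'
entry start

section '.code' code readable executable
start:
    sub rsp, 40             ; shadow space + stack alignment for the API call

    mov eax, 1000           ; multiplicand always lives in RAX
    mov r8d, 7
    mul r8                  ; RDX:RAX = RAX * R8 = 7000, so RDX = 0 here

    mov r8d, 64
    div r8                  ; RAX = RDX:RAX / 64 = 109, RDX = remainder = 24

    mov ecx, edx            ; exit code = remainder (24)
    call [ExitProcess]

section '.idata' import data readable writeable
    library kernel32,'KERNEL32.DLL'
    import kernel32, ExitProcess,'ExitProcess'
```

Note that DIV reads RDX as the upper half of the dividend, so RDX must hold a meaningful value (here the zero left by MUL) before the division.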

RSI/RDI (Source/Destination Index)

  • Pointer registers for string and array operations
  • RSI points to source data
  • RDI points to destination
  • Automatically incremented or decremented in string operations, depending on the direction flag (DF)
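
A short sketch of the source/destination roles, assuming the same FASM and Windows setup as the full listing later in this document: REP MOVSB copies RCX bytes from the address in RSI to the address in RDI. The names `src`, `src_len`, and `dst` are hypothetical, and the exit code is used only to show that the copy happened.

```asm
format PE64 console
include 'win64a.inc'
entry start

section '.data' data readable writeable
    src      db 'registers'         ; hypothetical source buffer
    src_len  = $ - src              ; 9 bytes
    dst      rb src_len             ; uninitialized destination buffer

section '.code' code readable executable
start:
    sub rsp, 40             ; shadow space + stack alignment for the API call

    cld                     ; DF = 0, so RSI and RDI move forward
    lea rsi, [src]          ; RSI = source pointer
    lea rdi, [dst]          ; RDI = destination pointer
    mov ecx, src_len        ; RCX = number of bytes to copy
    rep movsb               ; copy; afterwards RCX = 0 and both pointers advanced by 9

    movzx ecx, byte [dst]   ; exit code = first copied byte, 'r' = 114
    call [ExitProcess]

section '.idata' import data readable writeable
    library kernel32,'KERNEL32.DLL'
    import kernel32, ExitProcess,'ExitProcess'
```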

RBP/RSP (Stack Pointers)

  • RSP: Always points to the current top of the stack
  • RBP: Points to the base of the current stack frame
  • Critical for function calls and local variable access

Special Purpose Registers

These registers have specific, dedicated functions that cannot be changed.

graph TD
    subgraph "Special Purpose Registers"
        RIP["RIP<br/>Instruction Pointer<br/>Next instruction address"]
        RFLAGS["RFLAGS<br/>Status Flags<br/>Condition codes"]
    end
    
    subgraph "RFLAGS Bits (Key Flags)"
        CF["CF - Carry Flag<br/>Unsigned carry/borrow"]
        ZF["ZF - Zero Flag<br/>Result was zero"]
        SF["SF - Sign Flag<br/>Result was negative"]
        OF["OF - Overflow Flag<br/>Signed overflow"]
        PF["PF - Parity Flag<br/>Even number of 1s in low byte"]
        AF["AF - Auxiliary Flag<br/>BCD arithmetic"]
    end
    
    RFLAGS --> CF
    RFLAGS --> ZF
    RFLAGS --> SF
    RFLAGS --> OF
    RFLAGS --> PF
    RFLAGS --> AF
    
    style RIP fill:#ff9999,color:#000
    style RFLAGS fill:#99ff99,color:#000

RIP (Instruction Pointer)

  • Always contains the address of the next instruction to execute
  • Automatically updated by the CPU
  • Modified by jump, call, and return instructions
  • Cannot be named as the destination of MOV or arithmetic instructions; it changes only through control flow
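
One way to observe RIP indirectly is through the return address that CALL pushes. The sketch below assumes the same FASM and Windows setup as the full listing later in this document. The labels `after_call` and `read_return_address` are hypothetical, and the exit code simply reports whether the pushed address matched.

```asm
format PE64 console
include 'win64a.inc'
entry start

section '.code' code readable executable
start:
    sub rsp, 40                 ; shadow space + stack alignment for the API call

    call read_return_address    ; pushes the address of after_call, then jumps
after_call:
    lea rdx, [after_call]       ; RIP-relative address of this instruction
    xor ecx, ecx                ; clear RCX before the comparison sets flags
    cmp rax, rdx
    sete cl                     ; CL = 1 if CALL pushed exactly that address
    call [ExitProcess]          ; exit code = 1

read_return_address:            ; hypothetical helper
    mov rax, [rsp]              ; read the return address CALL pushed
    ret                         ; pop it back into RIP

section '.idata' import data readable writeable
    library kernel32,'KERNEL32.DLL'
    import kernel32, ExitProcess,'ExitProcess'
```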

RFLAGS (Status Flags)

Contains individual bits that indicate the results of operations:

  • Zero Flag (ZF): Set when an operation result is zero
  • Carry Flag (CF): Set when arithmetic produces a carry/borrow (unsigned overflow)
  • Sign Flag (SF): Set when the result's most significant bit is 1 (negative when read as signed)
  • Overflow Flag (OF): Set when signed arithmetic overflows
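
To make the difference between these flags concrete, the following sketch performs two 8-bit additions and captures the resulting flags with SETcc instructions. It assumes the same FASM and Windows setup as the full listing later in this document. Packing the flags into the exit code is an illustrative choice, not something the original prescribes.

```asm
format PE64 console
include 'win64a.inc'
entry start

section '.code' code readable executable
start:
    sub rsp, 40             ; shadow space + stack alignment for the API call

    mov al, 255
    add al, 1               ; 255 + 1 wraps to 0: CF = 1, ZF = 1, SF = 0, OF = 0
    setc r8b                ; R8B = CF (1)
    setz r9b                ; R9B = ZF (1)

    mov al, 127
    add al, 1               ; 127 + 1 = 0x80: OF = 1, SF = 1, CF = 0, ZF = 0
    seto r10b               ; R10B = OF (1)
    sets r11b               ; R11B = SF (1)

    ; Pack the captured flags into one value (LEA does not change flags)
    movzx ecx, r8b          ; bit 0 = CF
    movzx eax, r9b
    lea ecx, [rcx+rax*2]    ; bit 1 = ZF
    movzx eax, r10b
    lea ecx, [rcx+rax*4]    ; bit 2 = OF
    movzx eax, r11b
    lea ecx, [rcx+rax*8]    ; bit 3 = SF, so the exit code is 15
    call [ExitProcess]

section '.idata' import data readable writeable
    library kernel32,'KERNEL32.DLL'
    import kernel32, ExitProcess,'ExitProcess'
```

The first addition overflows only as an unsigned value (CF), while the second overflows only as a signed value (OF), which is why the two flags are tracked separately.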

Register Naming Convention

The x64 architecture maintains backward compatibility through a hierarchical naming system:

graph TD
    subgraph "Register RAX - 64-bit Layout"
        A["Bit 63"] --> B["..."] --> C["Bit 32"] --> D["Bit 31"] --> E["..."] --> F["Bit 16"] --> G["Bit 15"] --> H["..."] --> I["Bit 8"] --> J["Bit 7"] --> K["..."] --> L["Bit 0"]
    end
    
    subgraph "Naming Convention"
        RAX64["RAX (64-bit)<br/>Full register"]
        EAX32["EAX (32-bit)<br/>Lower 32 bits"]
        AX16["AX (16-bit)<br/>Lower 16 bits"]
        AH8["AH (8-bit)<br/>Bits 15-8"]
        AL8["AL (8-bit)<br/>Bits 7-0"]
    end
    
    RAX64 -.-> A
    RAX64 -.-> L
    EAX32 -.-> D
    EAX32 -.-> L
    AX16 -.-> G
    AX16 -.-> L
    AH8 -.-> G
    AH8 -.-> I
    AL8 -.-> J
    AL8 -.-> L
    
    style RAX64 fill:#ffcccc,color:#000
    style EAX32 fill:#ccffcc,color:#000
    style AX16 fill:#ccccff,color:#000
    style AH8 fill:#ffffcc,color:#000
    style AL8 fill:#ffccff,color:#000

Size Examples

graph LR
    subgraph "All Registers Follow Same Pattern"
        subgraph "RAX Family"
            RAX_64["RAX (64-bit)"]
            EAX_32["EAX (32-bit)"]
            AX_16["AX (16-bit)"]
            AL_8["AL (8-bit)"]
            AH_8["AH (8-bit)"]
        end
        
        subgraph "RBX Family"
            RBX_64["RBX (64-bit)"]
            EBX_32["EBX (32-bit)"]
            BX_16["BX (16-bit)"]
            BL_8["BL (8-bit)"]
            BH_8["BH (8-bit)"]
        end
        
        subgraph "New x64 Registers"
            R8_64["R8 (64-bit)"]
            R8D_32["R8D (32-bit)"]
            R8W_16["R8W (16-bit)"]
            R8B_8["R8B (8-bit)"]
        end
    end

Important Notes:

  • Writing to a 32-bit register (like EAX) automatically zeros the upper 32 bits
  • 16-bit and 8-bit operations preserve the remaining bits
  • New registers (R8-R15) use D/W/B suffixes instead of separate names and have no high-byte (AH-style) form
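
The first two notes can be checked with a small experiment. This is a sketch under the same FASM and Windows assumptions as the full listing later in this document. The register choices and the exit-code arithmetic are illustrative only.

```asm
format PE64 console
include 'win64a.inc'
entry start

section '.code' code readable executable
start:
    sub rsp, 40             ; shadow space + stack alignment for the API call

    mov rax, -1             ; RAX = 0xFFFFFFFFFFFFFFFF
    mov eax, 1              ; 32-bit write zeroes bits 63-32: RAX = 0x0000000000000001

    mov rdx, -1             ; RDX = 0xFFFFFFFFFFFFFFFF
    mov dx, 1               ; 16-bit write keeps bits 63-16: RDX = 0xFFFFFFFFFFFF0001
    shr rdx, 48             ; RDX = 0xFFFF, showing the upper bits survived

    lea ecx, [rax+rdx]      ; exit code = 1 + 0xFFFF = 0x10000 (65536)
    call [ExitProcess]

section '.idata' import data readable writeable
    library kernel32,'KERNEL32.DLL'
    import kernel32, ExitProcess,'ExitProcess'
```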

Register Usage in Practice

Here's how registers work together in common scenarios:

sequenceDiagram
    participant CPU
    participant RAX
    participant RBX
    participant RCX
    participant RDX
    participant Memory
    
    Note over CPU: Function Call Example
    CPU->>RCX: Load first parameter
    CPU->>RDX: Load second parameter  
    CPU->>Memory: Call function
    Note over CPU: Inside function
    CPU->>RAX: Perform calculation
    CPU->>RBX: Store intermediate result
    CPU->>RAX: Prepare return value
    CPU->>Memory: Return to caller
    Note over CPU: Back in caller
    CPU->>RAX: Use return value

Function Call Convention (Windows x64)

  1. RCX: First integer parameter
  2. RDX: Second integer parameter
  3. R8: Third integer parameter
  4. R9: Fourth integer parameter
  5. RAX: Return value
  6. RSP: Stack pointer (16-byte aligned at each call; the caller reserves 32 bytes of shadow space)
  7. RBP: Frame pointer (optional)
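
The convention above can be sketched with a caller and a small leaf function. This assumes the same FASM and Windows setup as the full listing later in this document. The function name `sum4` and its behavior are hypothetical, chosen only to use all four register parameters.

```asm
format PE64 console
include 'win64a.inc'
entry start

section '.code' code readable executable
start:
    sub rsp, 40             ; 32 bytes of shadow space + 8 to align RSP to 16

    mov ecx, 1              ; first integer parameter
    mov edx, 2              ; second integer parameter
    mov r8d, 3              ; third integer parameter
    mov r9d, 4              ; fourth integer parameter
    call sum4               ; result comes back in RAX

    mov ecx, eax            ; exit code = 10
    call [ExitProcess]

sum4:                       ; hypothetical leaf function: returns RCX + RDX + R8 + R9
    lea rax, [rcx+rdx]
    add rax, r8
    add rax, r9
    ret

section '.idata' import data readable writeable
    library kernel32,'KERNEL32.DLL'
    import kernel32, ExitProcess,'ExitProcess'
```

Because `sum4` calls nothing and touches only volatile registers, it needs no stack frame of its own.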

Loop Example

flowchart TD
    A[Load counter into RCX] --> B[Load array address into RSI]
    B --> C[Clear RAX accumulator]
    C --> D{RCX > 0?}
    D -->|Yes| E[Load value from RSI into RDX]
    E --> F[Add RDX to RAX]
    F --> G[Increment RSI pointer]
    G --> H[Decrement RCX counter]
    H --> D
    D -->|No| I[RAX contains sum]
    
    style A fill:#ffcccc,color:#000
    style C fill:#ffcccc,color:#000
    style E fill:#ccffcc,color:#000
    style F fill:#ffcccc,color:#000
    style G fill:#ccccff,color:#000
    style H fill:#ccffcc,color:#000

FASM Implementation:

format PE64 console
include 'win64a.inc'
entry start

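section '.data' data readable writeable
    array       dq 10, 20, 30, 40, 50
    array_size  = ($ - array) / 8       ; element count: bytes used / 8 bytes per qword
    result      dq 0

section '.code' code readable executable
start:
    sub rsp, 40             ; 32 bytes of shadow space + 8 to align RSP to 16

    mov rcx, array_size     ; RCX = number of elements left to add
    lea rsi, [array]        ; RSI = address of the current element
    xor rax, rax            ; RAX = 0, the running sum
    
loop_start:
    test rcx, rcx           ; sets ZF when RCX is zero
    jz loop_done

    mov rdx, [rsi]          ; load the current element
    add rax, rdx            ; accumulate into RAX
    add rsi, 8              ; advance to the next qword
    dec rcx
    jmp loop_start
    
loop_done:
    mov [result], rax       ; store the sum (150)
    xor ecx, ecx            ; exit code 0
    call [ExitProcess]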

section '.idata' import data readable writeable
    library kernel32,'KERNEL32.DLL'
    import kernel32, ExitProcess,'ExitProcess'

Performance Considerations

Understanding register usage is crucial for performance:

graph TD
    subgraph "Performance Hierarchy"
        A[Register Access<br/>~1 CPU cycle<br/>Fastest]
        B[L1 Cache<br/>~3-4 cycles<br/>Very Fast]
        C[L2 Cache<br/>~10-20 cycles<br/>Fast]
        D[L3 Cache<br/>~40-75 cycles<br/>Moderate]
        E[RAM<br/>~100-300 cycles<br/>Slow]
        F[SSD Storage<br/>~25,000+ cycles<br/>Very Slow]
        G[Hard Drive<br/>~2,000,000+ cycles<br/>Extremely Slow]
        
        A --> B --> C --> D --> E --> F --> G
    end
    
    style A fill:#00ff00,color:#000
    style B fill:#66ff66,color:#000
    style C fill:#99ff99,color:#000
    style D fill:#ccffcc,color:#000
    style E fill:#ffff99,color:#000
    style F fill:#ffcc66,color:#000
    style G fill:#ff6666,color:#000

Best Practices

  1. Keep frequently used data in registers: Avoid repeated memory access
  2. Understand calling conventions: Know which registers are preserved
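  3. Use appropriate operand sizes: 32-bit operations zero-extend into the full register and often encode more compactly than 64-bit ones, while 8-bit and 16-bit writes merge with the old register contents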
  4. Consider register pressure: Too many variables can force memory spills
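
As an illustration of the second point, the sketch below shows a function that borrows the callee-saved RBX and restores it before returning, so the caller's value survives the call. It assumes the same FASM and Windows setup as the full listing earlier in this document. The function name `twice_plus_one` is hypothetical.

```asm
format PE64 console
include 'win64a.inc'
entry start

section '.code' code readable executable
start:
    sub rsp, 40             ; shadow space + stack alignment for calls

    mov ebx, 5              ; caller keeps a value in callee-saved RBX
    mov ecx, 20             ; first parameter
    call twice_plus_one     ; RAX = 41, and RBX must still be 5 afterwards

    lea ecx, [rax+rbx]      ; exit code = 41 + 5 = 46
    call [ExitProcess]

twice_plus_one:             ; hypothetical function that uses RBX as scratch
    push rbx                ; save the caller's RBX
    mov rbx, rcx
    add rbx, rbx            ; RBX = 2 * argument
    lea rax, [rbx+1]        ; return value = 2 * argument + 1
    pop rbx                 ; restore the caller's RBX
    ret

section '.idata' import data readable writeable
    library kernel32,'KERNEL32.DLL'
    import kernel32, ExitProcess,'ExitProcess'
```

If the PUSH/POP pair were removed, the caller would read 40 from RBX instead of 5, which is the kind of bug that knowing the preserved-register set helps avoid.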

Common Assembly Patterns

Here are typical patterns you'll see in assembly code:

graph LR
    subgraph "Data Movement"
        A["MOV RAX, RBX<br/>Copy RBX to RAX"]
        B["MOV RAX, [RSI]<br/>Load from memory"]
        C["MOV [RDI], RAX<br/>Store to memory"]
    end
    
    subgraph "Arithmetic"
        D["ADD RAX, RBX<br/>RAX = RAX + RBX"]
        E["SUB RAX, 10<br/>RAX = RAX - 10"]
        F["MUL RBX<br/>RDX:RAX = RAX * RBX"]
    end
    
    subgraph "Control Flow"
        G["CMP RAX, RBX<br/>Compare values"]
        H["JZ label<br/>Jump if zero"]
        I["CALL function<br/>Call subroutine"]
    end

Summary

CPU registers are the foundation of all computer operations. They provide:

  • Speed: Fastest storage available to the CPU
  • Versatility: Can hold data, addresses, or control information
  • Efficiency: Reduce memory traffic and improve performance
  • Control: Enable precise manipulation of program execution

Understanding registers helps you:

  • Write more efficient code
  • Debug low-level problems
  • Understand performance bottlenecks
  • Optimize critical algorithms

The x64 architecture's 16 general-purpose registers provide ample workspace for modern applications while maintaining compatibility with older software. Whether you're writing assembly code, optimizing C/C++, or just curious about how computers work, registers are essential knowledge for any serious programmer.
