@cojocar
Last active October 16, 2021 19:28
Different behavior of objdump depending on data/code ARM specific symbols

Symbols are important for linear disassembly on ARM

For x86 there are no "special" symbols, so the output is the same for strip vs nostrip: ".word 0x9090", which is data, is interpreted as code (two nops) because of the linear disassembly behavior.

For ARM, objdump (and similar tools) uses the mapping symbols ($a, $t, $d) to determine whether bytes are code or data, so the linear disassembler knows where the data is and where the code is. Once the symbols are stripped, that information is gone and the data is decoded as an instruction. The mapping symbols are documented here: https://sourceware.org/binutils/docs-2.23/as/ARM-Mapping-Symbols.html

```
$ cat a.c
int
main(void)
{
        asm(".word 0x9090");
        return 0;
}
$ gcc -c -o a.x86.nostrip.o a.c
```

# x86 unstripped
```
$ readelf -s a.x86.nostrip.o
Symbol table '.symtab' contains 9 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS a.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2 
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 
     8: 0000000000000000    13 FUNC    GLOBAL DEFAULT    1 main
$ objdump -dS a.x86.nostrip.o 
0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   90                      nop
   5:   90                      nop
   6:   b8 00 00 00 00          mov    $0x0,%eax
   b:   5d                      pop    %rbp
   c:   c3                      retq   
```

# x86 stripped (no symbols)

```
$ cp a.x86.nostrip.o a.x86.strip.o
$ strip a.x86.strip.o
$ readelf -s a.x86.strip.o 
$ objdump -dS a.x86.strip.o 
0000000000000000 <.text>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   90                      nop
   5:   90                      nop
   6:   b8 00 00 00 00          mov    $0x0,%eax
   b:   5d                      pop    %rbp
   c:   c3                      retq
```
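
The two `nop`s in both x86 listings follow directly from the bytes: on x86, GNU as emits a 16-bit little-endian value for `.word`, so `0x9090` becomes the bytes `90 90`, and `0x90` is the one-byte `nop` opcode. A minimal Python sketch of that byte-level view (the helper name `word_bytes_x86` is made up for illustration and is not part of any tool):

```python
import struct

NOP_OPCODE = 0x90  # single-byte x86 `nop`

def word_bytes_x86(value):
    """Bytes emitted for `.word value` on x86: 16 bits, little-endian."""
    return struct.pack("<H", value)

emitted = word_bytes_x86(0x9090)
print(" ".join(f"{b:02x}" for b in emitted))  # 90 90
print(all(b == NOP_OPCODE for b in emitted))  # True
```

With no symbol that marks these bytes as data, a linear disassembler has no reason to treat them differently from the surrounding instructions.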

# ARM unstripped

```
$ arm-none-eabi-gcc -c -o a.arm.nostrip.o  a.c
$ readelf -s a.arm.nostrip.o 
Symbol table '.symtab' contains 11 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS a.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1 
     3: 00000000     0 SECTION LOCAL  DEFAULT    3 
     4: 00000000     0 SECTION LOCAL  DEFAULT    4 
     5: 00000000     0 NOTYPE  LOCAL  DEFAULT    1 $a
     6: 00000008     0 NOTYPE  LOCAL  DEFAULT    1 $d
        ^^^^^^^^^^^^^^ this symbol says we have data at 0x8
     7: 0000000c     0 NOTYPE  LOCAL  DEFAULT    1 $a
     8: 00000000     0 SECTION LOCAL  DEFAULT    5 
     9: 00000000     0 SECTION LOCAL  DEFAULT    6 
    10: 00000000    32 FUNC    GLOBAL DEFAULT    1 main

$ objdump -dS a.arm.nostrip.o 
00000000 <main>:
   0:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
   4:   e28db000        add     fp, sp, #0
   8:   00009090        .word   0x00009090
        ^^^^^^^^^^^^^^^^^^^^^^ indeed the bytes are read as data 
   c:   e3a03000        mov     r3, #0
  10:   e1a00003        mov     r0, r3
  14:   e24bd000        sub     sp, fp, #0
  18:   e49db004        pop     {fp}            ; (ldr fp, [sp], #4)
  1c:   e12fff1e        bx      lr
```
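
The way the mapping symbols partition `main` can be sketched in a few lines of Python. This is only an illustration of the idea, not how objdump implements it: the `(address, name)` pairs are copied from the readelf output above, and `classify` and `KIND` are hypothetical names.

```python
from bisect import bisect_right

# (address, name) pairs copied from the readelf output above
MAPPING_SYMBOLS = [(0x00, "$a"), (0x08, "$d"), (0x0C, "$a")]
KIND = {"$a": "ARM code", "$t": "Thumb code", "$d": "data"}

def classify(addr, symbols=MAPPING_SYMBOLS):
    """Return the kind of the region containing addr, or None if unknown."""
    symbols = sorted(symbols)
    addrs = [a for a, _ in symbols]
    i = bisect_right(addrs, addr) - 1  # last mapping symbol at or before addr
    if i < 0:
        return None                    # no mapping symbol covers this address
    return KIND[symbols[i][1]]

for addr in range(0x00, 0x20, 4):      # main is 32 bytes long
    print(f"{addr:#04x}: {classify(addr)}")
```

Only offset 0x08 is reported as data; every other word in `main` is ARM code. Passing an empty list, as in `classify(0x08, [])`, returns `None`, which corresponds to the stripped case where the tool has nothing to go on.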

# ARM stripped (no symbols)

```
$ cp a.arm.nostrip.o a.arm.strip.o
$ strip a.arm.strip.o
$ readelf -s a.arm.strip.o
$ objdump -dS a.arm.strip.o 
00000000 <.text>:
   0:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
   4:   e28db000        add     fp, sp, #0
   8:   00009090        muleq   r0, r0, r0
        ^^^^^^^^^^^^^^^^^^^^^^ bytes are read as code
   c:   e3a03000        mov     r3, #0
  10:   e1a00003        mov     r0, r3
  14:   e24bd000        sub     sp, fp, #0
  18:   e49db004        pop     {fp}            ; (ldr fp, [sp], #4)
  1c:   e12fff1e        bx      lr
```
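
Why the data word turns into `muleq`: on ARM, `.word` emits 32 bits, and `0x00009090` happens to match the A32 multiply pattern (bits 27..21 zero, bits 7..4 equal to `1001`) with a condition field of `0000` (`eq`). A rough Python sketch of that decoding follows. It is a deliberately simplified decoder with hypothetical helper names: it handles only the `eq` condition by name and ignores the S bit and bits 15..12 (which are non-zero in this word).

```python
COND_EQ = 0b0000  # condition field value for the `eq` suffix

def looks_like_mul(word):
    """True if a 32-bit word matches the A32 MUL pattern:
    bits 27..21 == 0 and bits 7..4 == 0b1001."""
    return ((word >> 21) & 0x7F) == 0 and ((word >> 4) & 0xF) == 0b1001

def decode_mul(word):
    """Format a word as `mul<cond> Rd, Rn, Rm` (simplified, see lead-in)."""
    cond = (word >> 28) & 0xF
    rd = (word >> 16) & 0xF
    rm = (word >> 8) & 0xF
    rn = word & 0xF
    suffix = "eq" if cond == COND_EQ else f"<cond {cond:#x}>"
    return f"mul{suffix} r{rd}, r{rn}, r{rm}"

word = 0x00009090
print(looks_like_mul(word), decode_mul(word))  # True muleq r0, r0, r0
```

Without the `$d` symbol there is nothing to stop a linear disassembler from applying this decoding to the word at offset 0x8.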