These notes explain how a linker works and how to write a linker script. They are very basic; the goal is to illuminate the process in the simplest possible way.
The linker on most Linux systems is GNU ld, which comes with a default linker script. To see the default linker script:
ld --verbose
To specify a custom linker script:
ld -T /path/to/custom/script.ld ...
Further reading:
- https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_chapter/ld_3.html (manual)
- https://home.cs.colorado.edu/~main/cs1300/doc/gnu/ld_3.html
Create foo.asm:
global foo
section .text
foo:
mov rax, 3
ret
Assemble it:
nasm -w+all -f elf64 -o "foo.o" "foo.asm"
Check out the hex dump:
hd foo.o
Look at the ELF header, and the section headers:
readelf -h foo.o
readelf -S foo.o
There should be no program headers in this file, since it's just an object file:
readelf -l foo.o
Check the assembly:
objdump -d foo.o
Notice that the function foo has a placeholder address of 0 (0000...), and that each instruction's address is just its offset from the start of the section. The first instruction is at 0, and the next is at 5, because the first instruction is 5 bytes long:
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: b8 03 00 00 00 mov $0x3,%eax
5: c3 retq
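The offsets in a listing like this are just running totals of the instruction lengths. A minimal sketch of that arithmetic, where the function name offsets is made up for this example and the lengths are read off the byte columns by hand:

```shell
# offsets BASE LEN...: print the hex address of each instruction, given the
# address of the first one and the length in bytes of each instruction.
# (Hypothetical helper, not part of any tool.)
offsets() {
    addr=$(( $1 )); shift
    for len in "$@"; do
        printf '%x\n' "$addr"
        addr=$(( addr + len ))
    done
}

# foo: a 5-byte mov followed by a 1-byte ret
offsets 0 5 1
```

This prints 0 and then 5, matching the left-hand column of the objdump output.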
Now make a program that calls the foo function. Create a file called main.asm:
extern foo
global _start
section .text
_start:
call foo ; call foo - result will be in rax
mov rdi, rax ; put rax into rdi
mov rax, 60 ; code for sys_exit - will exit with number in rdi
syscall ; call sys_exit
Assemble it:
nasm -w+all -f elf64 -o "main.o" "main.asm"
Check out the hex dump:
hd main.o
Look at the ELF header, and the section headers:
readelf -h main.o
readelf -S main.o
There should be no program headers in this file, since it's just an object file:
readelf -l main.o
Check the assembly:
objdump -d main.o
Now, link the files into one executable:
ld -o "main.elf" "foo.o" "main.o"
Execute it and check that the exit code is 3:
./main.elf
echo $? # should be 3
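Since we will be checking exit codes repeatedly, a small helper can do the comparison for us. This is only a sketch: expect_exit is a name invented here, and the demonstration line uses sh -c 'exit 3' as a stand-in for ./main.elf so that it runs anywhere.

```shell
# expect_exit WANT CMD...: run CMD and report whether its exit code is WANT.
# (Hypothetical helper for these notes.)
expect_exit() {
    want=$1; shift
    "$@"
    got=$?
    if [ "$got" -eq "$want" ]; then
        echo "ok: exit code $got"
    else
        echo "FAIL: exit code $got, expected $want"
        return 1
    fi
}

# Stand-in command; with the real program this would be: expect_exit 3 ./main.elf
expect_exit 3 sh -c 'exit 3'
```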
Check out the hex dump:
hd main.elf
Look at the ELF header, the section headers, and the program headers:
readelf -h main.elf
readelf -S main.elf
readelf -l main.elf
Check the assembly:
objdump -d main.elf
Notice that foo is now in the file, and that _start calls it directly. So the code from the object files has been put together into this one new file, and the placeholder addresses have been filled in to make everything connect together.
In a new folder, create two files, foo.asm:
global foo
section .text
foo:
mov rax, 3
ret
and main.asm:
extern foo
global main
section .text
main:
call foo
mov rdi, rax
mov rax, 60
syscall
Create a Makefile too:
all: clean
	nasm -w+all -f elf64 -o foo.o foo.asm
	nasm -w+all -f elf64 -o main.o main.asm
	ld -o main.elf foo.o main.o
.PHONY: clean
clean:
	rm -rf *.o
	rm -rf main.elf
Notice that this main.asm does not define a _start symbol; its entry function is called main instead. By default, ld looks for a _start symbol to use as the entry point. Try to link this:
make
The link succeeds, but ld warns that it cannot find the entry symbol _start and is defaulting to address 401000, which is the beginning of foo. The program would start executing in foo rather than main, so we want to tell the linker that main is the entry point.
Create a file called custom.ld, with these contents:
ENTRY (main)
Now, change the Makefile to this:
all: clean
	nasm -w+all -f elf64 -o foo.o foo.asm
	nasm -w+all -f elf64 -o main.o main.asm
	ld -T custom.ld -o main.elf foo.o main.o
.PHONY: clean
clean:
	rm -rf *.o
	rm -rf main.elf
That tells ld to use the linker script custom.ld. Now build again:
make
Check the entry point address by looking at the elf header:
readelf -h main.elf
For me, it says the entry point is 0x10:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x10
Start of program headers: 64 (bytes into file)
Start of section headers: 4336 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 1
Size of section headers: 64 (bytes)
Number of section headers: 5
Section header string table index: 4
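Rather than reading the whole header by eye, we can filter out just the entry point and compare it with the address of main. The sketch below assumes the readelf -h line format shown above and GNU nm's default three-column output (address, type, name). The helper names entry_point and symbol_addr are made up for these notes, and the two demonstration lines feed in sample text instead of running the tools.

```shell
# entry_point: read `readelf -h` output on stdin, print the entry address.
entry_point() {
    awk -F: '/Entry point address/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# symbol_addr NAME: read `nm` output on stdin, print NAME's address as 0x...
symbol_addr() {
    addr=$(awk -v sym="$1" '$3 == sym { print $1; exit }')
    [ -n "$addr" ] && printf '0x%x\n' "0x$addr"
}

# Sample input copied from the listings in these notes:
printf 'Entry point address:               0x10\n' | entry_point
printf '0000000000000010 T main\n' | symbol_addr main
```

Against the real file, the equivalent would be readelf -h main.elf | entry_point and nm main.elf | symbol_addr main; if the two values agree, the entry point is main.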
Look at the assembly to see that address 0x10 is main:
objdump -d main.elf
Indeed it is:
main.elf: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: b8 03 00 00 00 mov $0x3,%eax
5: c3 retq
6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
d: 00 00 00
0000000000000010 <main>:
10: e8 eb ff ff ff callq 0 <foo>
15: 48 89 c7 mov %rax,%rdi
18: b8 3c 00 00 00 mov $0x3c,%eax
1d: 0f 05 syscall
Notice, however, that these addresses all start at 0. The first address in the .text section is 0x00, and then everything starts counting up from there.
If we try to execute this program, it crashes with a segmentation fault:
./main.elf
Segmentation fault
We need to tell the linker to put the code at different addresses.
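A likely reason for the crash, as far as I understand typical Linux setups: the kernel refuses to map memory below the address given by the vm.mmap_min_addr setting, and a program linked at address 0 asks for exactly that. The sketch below reads that limit from /proc/sys/vm/mmap_min_addr and reports whether a candidate load address clears it. The fallback of 65536 (0x10000) is an assumed common default for when the file is not readable, and check_load_addr is a name invented for this example.

```shell
# check_load_addr ADDR LIMIT: say whether ADDR is at or above LIMIT.
# (Hypothetical helper; ADDR may be written in hex, LIMIT is decimal.)
check_load_addr() {
    addr=$(( $1 ))
    limit=$2
    if [ "$addr" -ge "$limit" ]; then
        echo "ok"
    else
        echo "too low"
    fi
}

min_addr_file=/proc/sys/vm/mmap_min_addr
if [ -r "$min_addr_file" ]; then
    min=$(cat "$min_addr_file")
else
    min=65536   # assumption: a common default, used only as a fallback
fi

echo "minimum mappable address: $min"
echo "0x0:     $(check_load_addr 0x0 "$min")"
echo "0x10000: $(check_load_addr 0x10000 "$min")"
```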
In custom.ld, add this:
ENTRY (main)
SECTIONS {
. = 0x10000;
}
Here we start a SECTIONS stanza and set . to 0x10000. The dot (.) is the location counter: the address at which the linker will place the next piece of output. At the start of the SECTIONS stanza the location counter is 0, so if we want a different starting address, we have to assign one. Setting it to 0x10000 tells the linker to start counting from 0x10000 instead of 0. The first instruction the linker puts into the executable will then be at address 0x10000, and everything it adds after that will be offset from there.
To see this, rebuild the program:
make
Then look at the assembly now:
objdump -d main.elf
All of the addresses now start at 0x10000 and go up from there:
main.elf: file format elf64-x86-64
Disassembly of section .text:
0000000000010000 <foo>:
10000: b8 03 00 00 00 mov $0x3,%eax
10005: c3 retq
10006: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1000d: 00 00 00
0000000000010010 <main>:
10010: e8 eb ff ff ff callq 10000 <foo>
10015: 48 89 c7 mov %rax,%rdi
10018: b8 3c 00 00 00 mov $0x3c,%eax
1001d: 0f 05 syscall
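The way the location counter advances can be modelled with a little arithmetic. This is a rough sketch, not how ld is implemented: it assumes each input .text section is aligned to 16 bytes (which is consistent with the nopw padding in the listing), and it takes the section sizes by counting bytes in the listings (6 for foo.o, 15 for main.o). The names align_up and layout are invented for this example.

```shell
# align_up ADDR ALIGN: round ADDR up to the next multiple of ALIGN.
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# layout BASE SIZE...: print the start address of each input section when
# they are placed one after another, each aligned to 16 bytes (assumption).
layout() {
    dot=$(( $1 )); shift
    for size in "$@"; do
        dot=$(align_up "$dot" 16)
        printf '0x%x\n' "$dot"
        dot=$(( dot + size ))
    done
}

# foo.o's .text is 6 bytes, main.o's .text is 15 bytes
layout 0x10000 6 15
```

This prints 0x10000 and 0x10010, the addresses of foo and main in the listing. The 10 bytes between the end of foo (0x10006) and the start of main are the padding that objdump shows as a nopw instruction.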
This program now executes (at least it does for me):
./main.elf
echo $? # I get 3, as expected
So far, the linker has put our code from foo.o and main.o into the .text section of the resulting executable. But we can tell the linker exactly where to put things.
Let's tell the linker to create a custom section in the executable called .foo, and let's put the code from foo.o's .text section inside of it. To do that, change custom.ld to this:
ENTRY (main)
SECTIONS {
. = 0x10000;
.foo : { foo.o(.text) }
}
Here we add a new entry to the SECTIONS stanza. This time, we define a section called .foo. What goes inside the .foo section? We say that the linker should look in foo.o, and take the code from the .text section that it finds there.
Rebuild the program:
make
Now look at the assembly:
objdump -d main.elf
Notice that the foo function now lives in its own section called .foo:
main.elf: file format elf64-x86-64
Disassembly of section .foo:
0000000000010000 <foo>:
10000: b8 03 00 00 00 mov $0x3,%eax
10005: c3 retq
Disassembly of section .text:
0000000000010010 <main>:
10010: e8 eb ff ff ff callq 10000 <foo>
10015: 48 89 c7 mov %rax,%rdi
10018: b8 3c 00 00 00 mov $0x3c,%eax
1001d: 0f 05 syscall
We can also see that the linker put the code it found in main.o into the .text section. When the linker script does not mention an input section, ld places it in an output section of the same name, so main.o's .text section ends up in the executable's .text section unless we say otherwise in custom.ld.
We can be explicit and tell the linker to put the code from main.o's .text section into the executable's .text section if we like. Change custom.ld to this:
ENTRY (main)
SECTIONS {
. = 0x10000;
.foo : { foo.o(.text) }
.text : { main.o(.text) }
}
Rebuild, and check the assembly to see that the linker has placed the .text section from foo.o into the executable's .foo section, and the .text section from main.o into the executable's .text section:
make
objdump -d main.elf
And indeed, that is what I see:
main.elf: file format elf64-x86-64
Disassembly of section .foo:
0000000000010000 <foo>:
10000: b8 03 00 00 00 mov $0x3,%eax
10005: c3 retq
Disassembly of section .text:
0000000000010010 <main>:
10010: e8 eb ff ff ff callq 10000 <foo>
10015: 48 89 c7 mov %rax,%rdi
10018: b8 3c 00 00 00 mov $0x3c,%eax
1001d: 0f 05 syscall
We can tell the linker to put the code from the .text sections of all object files into the executable's .text section by using * as a wildcard to match any file. Change custom.ld to this:
ENTRY (main)
SECTIONS {
. = 0x10000;
.text : { *(.text) }
}
This says that the .text section in the executable should be populated with the .text sections from all (i.e., *) object files.
Rebuild and check the assembly to confirm:
make
objdump -d main.elf
And indeed, that is what I see:
main.elf: file format elf64-x86-64
Disassembly of section .text:
0000000000010000 <foo>:
10000: b8 03 00 00 00 mov $0x3,%eax
10005: c3 retq
10006: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1000d: 00 00 00
0000000000010010 <main>:
10010: e8 eb ff ff ff callq 10000 <foo>
10015: 48 89 c7 mov %rax,%rdi
10018: b8 3c 00 00 00 mov $0x3c,%eax
1001d: 0f 05 syscall
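The three versions of custom.ld above can be summarised with a toy model of how input files get matched to output sections. It leans on two things from the ld manual: file patterns behave like shell wildcards, and when more than one rule matches, the first one in the script wins. Everything else is a simplification made up for these notes: the function name assign, the "output:pattern" rule format, and the fact that only .text input sections are considered.

```shell
# assign FILE RULE...: each RULE is "output_section:file_pattern".
# Print which output section FILE's .text would land in under this toy model.
assign() {
    file=$1; shift
    for rule in "$@"; do
        out=${rule%%:*}     # text before the first colon
        pat=${rule#*:}      # text after the first colon
        case $file in
            $pat) echo "$file(.text) -> $out"; return 0 ;;
        esac
    done
    # No rule matched: fall back to an output section of the same name.
    echo "$file(.text) -> .text (orphan)"
}

# .foo : { foo.o(.text) }            (main.o is not mentioned)
assign foo.o  '.foo:foo.o'
assign main.o '.foo:foo.o'

# .text : { *(.text) }
assign foo.o  '.text:*'
assign main.o '.text:*'
```

With the first script, foo.o goes to .foo and main.o falls through to .text; with the wildcard script, both go to .text.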
In a new folder, create three assembly files. First, foo.asm:
global foo
section .text
foo:
mov rax, 3
ret
Second, bar.asm:
global bar
section .text
bar:
mov rax, 5
ret
And third, main.asm:
extern foo
global main
section .text
main:
call foo
mov rdi, rax
mov rax, 60
syscall
Create a linker script, custom.ld:
ENTRY (main)
SECTIONS {
. = 0x10000;
.text : { *(.text) }
}
And a Makefile:
all: clean build link
build:
	nasm -w+all -f elf64 -o foo.o foo.asm
	nasm -w+all -f elf64 -o bar.o bar.asm
	nasm -w+all -f elf64 -o main.o main.asm
link:
	ld -T custom.ld -o main.elf foo.o main.o
relink:
	ld -T custom.ld -o main.elf bar.o foo.o main.o
.PHONY: clean
clean:
	rm -rf *.o
	rm -rf main.elf
Compile and run, just to make sure that main.elf exits with code 3:
make
./main.elf
echo $? # should be 3
Look at the assembly for bar.o:
objdump -d bar.o
The function bar returns 5 rather than 3 (in foo.o, the function foo returns 3):
bar.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <bar>:
0: b8 05 00 00 00 mov $0x5,%eax
5: c3 retq
We want to add the bar function to our executable. To do that, we just re-link with bar.o included, which is what the relink target in the Makefile does:
make relink
Now look at the executable:
objdump -d main.elf
We can see that bar has been included:
main.elf: file format elf64-x86-64
Disassembly of section .text:
0000000000010000 <bar>:
10000: b8 05 00 00 00 mov $0x5,%eax
10005: c3 retq
10006: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1000d: 00 00 00
0000000000010010 <foo>:
10010: b8 03 00 00 00 mov $0x3,%eax
10015: c3 retq
10016: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
1001d: 00 00 00
0000000000010020 <main>:
10020: e8 eb ff ff ff callq 10010 <foo>
10025: 48 89 c7 mov %rax,%rdi
10028: b8 3c 00 00 00 mov $0x3c,%eax
1002d: 0f 05 syscall
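Notice that foo and main have both moved, yet the bytes of the call instruction (e8 eb ff ff ff) are the same as in every earlier listing. That is because the operand of this call is a displacement relative to the next instruction, not an absolute address, and the distance from main to foo has not changed. A minimal sketch of the decoding, where call_target is a name invented for this example and the bytes are typed in by hand from the listing:

```shell
# call_target INSN_ADDR B0 B1 B2 B3: given the address of a 5-byte near call
# (opcode e8) and its four operand bytes in file order (little endian),
# print the address being called: next instruction + signed displacement.
call_target() {
    insn_addr=$1
    disp=$(( 0x$5$4$3$2 ))
    # Sign-extend the 32-bit displacement.
    if [ "$disp" -ge $(( 0x80000000 )) ]; then
        disp=$(( disp - 0x100000000 ))
    fi
    printf '0x%x\n' $(( $insn_addr + 5 + $disp ))
}

call_target 0x10    eb ff ff ff   # code linked at 0
call_target 0x10010 eb ff ff ff   # foo.o and main.o linked at 0x10000
call_target 0x10020 eb ff ff ff   # bar.o, foo.o and main.o linked at 0x10000
```

The three results are 0x0, 0x10000, and 0x10010, which is where foo lives in each of the corresponding listings.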
If you like, you can change main.asm to call bar now:
extern foo
extern bar
global main
section .text
main:
call bar
mov rdi, rax
mov rax, 60
syscall
Rebuild and relink:
make build relink
Run the new executable, and confirm that it exits with an exit code of 5:
./main.elf
echo $? # should be 5