Here I'm trying to understand what happens when I run ./hello, a simple "Hello World" program written in C, on Unix -- what I'd have to do if I wanted to write an OS that could execute it. Here is the program:
#include <stdio.h>
int main() {
    printf("Hello!\n");
}
I'm going to assume that ./hello is statically linked, because that sounds simpler to deal with. It's worth noting that a statically linked hello is 868K on my machine. Eep.
I compiled it using
gcc -static hello.c -o hello
Any (nice!) comments or clarifications are appreciated.
To run a program, I have to be able to find the program. So there would need to be some kind of filesystem and I would need to read the file from somewhere.
On Linux and most other modern Unix systems, executables are in the ELF format.
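To make the ELF step a bit more concrete, here is a minimal sketch of the first thing a loader might do: check the magic bytes and pull out the entry point. The function names `is_elf` and `elf64_entry` are made up for illustration, and the sketch assumes a 64-bit little-endian ELF file (where the entry address sits in 8 bytes at offset 24 of the header). A real loader would go on to read the program headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf starts with the 4-byte ELF magic: 0x7f 'E' 'L' 'F'. */
int is_elf(const unsigned char *buf, size_t len) {
    return len >= 4 && buf[0] == 0x7f && buf[1] == 'E' &&
           buf[2] == 'L' && buf[3] == 'F';
}

/* Hypothetical helper: returns the entry point of a 64-bit little-endian
 * ELF header, or 0 if the buffer does not look like one. */
uint64_t elf64_entry(const unsigned char *buf, size_t len) {
    if (!is_elf(buf, len) || len < 32) return 0;
    /* Byte 4 is the class (2 = 64-bit), byte 5 the encoding (1 = little-endian). */
    if (buf[4] != 2 || buf[5] != 1) return 0;
    uint64_t entry = 0;
    /* e_entry is stored least-significant byte first, starting at offset 24. */
    for (int i = 7; i >= 0; i--) {
        entry = (entry << 8) | buf[24 + i];
    }
    return entry;
}
```

To try it, read the first 64 bytes of ./hello into a buffer and pass it in; the result should match the entry point address that `readelf -h hello` reports.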
So I would need to copy the "text" of the program somewhere.
There is a string in the program. It needs to go somewhere.
This program doesn't actually allocate memory, so perhaps it does not need a heap and it doesn't matter where the heap pointer is. It does need a stack. (See the Stack Overflow question on how the stack works in assembly.)
hello has some system calls in it. I found this out by running
objdump -d -M intel hello | grep 'syscall'
syscall is an assembly instruction for making a system call. That looks like
401385: b8 03 00 00 00 mov eax,0x3
40138a: 0f 05 syscall
The number stored in eax is the number of the system call being made. In this case it's 3 (on x86-64 Linux, that's close).
There are 119 instances of syscall, and it's using several different system calls. This is worrying.
(Explained more in this stackoverflow question)
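For comparison, here is a rough sketch of making one system call directly from C, skipping printf entirely. It assumes Linux with glibc, where `syscall()` and the `SYS_write` constant are available. The wrapper name `raw_write` is made up for this example. Under the hood this ends up as the same kind of mov-into-eax-then-syscall sequence shown above, just with the number for write instead of 3.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: asks the kernel to write len bytes from buf to fd.
 * Returns the number of bytes written, or -1 on error (errno is set). */
long raw_write(int fd, const void *buf, unsigned long len) {
    return syscall(SYS_write, fd, buf, len);
}
```

Calling `raw_write(1, "Hello!\n", 7)` prints the greeting to standard output with a single system call.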
I have no idea how the OS would check up on the program. I guess it doesn't just let the program run, but takes away control periodically and makes sure the stack pointer hasn't moved too far. How would it take away control? Hmm.
When there is a stack overflow I guess it sends a signal to the program, which is a POSIX thing.
I do not understand this.
There are no mallocs in the program, so I would not need to allocate memory for it or anything.
What else?!??
- How long would this take for a human (where human = me) to write from scratch?
- Is there a way to write a smaller program with fewer system calls and less magic? There are like 50 system calls and what are they even doing?
- Do I need a heap if I never use malloc?
- Could I write my own printf in assembly that does less and is simpler? Just printing a string is pretty easy...
- How do I kill a program?
If you're actually planning on writing a toy OS, then you could probably start out by not caring about loading a program from a file, and instead start with a couple of hard-coded processes and figure out how to do the context-switching between them. And have you seen the OSdev wiki? It's very useful for this stuff: http://wiki.osdev.org/Main_Page
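As a user-space stand-in for that idea, here is a sketch of two hard-coded "processes" taking turns, using the `ucontext` functions from glibc (`getcontext`, `makecontext`, `swapcontext`). This is only an analogy, assuming Linux with glibc: a real kernel would save and restore registers itself and would switch on a timer interrupt instead of waiting for tasks to yield. All the names here (`run_two_tasks`, `task`, `trace`) are made up for illustration.

```c
#include <ucontext.h>

#define TASK_STACK_SIZE (64 * 1024)

static ucontext_t scheduler_ctx;             /* saved state of the "scheduler" */
static ucontext_t task_ctx[2];               /* saved state of each task */
static char task_stack[2][TASK_STACK_SIZE];  /* each task gets its own stack */
static char trace[8];                        /* records which task ran when */
static int trace_len;

/* Save this task's registers and switch back to the scheduler. */
static void yield_to_scheduler(int id) {
    swapcontext(&task_ctx[id], &scheduler_ctx);
}

/* A hard-coded "process": records its letter twice, yielding in between. */
static void task(int id) {
    for (int step = 0; step < 2; step++) {
        trace[trace_len++] = (char)('A' + id);
        yield_to_scheduler(id);
    }
}

/* Round-robin between the two tasks and return the order they ran in. */
const char *run_two_tasks(void) {
    trace_len = 0;
    for (int id = 0; id < 2; id++) {
        getcontext(&task_ctx[id]);
        task_ctx[id].uc_stack.ss_sp = task_stack[id];
        task_ctx[id].uc_stack.ss_size = TASK_STACK_SIZE;
        task_ctx[id].uc_link = &scheduler_ctx;  /* where to go when task() returns */
        makecontext(&task_ctx[id], (void (*)(void))task, 1, id);
    }
    /* Three rounds: two for the yields, one to let each task finish. */
    for (int round = 0; round < 3; round++) {
        for (int id = 0; id < 2; id++) {
            swapcontext(&scheduler_ctx, &task_ctx[id]);
        }
    }
    trace[trace_len] = '\0';
    return trace;
}
```

The string returned by `run_two_tasks()` shows the tasks alternating, which is the whole trick: each one keeps its own stack and registers, and something outside decides who runs next.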
So regarding stack overflows, your virtual memory space is laid out like this:
(Please excuse the crudity of this diagram. I didn't have time to build it to scale or paint it.)
At the bottom of that space is your code, followed by your global and static variables. Above that is your heap. At the top, coming downwards, is your stack. Between the two is "empty" space. That is, those virtual addresses are not mapped to any physical memory. Apart from saving RAM that can be mapped into other processes, these pages act as "guards". When your stack tries to grow downwards beyond its allocated space, a page fault is triggered, which the OS uses as a signal that it needs to allocate more physical pages to that process.
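One way to poke at this layout is to collect the address of something from each region and compare them. The sketch below does that. The struct and function names are made up, and the exact addresses (and even some of the ordering) depend on the OS, the compiler, and address space randomization, so treat it as something to experiment with, not a guarantee.

```c
#include <stdint.h>
#include <stdlib.h>

static int a_global = 1;  /* lives in the data section */

/* Hypothetical record of one sample address from each region. */
struct layout {
    uintptr_t code, data, heap, stack;
};

struct layout sample_layout(void) {
    int on_stack = 0;            /* a local variable lives on the stack */
    void *on_heap = malloc(16);  /* a small allocation comes from the heap */
    struct layout l = {
        (uintptr_t)&sample_layout,  /* address of this function's code */
        (uintptr_t)&a_global,
        (uintptr_t)on_heap,
        (uintptr_t)&on_stack,
    };
    free(on_heap);
    return l;
}
```

On a typical Linux machine the stack address comes out far above the heap and data addresses, with a huge unmapped gap in between.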
This also brings us to the 'brk' syscall that @mjdominus mentioned. It changes the location of the "program break", which is the end of the data section. If you move it up, you get space that you can use for the heap. If you need more heap space, you move it up some more. Note that it is a syscall, so it's the kernel that is doing this, and it can map in more physical memory. In this simple layout, the program break also represents the limit of how far down the stack can grow: when the heap and the stack meet, you've run out of virtual address space.
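You can watch the program break move with `sbrk`, the C library wrapper that sits on top of brk. Here is a minimal sketch, assuming Linux with glibc. The function name `grow_break` is made up, and a real allocator would do bookkeeping on top of this.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: moves the program break up by `bytes` and returns
 * how far it actually moved, or -1 if the kernel refused. */
long grow_break(long bytes) {
    char *before = sbrk(0);  /* sbrk(0) just reports the current break */
    if (sbrk((intptr_t)bytes) == (void *)-1) {
        return -1;
    }
    char *after = sbrk(0);
    return (long)(after - before);
}
```

After `grow_break(4096)` succeeds, the 4096 bytes just below the new break are yours to use as heap.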
"Do I need a heap if I never use malloc?"
Depends! If you're linking with libraries, they will likely use malloc even if you're not. In some scenarios, though, such as embedded software, it's common to not use malloc at all, in which case no, you don't need a heap.
"Could I write my own printf in assembly that does less and is simpler? Just printing a string is pretty easy..."
Absolutely. And I plan to do the same myself soon, because printf is ENORMOUS. If you think about all the things it does... formatting integers as decimal or hex, formatting floating point numbers, padding, field widths, thousands separators, etc, it's no wonder. You could write a much smaller one yourself to just do the things you need.
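To give a feel for how small "just the things you need" can be, here is a sketch of a formatter that only understands %s, %d and %%. It's in C instead of assembly to keep it readable, and the name `tiny_format` is made up. It writes into a caller-supplied buffer, which you could then hand to write() to actually print it.

```c
#include <stdarg.h>
#include <stddef.h>

/* Append one character if there is room, always leaving space for '\0'. */
static void put_char(char *out, size_t cap, size_t *len, char c) {
    if (*len + 1 < cap) out[(*len)++] = c;
}

/* Hypothetical minimal formatter: supports only %s, %d and %%.
 * Returns the number of characters stored, not counting the '\0'. */
size_t tiny_format(char *out, size_t cap, const char *fmt, ...) {
    size_t len = 0;
    va_list ap;
    va_start(ap, fmt);
    for (; *fmt; fmt++) {
        if (*fmt != '%') { put_char(out, cap, &len, *fmt); continue; }
        fmt++;                         /* look at the character after '%' */
        if (*fmt == '\0') break;       /* lone '%' at the end: stop */
        if (*fmt == 's') {
            for (const char *s = va_arg(ap, const char *); *s; s++)
                put_char(out, cap, &len, *s);
        } else if (*fmt == 'd') {
            int v = va_arg(ap, int);
            /* Work on the magnitude as unsigned so INT_MIN is handled. */
            unsigned int u = v < 0 ? 0u - (unsigned int)v : (unsigned int)v;
            char digits[12];
            int n = 0;
            do { digits[n++] = (char)('0' + u % 10); u /= 10; } while (u);
            if (v < 0) put_char(out, cap, &len, '-');
            while (n > 0) put_char(out, cap, &len, digits[--n]);
        } else if (*fmt == '%') {
            put_char(out, cap, &len, '%');
        } else {                       /* unknown directive: copy it through */
            put_char(out, cap, &len, '%');
            put_char(out, cap, &len, *fmt);
        }
    }
    va_end(ap);
    if (cap > 0) out[len] = '\0';
    return len;
}
```

No floats, no padding, no field widths, and it fits on one screen.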
"So I would need to copy the "text" of the program somewhere."
If everything is in RAM, you don't need to copy anything. You'd want to load the text and data segments separately, though, so they can have different memory protections. The BSS section does need to be zeroed out -- that's what crt0.o does on bare-metal systems (on Linux, the kernel's ELF loader hands the program zero-filled pages instead). (If you're on a system where your code is in flash, you need the additional step of copying the data segment (initialized data) into RAM so it can be writable!)
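Here is a sketch of what that load step could look like for a single segment. The function name and parameters are made up, but the idea mirrors how ELF describes a segment: some number of bytes come from the file, the segment occupies a possibly larger size in memory, and the difference is the BSS, which starts out as zeros.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical loader step: place one segment of a program in memory.
 * The first file_size bytes are copied from the file image; the remaining
 * mem_size - file_size bytes (the BSS) are zeroed.
 * Returns 0 on success, -1 if the sizes are inconsistent. */
int load_segment(unsigned char *dest, const unsigned char *file_bytes,
                 size_t file_size, size_t mem_size) {
    if (mem_size < file_size) return -1;
    memcpy(dest, file_bytes, file_size);
    memset(dest + file_size, 0, mem_size - file_size);
    return 0;
}
```

Setting the memory protections (read-only and executable for text, writable for data) would be a separate step after this, and depends on how your OS manages page tables.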