Here I'm trying to understand what happens when I run ./hello, a simple "Hello World" program written in C, on Unix: what I'd have to do if I wanted to write an OS that could execute it. The program is:
#include <stdio.h>
int main() {
    printf("Hello!\n");
}
I'm going to assume that ./hello is statically linked, because that sounds simpler to deal with. It's worth noting that a statically linked hello is 868K on my machine. Eep.
I compiled it using
gcc -static hello.c -o hello
Any (nice!) comments or clarifications are appreciated.
To run a program, I have to be able to find the program. So there would need to be some kind of filesystem and I would need to read the file from somewhere.
In a Unix system, executables are in the ELF format.
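As a rough sketch of the first thing a loader would have to do with that file, the C function below checks that a buffer starts with the ELF magic bytes and pulls out the entry point address. The name elf64_entry is made up for illustration, and it assumes a 64-bit little-endian ELF file (as on x86-64 Linux); a real loader would also have to read the program headers to know what to copy where.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 0 and stores the entry address on success,
 * or -1 if buf does not look like a 64-bit little-endian ELF header. */
int elf64_entry(const unsigned char *buf, size_t len, uint64_t *entry)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < 64) return -1;                    /* an ELF64 header is 64 bytes */
    if (memcmp(buf, magic, 4) != 0) return -1;  /* e_ident[0..3]: magic */
    if (buf[4] != 2) return -1;                 /* e_ident[4]: 2 = 64-bit class */
    if (buf[5] != 1) return -1;                 /* e_ident[5]: 1 = little-endian */

    /* e_entry is 8 bytes wide and sits at byte offset 24 in an ELF64 header.
     * Assemble it byte by byte so the code does not depend on host endianness. */
    uint64_t e = 0;
    for (int i = 7; i >= 0; i--)
        e = (e << 8) | buf[24 + i];
    *entry = e;
    return 0;
}
```

To try it on the real thing, read the first 64 bytes of ./hello into a buffer and pass them in; the address that comes back is where execution would start.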
So I would need to copy the "text" of the program somewhere.
There is a string in the program. It needs to go somewhere.
This program doesn't actually allocate memory, so perhaps it does not need a heap and it doesn't matter where the heap pointer is. It does need a stack. (See the Stack Overflow question on how the stack works in assembly.)
hello has some system calls in it. I found this out by running
objdump -d -M intel hello | grep 'syscall'
syscall is an assembly instruction for making a system call. That looks like
401385: b8 03 00 00 00 mov eax,0x3
40138a: 0f 05 syscall
The number stored in eax is the number of the system call being made. In this case it is 3, which on x86-64 Linux is close.
There are 119 instances of syscall, and it's using several different system calls. This is worrying.
(Explained more in this stackoverflow question)
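To see the same "system call number plus arguments" convention from C instead of from a disassembly, here is a small sketch that invokes close by number through the syscall() wrapper. raw_close is a hypothetical helper name, and this assumes Linux with glibc; SYS_close expands to whatever number close has on the target architecture.

```c
#define _GNU_SOURCE          /* makes glibc's <unistd.h> declare syscall() */
#include <sys/syscall.h>     /* SYS_close and the other syscall numbers */
#include <unistd.h>          /* syscall() */

/* Hypothetical helper: close a file descriptor by system call number,
 * bypassing the usual close() library function.
 * Returns 0 on success, or -1 with errno set on failure. */
long raw_close(int fd)
{
    return syscall(SYS_close, fd);
}
```

Closing the same descriptor twice should succeed the first time and fail the second, which is an easy way to check that the call really reached the kernel.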
I have no idea how the OS would check up on the program. I guess it doesn't just let the program run, but takes away control periodically and makes sure the stack pointer hasn't moved too far. How would it take away control? Hmm.
When there is a stack overflow I guess it sends a signal to the program, which is a POSIX thing.
I do not understand this.
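From the program's side, at least, receiving a signal can be sketched in a few lines of C. This is only an illustration of the POSIX mechanism: it uses SIGUSR1 as a stand-in and sends it to itself with raise(), whereas a real stack overflow would typically arrive as SIGSEGV and need an alternate stack (sigaltstack) for the handler to run on. The names on_signal and install_and_raise are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t got_signal = 0;

/* The kernel arranges for this to run when the signal is delivered. */
static void on_signal(int signo)
{
    got_signal = signo;
}

/* Hypothetical helper: install on_signal for signo, send signo to
 * ourselves, and return the signal number the handler saw (-1 on error). */
int install_and_raise(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);

    if (sigaction(signo, &sa, NULL) != 0) return -1;
    if (raise(signo) != 0) return -1;   /* handler runs before raise() returns */
    return got_signal;
}
```

Calling install_and_raise(SIGUSR1) should hand back SIGUSR1, showing that control went to the handler and then returned to the normal flow of the program.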
There are no mallocs in the program, so I would not need to allocate memory for it or anything.
What else?!??
- How long would this take for a human (where human = me) to write from scratch?
- Is there a way to write a smaller program with fewer system calls and less magic? There are like 50 system calls and what are they even doing?
- Do I need a heap if I never use malloc?
- Could I write my own printf in assembly that does less and is simpler? Just printing a string is pretty easy...
- How do I kill a program?
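On the "simpler printf" question, here is a sketch in C rather than assembly of what a do-less version might look like: no formatting, no buffering, just handing the bytes to write(). put_string is a hypothetical name, and this still leans on the C library's write() function, so it is not yet a from-scratch answer.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write all of s to fd, retrying after short writes.
 * Returns 0 on success, -1 on error. */
int put_string(int fd, const char *s)
{
    size_t left = strlen(s);

    while (left > 0) {
        ssize_t n = write(fd, s, left);
        if (n <= 0) return -1;   /* error (or no progress): give up */
        s += n;                  /* skip past the bytes already written */
        left -= (size_t)n;
    }
    return 0;
}
```

With this, the body of main could be put_string(1, "Hello!\n"), where 1 is the file descriptor for standard output.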
Regarding malloc, you might be mistaken. The standard I/O library has a buffer for every open I/O structure, and typically there are at least three, for standard input, standard output, and standard error output. In ancient versions of the standard I/O library, these buffers were statically allocated, but I bet they aren't in this case. The linker plays some trickery to arrange that main is called when the program begins, by linking in a file usually named something like crt, for "C run-time". The crt file initializes the stack pointers, sets up the program's arguments so that they can be read via argv, and performs other similar tasks; it might also allocate memory for standard I/O buffers.
On my system, the statically-linked executable makes these system calls:
(To get this, I ran strace -o hello.out ./hello. strace runs a program and prints a report of all the system calls the program makes.)
Those brk and mmap calls are the ones that allocate memory. Somewhere way down in the guts of the malloc library there are calls to brk or to mmap; there have to be, because malloc allocates memory, it has to get that memory from somewhere, and the only place to get it is by asking the kernel. I don't know for sure what the memory is being allocated for, but it is being allocated, and I will guess that the mmap call there is to allocate the output buffer for the printf. This is mainly because it allocates 4096 bytes, which is a likely size for such a buffer.
(Addendum: When I replace the printf("Hello!\n") with write(1, "Hello!\n", 7), the fstat and mmap calls disappear from the output of strace, but not the brk calls, so I think my guess about the standard I/O library calling mmap to allocate a buffer was correct.)
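For a concrete picture of what "asking the kernel" for 4096 bytes looks like, here is a minimal sketch of an anonymous mmap allocation in C. It is not a claim about what the standard I/O library actually does internally; alloc_pages and free_pages are hypothetical names, and the flags assume a Linux-like system where MAP_ANONYMOUS is available.

```c
#define _GNU_SOURCE          /* ensures MAP_ANONYMOUS is declared by glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the kernel for `size` bytes of zero-filled,
 * readable and writable memory that is not backed by any file.
 * Returns NULL on failure. */
void *alloc_pages(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: hand the memory back to the kernel.
 * Returns 0 on success, -1 on error. */
int free_pages(void *p, size_t size)
{
    return munmap(p, size);
}
```

Running a program that calls alloc_pages(4096) under strace should show an mmap line with a length of 4096, which can be compared against the one in the hello trace.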