A good friend asked me how functions work under the covers, so I wrote up this quick explanation. Corners are cut for brevity, but I hope it will convey the general idea.
The three assembly language programs below implement the following (super simple) recursive function:
(define (function x)
(if (> x 5)
x
(function (+ 1 x))))
In assembly language, a label is a placeholder for the offset into
the code of the instruction that follows that label. Our function
starts at the label _function
and the way the code continues down
the page represents exactly how it will be loaded into memory.
For example, the code of the first section of our function is laid out like this in the binary the assembler generates:
$ otool -t foo
0000000100000f50 55 48 89 e5 48 83 ec 10 89 7d f8 83 7d f8 05 0f
0000000100000f60 8e 0b 00 00 00 8b 45 f8 89 45 fc e9 10
[ ... ]
And the label _function
means, relative to these addresses,
0000000100000f50
. Given this sort of hex dump, it's tedious but not
difficult to disassemble the binary manually by looking up the op
codes in a table like this. It
is, however, more convenient to have the machine do it for us:
$ otool -t -v foo
_function:
0000000100000f50 pushq %rbp
0000000100000f51 movq %rsp, %rbp
0000000100000f54 subq $16, %rsp
0000000100000f58 movl %edi, -8(%rbp)
0000000100000f5b cmpl $5, -8(%rbp)
0000000100000f5f jle 0x100000f70
0000000100000f65 movl -8(%rbp), %eax
0000000100000f68 movl %eax, -4(%rbp)
0000000100000f6b jmpq 0x100000f80
[ ... ]
Which reveals, as promised, that the binary layout reflects the assembly language source.
The first few lines following the _function
label are standard
bookkeeping involving the call stack. The %rbp
is the "base pointer"
and the %rsp
is the "stack pointer". This ritual of pushing the base
pointer onto the call stack is how the computer knows where to return
from a function. The stack is also how functions communicate arguments
and return values, about which
more here.
After that we perform a comparison operation against the constant 5
,
and then "jump if less than" (jle
) to our _recurse
label, which
calls the function again, copies its return value and then falls
through to the _return
label, which is boilerplate stack cleanup
code (the inverse of what we did on function entry). Otherwise, we
carry on to lines 12-14 of _function
, which copy the return value
into place and then jump (jmp
is goto
in assembly language) to the
same _return
label to perform the aforementioned boilerplate.
Under UNIX calling conventions the exit code of a process is whatever
main
returns, which in this case is what our recursive function
returned to it. Bash stores the exit code of the last command in the
variable $?
, thus allowing us to quickly test our code:
$ cc -o foo function.s
$ ./foo ; echo $?
6
Yep, six is greater than five.
The second version implements Tail Call Optimization, a clever
performance hack where the compiler figures out that a function is
isomorphic to a loop and rewrites it as one. Rather than incurring the
overhead of pushing and popping the stack every time we want to
increment our counter (and thus the danger that we might run out of
stack space), we use what amounts to a while loop implemented on the
%eax
register. (Note to former Apple II/Commodore 64 owners: %eax
is the "extended accumulator", a 32-bit cousin of the 6502's
accumulator register).
The third version combines TCO with inlining, which is a technique
where the compiler (in this case, me) determines that the function can
be included literally at the point at which it's called (the "call
site"). Here we have no stack overhead at all, just the same
single-register while loop embedded in the _main
function.