WIP in https://github.com/tanishiking/scala-native/commits/fileline-stacktrace
As described in "Programmatic access to the call stack in C++" (Eli Bendersky's website), we can construct a backtrace using `libunwind` in the following way:
```c
// Only local unwinding (of this process) is needed; defining this before the
// include lets libunwind use its more efficient local-only implementation.
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <stdio.h>

// Call this function to get a backtrace.
void backtrace() {
    unw_cursor_t cursor;
    unw_context_t context;

    // Initialize cursor to current frame for local unwinding.
    unw_getcontext(&context);
    unw_init_local(&cursor, &context);

    // Unwind frames one by one, going up the frame stack.
    while (unw_step(&cursor) > 0) {
        unw_word_t offset, pc;
        unw_get_reg(&cursor, UNW_REG_IP, &pc);
        if (pc == 0) {
            break;
        }
        printf("0x%lx:", (unsigned long)pc);

        char sym[256];
        if (unw_get_proc_name(&cursor, sym, sizeof(sym), &offset) == 0) {
            printf(" (%s+0x%lx)\n", sym, (unsigned long)offset);
        } else {
            printf(" -- error: unable to obtain symbol name for this frame\n");
        }
    }
}

void foo() {
    backtrace(); // <-------- backtrace here!
}

void bar() {
    foo();
}

int main(int argc, char **argv) {
    bar();
    return 0;
}
```
```
$ gcc -o libunwind_backtrace -Wall -g libunwind_backtrace.c -lunwind
$ ./libunwind_backtrace
0x400958: (foo+0xe)
0x400968: (bar+0xe)
0x400983: (main+0x19)
0x7f6046b99ec5: (__libc_start_main+0xf5)
0x400779: (_start+0x29)
```
OK, so we can retrieve the backtrace from the call stack: each frame's (mangled) symbol name and its memory address. However, it would be nicer to show the filename and line number instead of a memory address, which is probably not interesting for users.
The current scala-native stack trace is similar: we have the mangled symbol name and the memory address of each function call (though we don't print the address), but there is no filename or line number information.
```
java.lang.Error: test
	at java.lang.StackTrace$.$anonfun$currentStackTrace$1(Unknown Source)
	at java.lang.StackTrace$$$Lambda$2.apply(Unknown Source)
	at scala.scalanative.unsafe.Zone$.apply(Unknown Source)
	at java.lang.StackTrace$.currentStackTrace(Unknown Source)
	at java.lang.Throwable.fillInStackTrace(Unknown Source)
	at Test$.error(Unknown Source)
	at Test$.g(Unknown Source)
	at Test$.f(Unknown Source)
	at Test$.main(Unknown Source)
	at Test.main(Unknown Source)
	at <none>.main(Unknown Source)
```
With file and line information, it is going to look like this:
```
java.lang.Error: test
	at java.lang.StackTrace$.$anonfun$currentStackTrace$1(Throwables.scala:56)
	at java.lang.StackTrace$$$Lambda$3.apply(Throwables.scala:56)
	at scala.scalanative.unsafe.Zone$.apply(Zone.scala:27)
	at java.lang.StackTrace$.currentStackTrace(Throwables.scala:50)
	at java.lang.Throwable.fillInStackTrace(Throwables.scala:126)
	at Test$.error(Test.scala:24)
	at Test$.g(Test.scala:22)
	at Test$.f(Test.scala:20)
	at Test$.main(Test.scala:5)
	at Test.main(Test.scala:5)
	at <none>.main(Unknown Source)
```
> we want not only the caller's name, but also the call location (source file name + line number). ... libunwind gives us the call address, but nothing beyond. Fortunately, it's all in the DWARF information of the binary, and given the address we can extract the exact call location in a number of ways
Yes, the filename and line information is available from the DWARF debug information! And scala-native is getting ready to emit debug information into executables: Generate LLVM metadata by keynmol · Pull Request #2869 · scala-native/scala-native
The `.debug_info` section in an executable contains a list of DIEs (Debugging Information Entries), and DIEs that describe code, such as functions, have attributes called `DW_AT_low_pc` and `DW_AT_high_pc`.
Note that, since the debug information is embedded in the executable, we have to parse executable formats such as Mach-O, ELF, and PE. We don't have to parse them fully: we only need to parse the header and seek to the interesting sections, and then the same DWARF parser can be used across all executable formats. (Technically speaking, debug information is sometimes stored in a separate file.)
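To make "parse the header and seek to the interesting section" concrete, here is a minimal sketch for ELF only. It assumes a 64-bit ELF file and does not handle extended section numbering; the helper name `find_elf_section` is hypothetical and not taken from the scala-native branch. It reads the ELF header, the section header table and the section-name string table, then reports the file offset and size of the section with the given name (for example `.debug_info`), which is where a DWARF parser would start reading.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find a section (e.g. ".debug_info") in a 64-bit ELF
 * file. On success, stores the section's file offset and size and returns 0;
 * returns -1 on I/O errors, non-ELF64 input, or if the section is absent. */
int find_elf_section(const char *path, const char *name,
                     uint64_t *offset, uint64_t *size) {
    int result = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *shdrs = NULL;
    const Elf64_Shdr *strtab;
    char *names = NULL;
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;

    /* 1. ELF header: check the magic bytes and the class, and locate the
     *    section header table. Extended numbering (e_shnum == 0) is not handled. */
    if (fread(&eh, sizeof eh, 1, f) != 1) goto done;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto done;

    /* 2. Read the whole section header table. */
    shdrs = malloc((size_t)eh.e_shnum * sizeof(Elf64_Shdr));
    if (shdrs == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(shdrs, sizeof(Elf64_Shdr), eh.e_shnum, f) != eh.e_shnum)
        goto done;

    /* 3. Section names are offsets into the section-name string table
     *    (.shstrtab), whose index is stored in e_shstrndx. */
    strtab = &shdrs[eh.e_shstrndx];
    names = malloc(strtab->sh_size + 1);
    if (names == NULL || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size)
        goto done;
    names[strtab->sh_size] = '\0';  /* make sure the last name is terminated */

    /* 4. Linear scan for the requested name. */
    for (uint16_t i = 0; i < eh.e_shnum; i++) {
        if (shdrs[i].sh_name < strtab->sh_size &&
            strcmp(names + shdrs[i].sh_name, name) == 0) {
            *offset = shdrs[i].sh_offset;
            *size = shdrs[i].sh_size;
            result = 0;
            break;
        }
    }

done:
    free(names);
    free(shdrs);
    fclose(f);
    return result;
}
```

For instance, on Linux `find_elf_section("/proc/self/exe", ".debug_info", &off, &size)` would locate the running program's own DWARF data if it was compiled with `-g`. Mach-O and PE would each need their own, similarly small, header parser in front of the shared DWARF code.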
For example, a function `func2` is described by a "subprogram DIE" (`DW_TAG_subprogram`). If we dump the debug information using `llvm-dwarfdump`, `readelf`, or a similar tool, we find something like the following for `func2`:
```
 [ 330]    subprogram
           name                 (strp) "func2"
           decl_file            (data1) 1
           decl_line            (data1) 9
           prototyped           (flag_present) yes
           type                 (ref4) [ 62]
           low_pc               (addr) 0x000000000040051c <func2>
           high_pc              (data8) 28 (0x0000000000400538 <main>)
           frame_base           (exprloc)
            [ 0] call_frame_cfa
           GNU_all_tail_call_sites (flag_present) yes
           sibling              (ref4) [ 35f]
```
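As we can see, it has `low_pc`, which describes the start address of the function's code, and `high_pc`, which describes its end address. Because `high_pc` has a constant form here (`data8`), it is actually an offset from `low_pc`: 0x40051c + 28 = 0x400538, the first address past the end of `func2` (which is where `main` starts).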
So, we can find the appropriate subprogram DIE by searching for the DIE whose range from `low_pc` (inclusive) to `high_pc` (exclusive) contains the given address (obtained from `libunwind`)!
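A minimal sketch of that search, assuming the DIEs have already been decoded into a hypothetical `Subprogram` struct (a real implementation would decode them from `.debug_info` using the abbreviation tables in `.debug_abbrev`). It handles both forms of `high_pc`: a constant-class value is an offset from `low_pc`, an address-class value is already absolute. The address passed in must be a link-time address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-decoded view of a subprogram DIE. */
typedef struct {
    const char *name;       /* DW_AT_name */
    uint64_t low_pc;        /* DW_AT_low_pc: link-time start address */
    uint64_t high_pc;       /* DW_AT_high_pc, raw value */
    int high_pc_is_offset;  /* 1 if high_pc has a constant form (e.g. data8) */
    unsigned decl_file;     /* DW_AT_decl_file */
    unsigned decl_line;     /* DW_AT_decl_line */
} Subprogram;

/* Exclusive end address: a constant-class high_pc is an offset from low_pc,
 * an address-class high_pc is an absolute address. */
static uint64_t subprogram_end(const Subprogram *sp) {
    return sp->high_pc_is_offset ? sp->low_pc + sp->high_pc : sp->high_pc;
}

/* Linear search for the subprogram whose [low_pc, end) range contains pc.
 * pc must be a link-time address (any ASLR slide already subtracted).
 * Returns NULL if no subprogram covers pc. */
const Subprogram *find_subprogram(const Subprogram *sps, size_t n, uint64_t pc) {
    for (size_t i = 0; i < n; i++) {
        if (sps[i].low_pc <= pc && pc < subprogram_end(&sps[i])) {
            return &sps[i];
        }
    }
    return NULL;
}
```

With the `func2` DIE above (`low_pc` 0x40051c, `high_pc` 28 as an offset), `find_subprogram` matches 0x400520 but not 0x400538, the first address after `func2`. A linear scan is just the simplest choice; a real implementation would probably sort the ranges once and binary-search them, or consult the `.debug_aranges` section first.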
Then `DW_AT_decl_line` gives the line number at which the function is declared, and the filename can be retrieved in one of the following ways:
- `DW_AT_decl_file` is an index into the file table in the `.debug_line` section, which is a bit complicated to parse.
- Actually, DIEs form a tree structure, and the last CU (Compilation Unit) DIE preceding the subprogram DIE is the CU that contains it, which corresponds to the source file.
  - Once we find the CU DIE, its `DW_AT_name` attribute gives its name.
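The "last preceding CU" trick can be sketched over a hypothetical, already-decoded DIE stream kept in `.debug_info` order (`Die` and `lookup_file_line` are illustrative names, not an existing API). While scanning, it remembers the name of the most recent CU DIE and reports it together with `DW_AT_decl_line` once a subprogram covering the address is found.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened DIE stream, in the order DIEs appear in .debug_info.
 * Only the two tags needed here are modelled. */
typedef enum { TAG_COMPILE_UNIT, TAG_SUBPROGRAM } DieTag;

typedef struct {
    DieTag tag;
    const char *name;    /* DW_AT_name: function name, or source file for a CU */
    uint64_t low_pc;     /* subprograms only: link-time start address */
    uint64_t end_pc;     /* subprograms only: exclusive end (low_pc + offset) */
    unsigned decl_line;  /* subprograms only: DW_AT_decl_line */
} Die;

/* Find the subprogram containing pc and report the name of the last CU DIE
 * seen before it, i.e. the compilation unit that contains it.
 * Returns 0 on success, -1 if no subprogram covers pc. */
int lookup_file_line(const Die *dies, size_t n, uint64_t pc,
                     const char **file, unsigned *line) {
    const char *current_cu = NULL;
    for (size_t i = 0; i < n; i++) {
        if (dies[i].tag == TAG_COMPILE_UNIT) {
            current_cu = dies[i].name;  /* remember the enclosing CU */
        } else if (dies[i].low_pc <= pc && pc < dies[i].end_pc) {
            *file = current_cu;
            *line = dies[i].decl_line;
            return 0;
        }
    }
    return -1;
}
```

Note that the CU name is the primary source file of the compilation unit; a function declared in another file (for example a C header) would still need `DW_AT_decl_file` and the `.debug_line` file table to get the right filename.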
Actually, binutils' `addr2line` command should do exactly this.
As described in the blog:
```
$ addr2line 0x400968 -e libunwind_backtrace
libunwind_backtrace.c:37
```
Did it work? In my local environment, NO.
Why didn't it work? To understand this, we have to know about PIE (Position Independent Executable) and ASLR (Address Space Layout Randomization).
A PIE (Position Independent Executable) is an executable whose code runs correctly regardless of the address it is loaded at. This allows it to be loaded at a random memory address, which makes vulnerabilities harder to exploit because attackers cannot rely on fixed memory locations. The OS's ASLR feature is what actually randomizes the load address of the PIE.
When PIE and ASLR are combined, the actual load address of an executable is only determined when the OS loads it.
The address of each function is determined at link time, and `DW_AT_low_pc` contains this link-time address.
However, at load time a random offset, called the ASLR offset (or slide), is added to the load address.
Mach-O example (image from https://www.jianshu.com/p/7ad3d3d868f9): ASLR added `0x5000` as an offset at load time.
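The relation between the two kinds of address is a single addition, so undoing it is a single subtraction. As a purely illustrative worked example that combines two numbers from this post (they come from different binaries, so the combination is hypothetical), suppose `func2` from the dump above (`low_pc` = 0x40051c) were loaded with the 0x5000 slide from the Mach-O example:

$$
\begin{aligned}
\mathit{addr}_{\mathrm{runtime}} &= \mathit{addr}_{\mathrm{link}} + \mathit{slide} \\
\mathit{addr}_{\mathrm{link}} &= \mathit{addr}_{\mathrm{runtime}} - \mathit{slide} \\
\texttt{0x40051c} + \texttt{0x5000} &= \texttt{0x40551c}
\end{aligned}
$$

A runtime address in [0x40551c, 0x405538) would then have to be shifted back by 0x5000 before it falls into `func2`'s [`low_pc`, `high_pc`) range of [0x40051c, 0x400538).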
`addr2line` reads the debug information embedded in the executable, finds the corresponding entry using link-time addresses such as `low_pc` and `high_pc` (which contain no ASLR offset), and prints the file name and line number.
However, the address obtained by `libunwind` at runtime includes the random ASLR offset added at load time, so it is larger than the values recorded in `low_pc` and `high_pc`. That's why you cannot find the debug information using the address obtained from `libunwind` (with the ASLR offset) as-is.
What we need to do is calculate the ASLR offset and subtract it from the address retrieved from `libunwind`.
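A minimal Linux/glibc sketch of that subtraction. Instead of reading the process memory map, it uses glibc's `dl_iterate_phdr(3)`, a different route to the same information: for every loaded object it reports `dlpi_addr`, which is added to each segment's link-time `p_vaddr` to get its run-time address, i.e. it is the ASLR slide (0 for a non-PIE executable). The names `slide_query`, `find_slide_cb` and `to_link_time_address` are hypothetical. On macOS, `_dyld_get_image_vmaddr_slide()` from `<mach-o/dyld.h>` plays a similar role.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Query passed to the dl_iterate_phdr callback (hypothetical helper type). */
struct slide_query {
    uintptr_t pc;     /* run-time address, e.g. from unw_get_reg(..., UNW_REG_IP, ...) */
    uintptr_t slide;  /* out: load bias (ASLR slide) of the object containing pc */
    int found;        /* out: 1 if some loaded object contains pc */
};

static int find_slide_cb(struct dl_phdr_info *info, size_t size, void *data) {
    struct slide_query *q = data;
    (void)size;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD) continue;
        /* run-time address of a segment = dlpi_addr + its link-time p_vaddr */
        uintptr_t start = (uintptr_t)info->dlpi_addr + (uintptr_t)ph->p_vaddr;
        if (q->pc >= start && q->pc - start < (uintptr_t)ph->p_memsz) {
            q->slide = (uintptr_t)info->dlpi_addr;
            q->found = 1;
            return 1;  /* a non-zero return stops the iteration */
        }
    }
    return 0;  /* keep looking in the next loaded object */
}

/* Convert a run-time pc into the link-time address that DW_AT_low_pc/high_pc
 * and addr2line use. Returns 0 on success, -1 if pc is in no loaded object. */
int to_link_time_address(uintptr_t pc, uintptr_t *link_addr) {
    struct slide_query q = { pc, 0, 0 };
    dl_iterate_phdr(find_slide_cb, &q);
    if (!q.found) return -1;
    *link_addr = pc - q.slide;
    return 0;
}
```

Because it looks up the object that actually contains the PC, this would also work for frames inside shared libraries, as long as that library's own debug information is then used for the lookup.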
We should be able to find out the address at which the process is loaded by reading its memory map: on Linux from `/proc/<pid>/maps`, and on macOS with the `vmmap` command.
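A minimal Linux sketch of the `/proc/<pid>/maps` approach, reading `/proc/self/maps` for the current process. It assumes the executable's lowest mapping is the one with file offset 0 and matches pathnames against `readlink("/proc/self/exe")`; `executable_load_base` and `MAX_PATH_LEN` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

enum { MAX_PATH_LEN = 4096 };  /* assumption: enough for the executable's path */

/* Hypothetical helper: find where the running executable is mapped by
 * scanning /proc/self/maps for its mapping with file offset 0.
 * Returns 0 and stores that start address in *base on success, -1 otherwise. */
int executable_load_base(uintptr_t *base) {
    char exe[MAX_PATH_LEN];
    ssize_t len = readlink("/proc/self/exe", exe, sizeof exe - 1);
    if (len < 0) return -1;
    exe[len] = '\0';  /* readlink does not NUL-terminate */

    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) return -1;

    /* Each line looks like:
     * 55d0c8a00000-55d0c8a01000 r--p 00000000 fd:01 1234   /path/to/exe */
    char line[MAX_PATH_LEN + 128];
    int result = -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end, offset;
        char perms[5];
        int path_pos = 0;
        /* start-end perms offset dev inode; %n records where the path begins */
        if (sscanf(line, "%lx-%lx %4s %lx %*s %*s %n",
                   &start, &end, perms, &offset, &path_pos) < 4 || path_pos == 0)
            continue;
        char *path = line + path_pos;
        path[strcspn(path, "\n")] = '\0';  /* drop the trailing newline */
        if (offset == 0 && strcmp(path, exe) == 0) {
            *base = (uintptr_t)start;  /* maps is sorted, so this is the lowest one */
            result = 0;
            break;
        }
    }
    fclose(maps);
    return result;
}
```

The slide is then this base minus the link-time virtual address of the executable's first `PT_LOAD` segment. For a typical PIE that address is 0, so the base itself is the slide; for a non-PIE executable the base equals the link-time address and the slide is 0.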
https://github.com/tanishiking/scala-native/tree/fileline-stacktrace (`/proc/pid/maps`)