Skip to content

Instantly share code, notes, and snippets.

@simonhf
Created December 3, 2019 00:20
Show Gist options
  • Save simonhf/aaa76630dfda5dab2028f1c2e8425975 to your computer and use it in GitHub Desktop.
Save simonhf/aaa76630dfda5dab2028f1c2e8425975 to your computer and use it in GitHub Desktop.
Which other missing feature in gcc would make tracing function calls in userspace C/C++ applications easier

Which other missing feature in gcc would make tracing function calls in userspace C/C++ applications easier

Background

Let's say we want to compile a larger C/C++ application with 80,000 unique functions so that at run-time the function call-tree is output for code comprehension and debugging purposes.

Here [1] various methods and implementation options were discussed including the possibility of modifying gcc itself.

However, modifying gcc to insert macros all over the code has an unknown level of difficulty... but it's probably more difficult than expected.

So here is a further idea which can work without modifying gcc, but which could in theory be faster at compile time if gcc would be modified to do the same thing!

[1] https://gist.github.com/simonhf/489b1f3353a0ac095df01a1ddb6ad0e7

Which other missing feature in gcc could make this easier and more flexible?

  • This idea is very similar to the -finstrument-macros and __FUNCTION_UNIQUE__ idea presented in [1], but possibly less complicated and more pragmatic :-)
  • Instead of the higher implementation risk idea of inserting macros every, and introducing a new and complicated macro __FUNCTION_UNIQUE__, the new idea works as follows:
  • The current gcc feature -finstrument-macros together with -O0 generates Intel assembler which looks like this:
    movq 24(%rsp), %rsi                   <-- function parameter #2
    leaq my_func(%rip), %rdi              <-- function parameter #1
    call __cyg_profile_func_enter@PLT
...
    movq 24(%rsp), %rsi                   <-- function parameter #2
    leaq my_func(%rip), %rdi              <-- function parameter #1
    call __cyg_profile_func_exit@PLT
  • Note: For Linux, the first 6 function parameters get passed as registers as described here [2].
  • The first parameter to function __cyg_profile_func_(enter|exit)() is the 64bit address of the function being instrumented, which is, of course, unique.
  • In C++, the function label would be the mangled name of the function, but still unique.
  • We can invoke gcc using the exiting -finstrument-functions feature, and ask it to output assembler instead of outputting an object file.
  • We can then modify the output assembler and assembler it to an object file.
  • The modified assembler would look like this:
    movq 24(%rsp), %rsi                   <-- function parameter #2
    leaq cwrap_data_my_func(%rip), %rdi   <-- function parameter #1

    movl cwrap_data_my_func(%rip), %edx   <-- new: register edx holds function verbosity
    cmpl cwrap_log_verbosity(%rip), %edx  <-- new: compare with global verbosity
    jl   .L_cwrap_skip_cyg_profile_func_1 <-- new: skip call if verbosity too low
    call __cyg_profile_func_enter@PLT
.L_cwrap_skip_cyg_profile_func_1:
...
    movq 24(%rsp), %rsi                   <-- function parameter #2
    leaq cwrap_data_my_func(%rip), %rdi   <-- function parameter #1

    movl cwrap_data_my_func(%rip), %edx   <-- new: register edx holds function verbosity
    cmpl cwrap_log_verbosity(%rip), %edx  <-- new: compare with global verbosity
    jl   .L_cwrap_skip_cyg_profile_func_1 <-- new: skip call if verbosity too low
    call __cyg_profile_func_exit@PLT
.L_cwrap_skip_cyg_profile_func_1:
  • Register 'edx' is used because it would be used for the 3rd function parameter [2] if __cyg_profile_func_(enter|exit)() used a 3rd parameter.
  • In the above example my_func is converted to cwrap_data_my_func. Instead of passing the unique address of my_func, then the unique name is converted into a unique data structure.
  • The data structure cwrap_data_my_func hold lots of run-time info about the function including the verbosity which is always the first item, e.g.:
typedef struct CWRAP_DATA {
    int verbosity;
    int magic;
    int calls;
    int line;
    const char * name_cpp; int len_name_cpp;
    const char * name    ; int len_name    ;
    const char * result  ; int len_result  ;
    const char * params  ; int len_params  ;
    const char * file    ; int len_file    ;
} __attribute__((aligned(32),packed)) CWRAP_DATA;
  • A verbosity jump wrapper is placed around the call to __cyg_profile_func_(enter|exit)() which compares the verbosity for the function with the global verbosity.
  • The idea is that it's better for run-time performance using if(){ call{} } than call{ if(){} }.
  • Using the above mechanism then all C/C++ files on the project can be compiled to object files via modified assembler files.
  • Finally at linking, the linker will list all the 'undefined references' e.g. cwrap_data_my_func.
  • At this point we can dynamically create the necessary C/C++ source, compile, and link it to resolve the undefined references, e.g.:
$ cat cwrap.c
...
CWRAP_DATA cwrap_data_my_func = {1, CWRAP_MAGIC, 0, 11, "my_func", 7, "my_func", 7, "", 0, "", 0, "my-example.cpp", 14};
...

[1] https://gist.github.com/simonhf/489b1f3353a0ac095df01a1ddb6ad0e7 [2] https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI

Issues with this instrumentation strategy

  • The strategy appears to work fine for -O0 and maybe for -O1, but can easily fail for -O2. Why?
  • This is because gcc generates different assembler code when functions get inlined, e.g.:
   leaq main(%rip), %rdi
   leaq inc_a(%rip), %rbp
...
   call __cyg_profile_func_enter@PLT
...
   movq %rbp, %rdi
   call __cyg_profile_func_enter@PLT <-- inline function #1 enter via rbp
...
   movq %rbp, %rdi
   call __cyg_profile_func_exit@PLT  <-- inline function #1 leave via rbp
...
   leaq inc_b(%rip), %rdi
   call __cyg_profile_func_enter@PLT <-- inline function #2 enter
...
   leaq inc_b(%rip), %rdi
   call __cyg_profile_func_exit@PLT  <-- inline function #2 enter
...
   leaq main(%rip), %rdi
   call __cyg_profile_func_exit@PLT
  • In the above example, two functions inc_a() and inc_b() get inlined into main().
  • While functions main() and inc_b() continue to get the 64bit address of their functions directly loaded into register rdi, function inc_a() indirectly uses register rbp instead.
  • While it's possible to take this gcc pattern into consideration and get small test C/C++ projects compiled with -O2 working, it's unknown how many other patterns exist, and how much of a problem this is.

Implementing this instrumentation as a gcc modification

  • To avoid tussling with the gcc output assembler files, gcc could be modifed to generate the modified instructions directly.
  • The new -finstrument-functions-advanced could be made available in addition to the exsting -finstrument-functions option.
  • This means the compilation of C/C++, with the -finstrument-functions-advanced option, would happen without any outside meddling to 'instument the instrumentation'.
  • However, the user is still required to harvest the undefined reference during linking, dynamically create the necessary C/C++ source, and compile, and link it.

End to end instrumentation example using -finstrument-functions and assembler output manipulation

  • Note: -O3 is used during compilation to assembler.
  • Note: The script trace.pl is hard-coded to this particular example currently, but it shouldn't be that much effort to make it work with any C/C++ files being compiled or linked.
  • Note: There are a bunch of non-mangled C and mangled C++ function names.
  • Note: The output below shows the progress of scanning the output assembler files. You can see how functions like bar() and baz() have been inlined, so the calls to __cyg_profile_func_(enter|exit)() have also been inlined.
  • Note: A simple example function cwrap_log_verbosity_set() is provided to show how function verbosity levels can be manipulated at run-time if desired.
  • Note: Linking happens twice; once to gather the undefined references, and once to include the auto generated cwrap.c file containing the structs for each instrumented funciton.
  • Note: This example uses printf() to display the instrumentation, but in an production example output would likely go to a better performing, shared memory ring buffer instead.
  • Note: This example overrides printf() with a preprocessor definition purely to make the existing non-instrumentation printf() statements allign with the indentation of the instrumented function calls.
  • Note: An example cwrap_log_stats() function is provided to illustrate how the static data for each function can be used at run-time for further analysis, in this case to report before process exit, the number of instrumented calls displayed at run-time.
  • Note: The number of calls made so far to a particular function is also displayed in the example instrumentation output, e.g. my_struct::~my_struct() is called 4 times in total.
  • Note: In the example code provided, no attempt is made to collapse the output for short functions, e.g. instead of 2 instrumentation lines, only show + baz() {} // #1.
$ ./doit.sh 
compiling cpp-example-1.cpp to assembler

compiling cpp-example-2.cpp to assembler

munging cpp-example-1.s
- s_file=cpp-example-1.s
- s_file_lines=8665
- found __cyg_profile_func_(enter|exit}() calls: 58 in 0.000298s
  - enter _Z7get_maxIiET_S0_S0_
  - exit  _Z7get_maxIiET_S0_S0_
  - enter _Z7get_maxIcET_S0_S0_
  - exit  _Z7get_maxIcET_S0_S0_
  - enter _Z7bye_bazv
  - exit  _Z7bye_bazv
  - exit  _Z7bye_bazv
  - enter _Z3byev
  - exit  _Z3byev
  - exit  _Z3byev
  - enter _ZN9my_structC2EPKc
  - exit  _ZN9my_structC2EPKc
  - exit  _ZN9my_structC2EPKc
  - enter _ZN9my_structC2EPKcS1_
  - exit  _ZN9my_structC2EPKcS1_
  - exit  _ZN9my_structC2EPKcS1_
  - enter _ZN9my_structD2Ev
  - exit  _ZN9my_structD2Ev
  - enter _Z3bazi
  - exit  _Z3bazi
  - enter _ZN3FooC2EPKc
  - enter _ZN9my_structC2EPKcS1_
  - exit  _ZN9my_structC2EPKcS1_
  - exit  _ZN3FooC2EPKc
  - exit  _ZN9my_structC2EPKcS1_
  - exit  _ZN3FooC2EPKc
  - enter _Z3bari
  - enter _Z3bazi
  - exit  _Z3bazi
  - exit  _Z3bari
  - enter _ZN3Foo10my_privateEi
  - enter _Z3bari
  - enter _Z3bazi
  - exit  _Z3bazi
  - exit  _Z3bari
  - exit  _ZN3Foo10my_privateEi
  - enter _ZN3Foo9my_publicEi
  - enter _ZN3Foo10my_privateEi
  - enter _Z3bari
  - enter _Z3bazi
  - exit  _Z3bazi
  - exit  _Z3bari
  - exit  _ZN3Foo10my_privateEi
  - exit  _ZN3Foo9my_publicEi
  - enter _ZN3FooD2Ev
  - enter _ZN9my_structD2Ev
  - exit  _ZN9my_structD2Ev
  - exit  _ZN3FooD2Ev
  - enter main
  - enter _Z7get_maxIiET_S0_S0_
  - exit  _Z7get_maxIiET_S0_S0_
  - enter _Z7get_maxIcET_S0_S0_
  - exit  _Z7get_maxIcET_S0_S0_
  - enter _ZN9my_structC2EPKc
  - exit  _ZN9my_structC2EPKc
  - exit  main
  - exit  main
  - exit  _ZN9my_structC2EPKc
- writing to: cpp-example-1.s.2.s

munging cpp-example-2.s
- s_file=cpp-example-2.s
- s_file_lines=3131
- found __cyg_profile_func_(enter|exit}() calls: 16 in 0.000112s
  - enter _Z8clean_upPi
  - exit  _Z8clean_upPi
  - exit  _Z8clean_upPi
  - enter _ZN9my_structD2Ev
  - exit  _ZN9my_structD2Ev
  - enter _ZN9my_structC2EPKc
  - exit  _ZN9my_structC2EPKc
  - exit  _ZN9my_structC2EPKc
  - enter _Z41__static_initialization_and_destruction_0ii
  - exit  _Z41__static_initialization_and_destruction_0ii
  - enter _ZN9my_structC2EPKc
  - exit  _ZN9my_structC2EPKc
  - exit  _ZN9my_structC2EPKc
  - exit  _Z41__static_initialization_and_destruction_0ii
  - enter _GLOBAL__sub_I_my_struct_1
  - exit  _GLOBAL__sub_I_my_struct_1
- writing to: cpp-example-2.s.2.s

assembling cpp-example-1.s.2.s

assembling cpp-example-2.s.2.s

compiling compiling cwrap.c for linking
- linking: gcc -g -o cpp-example cpp-example-1.o cpp-example-2.o -lstdc++ 2>&1
- writing 17 missing cwrap structs to: cwrap.c
  - _Z7bye_bazv                                      bye_baz()
  - _Z3byev                                          bye()
  - _Z3bazi                                          baz(int)
  - _Z3bari                                          bar(int)
  - _Z7get_maxIiET_S0_S0_                            int get_max<int>(int, int)
  - _Z7get_maxIcET_S0_S0_                            char get_max<char>(char, char)
  - main                                             main
  - _ZN9my_structC2EPKc                              my_struct::my_struct(char const*)
  - _ZN9my_structC2EPKcS1_                           my_struct::my_struct(char const*, char const*)
  - _ZN9my_structD2Ev                                my_struct::~my_struct()
  - _ZN3FooC2EPKc                                    Foo::Foo(char const*)
  - _ZN3Foo10my_privateEi                            Foo::my_private(int)
  - _ZN3Foo9my_publicEi                              Foo::my_public(int)
  - _ZN3FooD2Ev                                      Foo::~Foo()
  - _Z8clean_upPi                                    clean_up(int*)
  - _Z41__static_initialization_and_destruction_0ii  __static_initialization_and_destruction_0(int, int)
  - _GLOBAL__sub_I_my_struct_1                       _GLOBAL__sub_I_my_struct_1
- linking: gcc -g -o cpp-example cwrap.c cpp-example-1.o cpp-example-2.o -lstdc++ 2>&1

running
+ _GLOBAL__sub_I_my_struct_1() {
  + __static_initialization_and_destruction_0() {
    + my_struct::my_struct() {
      - constructing my_struct: arg1:my_struct_1
      } // my_struct::my_struct() #1
    } // __static_initialization_and_destruction_0() #1
  } // _GLOBAL__sub_I_my_struct_1() #1
+ main() {
  + cwrap_log_verbosity_set(verbosity=0, name=bar) {
    - i=0 .verbosity=1 .name=bye_baz
    - i=1 .verbosity=1 .name=bye
    - i=2 .verbosity=1 .name=baz
    - i=3 .verbosity=1 .name=bar
    - setting verbosity to zero
    - i=4 .verbosity=1 .name=get_max<int>
    - i=5 .verbosity=1 .name=get_max<char>
    - i=6 .verbosity=1 .name=main
    - i=7 .verbosity=1 .name=my_struct::my_struct
    - i=8 .verbosity=1 .name=my_struct::my_struct
    - i=9 .verbosity=1 .name=my_struct::~my_struct
    - i=10 .verbosity=1 .name=Foo::Foo
    - i=11 .verbosity=1 .name=Foo::my_private
    - i=12 .verbosity=1 .name=Foo::my_public
    - i=13 .verbosity=1 .name=Foo::~Foo
    - i=14 .verbosity=1 .name=clean_up
    - i=15 .verbosity=1 .name=__static_initialization_and_destruction_0
    - i=16 .verbosity=1 .name=_GLOBAL__sub_I_my_struct_1
    } // cwrap_log_verbosity_set()
  - hello world
  + baz() {
    } // baz() #1
  + get_max<int>() {
    } // get_max<int>() #1
  + get_max<char>() {
    } // get_max<char>() #1
  + my_struct::my_struct() {
    - constructing my_struct: arg1:my_struct_2
    } // my_struct::my_struct() #2
  + Foo::Foo() {
    + my_struct::my_struct() {
      - constructing my_struct: arg1:my_struct_3 arg2:a
      } // my_struct::my_struct() #1
    - constructing Foo
    } // Foo::Foo() #1
  + Foo::Foo() {
    + my_struct::my_struct() {
      - constructing my_struct: arg1:my_struct_3 arg2:b
      } // my_struct::my_struct() #2
    - constructing Foo
    } // Foo::Foo() #2
  + Foo::my_public() {
    + Foo::my_private() {
      + baz() {
        } // baz() #2
      } // Foo::my_private() #1
    } // Foo::my_public() #1
  + baz() {
    } // baz() #3
  + Foo::~Foo() {
    - deconstructing Foo
    + my_struct::~my_struct() {
      - deconstructing my_struct: my_struct_3
      } // my_struct::~my_struct() #1
    } // Foo::~Foo() #1
  + Foo::~Foo() {
    - deconstructing Foo
    + my_struct::~my_struct() {
      - deconstructing my_struct: my_struct_3
      } // my_struct::~my_struct() #2
    } // Foo::~Foo() #2
  + my_struct::~my_struct() {
    - deconstructing my_struct: my_struct_2
    } // my_struct::~my_struct() #3
  + clean_up() {
    - my_int=5
    } // clean_up() #1
  } // main() #1
+ bye_baz() {
  - called via atexit() via baz()
  } // bye_baz() #1
+ bye_baz() {
  - called via atexit() via baz()
  } // bye_baz() #2
+ bye_baz() {
  - called via atexit() via baz()
  } // bye_baz() #3
+ bye() {
  - called via atexit() via main()
  } // bye() #1
+ cwrap_log_stats() {
  - 3 call(s) to bye_baz()
  - 1 call(s) to bye()
  - 3 call(s) to baz(int)
  - 0 call(s) to bar(int)
  - 1 call(s) to get_max<int>(int, int)
  - 1 call(s) to get_max<char>(char, char)
  - 1 call(s) to main()
  - 2 call(s) to my_struct::my_struct(char const*)
  - 2 call(s) to my_struct::my_struct(char const*, char const*)
  - 3 call(s) to my_struct::~my_struct()
  - 2 call(s) to Foo::Foo(char const*)
  - 1 call(s) to Foo::my_private(int)
  - 1 call(s) to Foo::my_public(int)
  - 2 call(s) to Foo::~Foo()
  - 1 call(s) to clean_up(int*)
  - 1 call(s) to __static_initialization_and_destruction_0(int, int)
  - 1 call(s) to _GLOBAL__sub_I_my_struct_1()
  } // cwrap_log_stats()
+ my_struct::~my_struct() {
  - deconstructing my_struct: my_struct_1
  } // my_struct::~my_struct() #4
#include <stdio.h>
#include <stdlib.h>
#include "cpp-example-1.hpp"
extern void clean_up(int * my_int);
void bye_baz (void ) { printf("- called via atexit() via baz()\n"); }
void bye (void ) { printf("- called via atexit() via main()\n"); }
int baz (int a ) { atexit(bye_baz); return a ; }
int bar (int a ) { return baz(a); }
#ifdef __cplusplus
class Foo {
private:
my_struct my_struct_3;
#endif
int my_private(int a) { return bar(a); }
#ifdef __cplusplus
public:
Foo(const char* arg1) : my_struct_3("my_struct_3", arg1) { printf("- constructing Foo\n"); }
~Foo() { printf("- deconstructing Foo\n"); }
#endif
int my_public(int a) { return my_private(a); }
#ifdef __cplusplus
};
#endif
int main() {
cwrap_log_verbosity_set(0, "bar");
atexit(cwrap_log_stats);
atexit(bye);
int my_int __attribute__ ((__cleanup__(clean_up))) = 1;
my_int = 5;
printf("- hello world\n");
int b = baz(1);
#ifdef __cplusplus
int x1,y1;
get_max <int> (x1,y1);
char x2,y2;
get_max <char> (x2,y2);
my_struct my_struct_2("my_struct_2");
Foo foo_1("a");
Foo foo_2("b");
return bar(foo_1.my_public(0));
#else
return bar( my_public(0));
#endif
}
#include "cpp-example-2.hpp"
#include <stdio.h>
#include "cpp-example-1.hpp"
#ifdef __cplusplus
my_struct my_struct_1("my_struct_1");
#endif
void clean_up(int * my_int) { printf("- my_int=%d\n", *my_int); }
#ifdef __cplusplus
struct my_struct {
my_struct(const char *f ) { f_ = f; printf("- ""constructing my_struct: arg1:%s\n" , f_ ); }
my_struct(const char *f, const char *b) { f_ = f; printf("- ""constructing my_struct: arg1:%s arg2:%s\n", f_, b); }
~my_struct( ) { printf("- deconstructing my_struct: %s\n" , f_ ); }
const char *f_;
};
template <class my_type>
my_type get_max (my_type a, my_type b) {
return (a>b?a:b);
}
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#define CWRAP_LOG_LINE_MAX (4096)
char cwrap_log_spaces[] = " ";
static __thread char cwrap_log_line[CWRAP_LOG_LINE_MAX];
static __thread int cwrap_log_line_pos = 0;
__thread int cwrap_log_lines = 0;
__thread int cwrap_log_indent = 0;
static FILE * cwrap_log_file = NULL;
static int cwrap_log_fd;
static double cwrap_log_time;
int cwrap_log_verbosity = 1;
#define CWRAP_MAGIC (0xDEADBABE)
typedef struct CWRAP_DATA {
int verbosity;
int magic;
int calls;
int line;
const char * name_cpp; int len_name_cpp;
const char * name ; int len_name ;
const char * result ; int len_result ;
const char * params ; int len_params ;
const char * file ; int len_file ;
} __attribute__((aligned(32),packed)) CWRAP_DATA;
CWRAP_DATA cwrap_data__Z7bye_bazv = {1, CWRAP_MAGIC, 0, 7, "_Z7bye_bazv", 11, "bye_baz", 7, "", 0, "", 0, "cpp-example-1.cpp", 17};
CWRAP_DATA cwrap_data__Z3byev = {1, CWRAP_MAGIC, 0, 8, "_Z3byev", 7, "bye", 3, "", 0, "", 0, "cpp-example-1.cpp", 17};
CWRAP_DATA cwrap_data__Z3bazi = {1, CWRAP_MAGIC, 0, 9, "_Z3bazi", 7, "baz", 3, "", 0, "int", 3, "cpp-example-1.cpp", 17};
CWRAP_DATA cwrap_data__Z3bari = {1, CWRAP_MAGIC, 0, 10, "_Z3bari", 7, "bar", 3, "", 0, "int", 3, "cpp-example-1.cpp", 17};
CWRAP_DATA cwrap_data__Z7get_maxIiET_S0_S0_ = {1, CWRAP_MAGIC, 0, 10, "_Z7get_maxIiET_S0_S0_", 21, "get_max<int>", 12, "int", 3, "int, int", 8, "cpp-example-2.hpp", 17};
CWRAP_DATA cwrap_data__Z7get_maxIcET_S0_S0_ = {1, CWRAP_MAGIC, 0, 10, "_Z7get_maxIcET_S0_S0_", 21, "get_max<char>", 13, "char", 4, "char, char", 10, "cpp-example-2.hpp", 17};
CWRAP_DATA cwrap_data_main = {1, CWRAP_MAGIC, 0, 33, "main", 4, "main", 4, "", 0, "", 0, "cpp-example-1.cpp", 17};
CWRAP_DATA cwrap_data__ZN9my_structC2EPKc = {1, CWRAP_MAGIC, 0, 33, "_ZN9my_structC2EPKc", 19, "my_struct::my_struct", 20, "", 0, "char const*", 11, "cpp-example-1.cpp", 17};
CWRAP_DATA cwrap_data__ZN9my_structC2EPKcS1_ = {1, CWRAP_MAGIC, 0, 4, "_ZN9my_structC2EPKcS1_", 22, "my_struct::my_struct", 20, "", 0, "char const*, char const*", 24, "cpp-example-2.hpp", 17};
CWRAP_DATA cwrap_data__ZN9my_structD2Ev = {1, CWRAP_MAGIC, 0, 5, "_ZN9my_structD2Ev", 17, "my_struct::~my_struct", 21, "", 0, "", 0, "cpp-example-2.hpp", 17};
CWRAP_DATA cwrap_data__ZN3FooC2EPKc = {1, CWRAP_MAGIC, 0, 21, "_ZN3FooC2EPKc", 13, "Foo::Foo", 8, "", 0, "char const*", 11, "cpp-example-1.cpp", 17};
CWRAP_DATA cwrap_data__ZN3Foo10my_privateEi = {1, CWRAP_MAGIC, 0, 17, "_ZN3Foo10my_privateEi", 21, "Foo::my_private", 15, "", 0, "int", 3, "cpp-example-1.cpp", 17};
CWRAP_DATA cwrap_data__ZN3Foo9my_publicEi = {1, CWRAP_MAGIC, 0, 24, "_ZN3Foo9my_publicEi", 19, "Foo::my_public", 14, "", 0, "int", 3, "cpp-example-1.cpp", 17};
CWRAP_DATA cwrap_data__ZN3FooD2Ev = {1, CWRAP_MAGIC, 0, 22, "_ZN3FooD2Ev", 11, "Foo::~Foo", 9, "", 0, "", 0, "cpp-example-1.cpp", 17};
CWRAP_DATA cwrap_data__Z8clean_upPi = {1, CWRAP_MAGIC, 0, 8, "_Z8clean_upPi", 13, "clean_up", 8, "", 0, "int*", 4, "cpp-example-2.cpp", 17};
CWRAP_DATA cwrap_data__Z41__static_initialization_and_destruction_0ii = {1, CWRAP_MAGIC, 0, 8, "_Z41__static_initialization_and_destruction_0ii", 47, "__static_initialization_and_destruction_0", 41, "", 0, "int, int", 8, "cpp-example-2.cpp", 17};
CWRAP_DATA cwrap_data__GLOBAL__sub_I_my_struct_1 = {1, CWRAP_MAGIC, 0, 8, "_GLOBAL__sub_I_my_struct_1", 26, "_GLOBAL__sub_I_my_struct_1", 26, "", 0, "", 0, "cpp-example-2.cpp", 17};
CWRAP_DATA cwrap_data_array_end = {-1, CWRAP_MAGIC, -1, -1, "", 0, "", 0, "", 0, "", 0, "", 0};
CWRAP_DATA * cwrap_data_array = &cwrap_data__Z7bye_bazv;
void cwrap_data_sanity_check(CWRAP_DATA * cw) {
if (cw->magic != CWRAP_MAGIC) {
printf("ERROR: internal: cwrap bad data; exiting // cwrap_data_array=%p cw=%p diff=%ld\n", cwrap_data_array, cw, (char* ) cw - (char *) cwrap_data_array);
raise(SIGINT);
exit(1);
}
}
void cwrap_log_verbosity_set(int verbosity, const char * name) {
printf("%.*s+ cwrap_log_verbosity_set(verbosity=%d, name=%s) {\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0], verbosity, name);
int i = 0;
while (cwrap_data_array[i].verbosity >= 0) {
cwrap_data_sanity_check(&cwrap_data_array[i]);
printf("%.*s - i=%d .verbosity=%d .name=%s\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0], i, cwrap_data_array[i].verbosity, &cwrap_data_array[i].name[0]);
if (0 == memcmp(name, cwrap_data_array[i].name, 3)) {
printf("%.*s - setting verbosity to zero\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0]);
cwrap_data_array[i].verbosity = 0;
}
i ++;
}
printf("%.*s } // cwrap_log_verbosity_set()\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0]);
}
void cwrap_log_stats(void) {
printf("+ cwrap_log_stats() {\n");
int i = 0;
while (cwrap_data_array[i].verbosity >= 0) {
printf(" - %d call(s) to %s(%s)\n", cwrap_data_array[i].calls, &cwrap_data_array[i].name[0], &cwrap_data_array[i].params[0]);
i ++;
}
printf(" } // cwrap_log_stats()\n");
}
void __cyg_profile_func_enter (void *this_fn, void *call_site) {
CWRAP_DATA * cw = this_fn;
cwrap_data_sanity_check(cw);
printf("%.*s+ %s() {\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0], cw->name);
cw->calls ++;
cwrap_log_indent ++;
}
void __cyg_profile_func_exit (void *this_fn, void *call_site) {
CWRAP_DATA * cw = this_fn;
cwrap_data_sanity_check(cw);
printf("%.*s} // %s() #%d\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0], cw->name, cw->calls);
cwrap_log_indent --;
if (cwrap_log_indent < 0) {
printf("ERROR: internal: cwrap_log_indent < 0\n");
raise(SIGINT);
exit(1);
}
}
#include <stdio.h>
#define CWRAP (1)
extern __thread int cwrap_log_indent;
extern char cwrap_log_spaces[];
#ifdef __cplusplus
extern "C" {
#endif
extern void cwrap_log_verbosity_set(int verbosity, const char * name);
extern void cwrap_log_stats(void);
#ifdef __cplusplus
}
#endif
#define printf(...) { \
printf("%.*s%.*s", cwrap_log_indent, &cwrap_log_spaces[0], cwrap_log_indent, &cwrap_log_spaces[0]); \
printf(__VA_ARGS__); \
}
set -e
echo compiling cpp-example-1.cpp to assembler
gcc -O3 -g -c cpp-example-1.cpp -finstrument-functions --include cwrap.h -S
echo compiling cpp-example-2.cpp to assembler
gcc -O3 -g -c cpp-example-2.cpp -finstrument-functions --include cwrap.h -S
echo munging cpp-example-1.s
perl trace.pl cpp-example-1.s
echo munging cpp-example-2.s
perl trace.pl cpp-example-2.s
echo assembling cpp-example-1.s.2.s
gcc -g -o cpp-example-1.o -c cpp-example-1.s.2.s
echo assembling cpp-example-2.s.2.s
gcc -g -o cpp-example-2.o -c cpp-example-2.s.2.s
echo compiling cwrap.c for linking
perl trace.pl
echo running
./cpp-example
use strict;
use Time::HiRes;
use List::MoreUtils qw(first_index indexes);
if (not defined $ARGV[0]) {
my $link_cmd = qq[gcc -g -o cpp-example cpp-example-1.o cpp-example-2.o -lstdc++ 2>&1];
# my $link_cmd = qq[gcc -g -o overhead overhead.o -lstdc++ 2>&1];
printf qq[- linking: %s\n], $link_cmd;
my @output = `$link_cmd`;
#printf "@output";
my @undefined_errors = grep(m~undefined reference to .cwrap_data_~, @output); # e.g. cpp-example-1.cpp:5: undefined reference to `cwrap__Z8clean_upPi'
my $h;
my $cmd_cppfilt = qq[c++filt];
my @mangled_names;
my @source_files;
my @source_lines;
my @undefineds;
foreach my $undefined_error (@undefined_errors) {
next if ($undefined_error =~ m~cwrap_log_~);
chomp $undefined_error;
my ($source_file, $source_line, $missing_ref) = $undefined_error =~ m~([^/]+):([^\:]+): undefined reference to .(cwrap_[_A-Za-z0-9]+)~;
my ($original_ref) = $missing_ref =~ m~cwrap_data_(.*)~;
if (not exists $h->{$missing_ref}) {
push @source_files, $source_file;
push @source_lines, $source_line;
push @undefineds, $undefined_error;
push @mangled_names, $original_ref; # eg. _Z7get_maxIiET_S0_S0_
$cmd_cppfilt .= qq[ $original_ref];
}
$h->{$missing_ref} ++;
}
my $cwrap_file = "cwrap.c";
printf qq[- writing %d missing cwrap structs to: %s\n], scalar keys %{ $h }, $cwrap_file;
open(my $out, '>', $cwrap_file) || die sprintf qq[ERROR: cannot open file for writing: %s\n], $cwrap_file;
printf $out <<EOF;
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#define CWRAP_LOG_LINE_MAX (4096)
char cwrap_log_spaces[] = " ";
static __thread char cwrap_log_line[CWRAP_LOG_LINE_MAX];
static __thread int cwrap_log_line_pos = 0;
__thread int cwrap_log_lines = 0;
__thread int cwrap_log_indent = 0;
static FILE * cwrap_log_file = NULL;
static int cwrap_log_fd;
static double cwrap_log_time;
int cwrap_log_verbosity = 1;
#define CWRAP_MAGIC (0xDEADBABE)
typedef struct CWRAP_DATA {
int verbosity;
int magic;
int calls;
int line;
const char * name_cpp; int len_name_cpp;
const char * name ; int len_name ;
const char * result ; int len_result ;
const char * params ; int len_params ;
const char * file ; int len_file ;
} __attribute__((aligned(32),packed)) CWRAP_DATA;
EOF
my @demangled_names = `$cmd_cppfilt`; # e.g. int get_max<int>(int, int)
foreach my $i (0..$#mangled_names) {
my $source_line = $source_lines[$i];
my $mangled_name = $mangled_names[$i];
my $demangled_name = $demangled_names[$i];
chomp $demangled_name;
printf qq[ - %-48s %s\n], $mangled_name, $demangled_name;
die sprintf qq[ERROR: internal: 0 == source_line from linker undefined error: %s\n], $undefineds[$i] if (0 == $source_line);
my ($params ) = $demangled_name =~ m~\((.*)\)$~; # e.g. int get_max<int>(int, int)
my ($result ) = $demangled_name =~ m~^([^\s\(]+)\s+~;
my ($class_function_name) = $demangled_name =~ m~^$result\s*([^\(]+)~;
printf $out qq[CWRAP_DATA cwrap_data_%s = {1, CWRAP_MAGIC, 0, %d, "%s", %d, "%s", %d, "%s", %d, "%s", %d, "%s", %d};\n],
$mangled_name, $source_line, $mangled_name, length($mangled_name), $class_function_name, length($class_function_name), $result, length($result), $params, length($params), $source_files[$i], length($source_files[$i]);
}
printf $out <<EOF;
CWRAP_DATA cwrap_data_array_end = {-1, CWRAP_MAGIC, -1, -1, "", 0, "", 0, "", 0, "", 0, "", 0};
CWRAP_DATA * cwrap_data_array = &cwrap_data_$mangled_names[0];
void cwrap_data_sanity_check(CWRAP_DATA * cw) {
if (cw->magic != CWRAP_MAGIC) {
printf("ERROR: internal: cwrap bad data; exiting // cwrap_data_array=%%p cw=%%p diff=%%ld\\n", cwrap_data_array, cw, (char* ) cw - (char *) cwrap_data_array);
raise(SIGINT);
exit(1);
}
}
void cwrap_log_verbosity_set(int verbosity, const char * name) {
printf("%%.*s+ cwrap_log_verbosity_set(verbosity=%%d, name=%%s) {\\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0], verbosity, name);
int i = 0;
while (cwrap_data_array[i].verbosity >= 0) {
cwrap_data_sanity_check(&cwrap_data_array[i]);
printf("%%.*s - i=%%d .verbosity=%%d .name=%%s\\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0], i, cwrap_data_array[i].verbosity, &cwrap_data_array[i].name[0]);
if (0 == memcmp(name, cwrap_data_array[i].name, 3)) {
printf("%%.*s - setting verbosity to zero\\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0]);
cwrap_data_array[i].verbosity = 0;
}
i ++;
}
printf("%%.*s } // cwrap_log_verbosity_set()\\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0]);
}
void cwrap_log_stats(void) {
printf("+ cwrap_log_stats() {\\n");
int i = 0;
while (cwrap_data_array[i].verbosity >= 0) {
printf(" - %%d call(s) to %%s(%%s)\\n", cwrap_data_array[i].calls, &cwrap_data_array[i].name[0], &cwrap_data_array[i].params[0]);
i ++;
}
printf(" } // cwrap_log_stats()\\n");
}
void __cyg_profile_func_enter (void *this_fn, void *call_site) {
CWRAP_DATA * cw = this_fn;
cwrap_data_sanity_check(cw);
printf("%%.*s+ %%s() {\\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0], cw->name);
cw->calls ++;
cwrap_log_indent ++;
}
void __cyg_profile_func_exit (void *this_fn, void *call_site) {
CWRAP_DATA * cw = this_fn;
cwrap_data_sanity_check(cw);
printf("%%.*s} // %%s() #%%d\\n", 2 * cwrap_log_indent, &cwrap_log_spaces[0], cw->name, cw->calls);
cwrap_log_indent --;
if (cwrap_log_indent < 0) {
printf("ERROR: internal: cwrap_log_indent < 0\\n");
raise(SIGINT);
exit(1);
}
}
EOF
close $out;
my $link_cmd = qq[gcc -g -o cpp-example cwrap.c cpp-example-1.o cpp-example-2.o -lstdc++ 2>&1];
# my $link_cmd = qq[gcc -g -o overhead cwrap.c overhead.o -lstdc++ 2>&1];
printf qq[- linking: %s\n], $link_cmd;
my @output = `$link_cmd`;
printf "@output";
exit;
}
sub pretty_asm {
my ($asm_line) = @_;
$asm_line =~ s~\t~ ~gs;
$asm_line =~ s~^\s+~~s;
$asm_line =~ s~\s+$~~s;
return $asm_line;
}
my $s_file = $ARGV[0];
printf qq[- s_file=%s\n], $s_file;
die sprintf qq[ERROR: file does not exist: %s\n], $s_file if (not -e $s_file);
my @s_file_lines = `cat $s_file`;
printf qq[- s_file_lines=%d\n], scalar @s_file_lines;
my $t1 = Time::HiRes::time();
my @s_file_line_index = indexes { m~(call|jmp)\s+__cyg_profile_func_(enter|exit)~ } @s_file_lines; # (call|jmp) __cyg_profile_func_exit
my $t2 = Time::HiRes::time();
printf qq[- found __cyg_profile_func_(enter|exit}() calls: %d in %fs\n], scalar @s_file_line_index, $t2 - $t1;
# leaq main(%rip), %rdi
# leaq inc_a(%rip), %rbp
# ...
# call __cyg_profile_func_enter@PLT
# ...
# movq %rbp, %rdi
# call __cyg_profile_func_enter@PLT <-- inline function #1 enter via rbp
# ...
# movq %rbp, %rdi
# call __cyg_profile_func_exit@PLT <-- inline function #1 leave via rbp
# ...
# leaq inc_b(%rip), %rdi
# call __cyg_profile_func_enter@PLT <-- inline function #2 enter
# ...
# leaq inc_b(%rip), %rdi
# call __cyg_profile_func_exit@PLT <-- inline function #2 enter
# ...
# leaq main(%rip), %rdi
# call __cyg_profile_func_exit@PLT
my $debug_search = 0;
foreach my $i (@s_file_line_index) {
my ($type) = $s_file_lines[$i -0] =~ m~__cyg_profile_func_(enter|exit)~;
printf qq[ - starting at line %d: %s\n], $i, pretty_asm($s_file_lines[$i -0]) if($debug_search);
die qq[ERROR: cannot determine enter or exit type!\n] if (not defined $type);
my $mangled_name;
my $mangled_name_line;
my $register = "\%rdi"; # rdi is Linux calling convention parameter #1; https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI
my $lines_back = 1;
do {
if ($s_file_lines[$i -$lines_back] =~ m~leaq\s+cwrap_data_([_A-Za-z0-9]+).*,.*$register~) {
printf qq[ - found %s at %d lines back: %s <-- already instrumented!\n], $register, $lines_back, pretty_asm($s_file_lines[$i -$lines_back]) if($debug_search);
$mangled_name = $1; # e.g. leaq _ZN3FooD2Ev(%rip), %rdi
$mangled_name_line = -1;
goto FOUND_LEAQ;
}
elsif ($s_file_lines[$i -$lines_back] =~ m~leaq\s+([_A-Za-z0-9]+).*,.*$register~) {
printf qq[ - found %s at %d lines back: %s <-- instrumenting\n], $register, $lines_back, pretty_asm($s_file_lines[$i -$lines_back]) if($debug_search);
$mangled_name = $1; # e.g. leaq _ZN3FooD2Ev(%rip), %rdi
$mangled_name_line = $i - $lines_back;
goto FOUND_LEAQ;
}
elsif ($s_file_lines[$i -$lines_back] =~ m~movq\s+(\%[_A-Za-z0-9]+).*,.*$register~) {
printf qq[ - found %s at %d lines back: %s\n], $register, $lines_back, pretty_asm($s_file_lines[$i -$lines_back]) if($debug_search);
$register = $1;
}
$lines_back ++;
} while ($lines_back < 50);
die sprintf qq[ERROR: internal: cannot find leaq instruction in assemble file before line %d!\n], 1 + $i;
FOUND_LEAQ:;
printf qq[ - %-5s %s%s\n], $type, $mangled_name, ($mangled_name_line >= 0) ? "" : " (via rbp register!)";
if ($mangled_name_line >= 0) {
$s_file_lines[$mangled_name_line] =~ s~($mangled_name)~cwrap_data_$1~;
}
# use edx below because call __cyg_profile_func_(enter|exit) only uses 2 parameters, RDI & RSI, therefore use 3rd parameter RDX for comparison; see https://en.wikipedia.org/wiki/X86_calling_conventions#System_V_AMD64_ABI
my $skip_label = sprintf qq[.L_cwrap_skip_%s_%d], "cyg_profile_func", $i;
my $save_call = $s_file_lines[$i];
$s_file_lines[$i] = <<EOF;
movl cwrap_data_$mangled_name(%rip), %edx
cmpl cwrap_log_verbosity(%rip), %edx
jl $skip_label
EOF
$s_file_lines[$i] .= sprintf qq[%s], $save_call; # original call to __cyg_profile_func_(enter|exit)
$s_file_lines[$i] .= sprintf qq[%s:\n], $skip_label;
if($save_call =~ m~jmp~) {
$s_file_lines[$i] .= sprintf qq[\tret\n];
}
$s_file_lines[$i] .= sprintf qq[\n];
}
my $s_file_out = sprintf qq[%s.2.s], $s_file;
printf qq[- writing to: %s\n], $s_file_out;
open(my $out, '>', $s_file_out) || die sprintf qq[ERROR: cannot open file for writing: %s\n], $s_file_out;
foreach my $s_file_line (@s_file_lines) {
printf $out qq[%s], $s_file_line;
}
close $out;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment