@uttampawar
Last active March 28, 2020 05:43
llvm-propeller optimization on the included test program.
System information:
OS: 4.15.0-58-generic (uname -r)
VERSION="18.04.3 LTS (Bionic Beaver)"
GCC: gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
GNU ld (GNU Binutils for Ubuntu) 2.30
$ clang++ -O2 main.cc callee.cc -fpropeller-label -o a.out.labels -fuse-ld=lld
$ perf record -e cycles:u -j any,u -- ./a.out.labels 1000000000 2 >& /dev/null
$ $LLVM_DIR/llvm-propeller/create_llvm_prof --format=propeller --binary=./a.out.labels --profile=perf.data --out=perf.propeller
/home/upawar/projects/llvm-propeller/create_llvm_prof: /mnt/sdb1/upawar/tools/lib/libtinfo.so.6: no version information available (required by /home/upawar/projects/llvm-propeller/create_llvm_prof)
# The warning above appears because I built my own copy of the ncurses-6.1 library
$ ldd $LLVM_DIR/llvm-propeller/create_llvm_prof
/home/upawar/projects/llvm-propeller/create_llvm_prof: /mnt/sdb1/upawar/tools/lib/libtinfo.so.6: no version information available (required by /home/upawar/projects/llvm-propeller/create_llvm_prof)
linux-vdso.so.1 (0x00007fff4f9f7000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f73041d1000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7303fb2000)
libtinfo.so.6 => /mnt/sdb1/upawar/tools/lib/libtinfo.so.6 (0x00007f7303d79000)
libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007f7303b5f000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f7303942000)
libcrypto.so.1.1 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f7303476000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f73030ed000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7302d4f000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7302b37000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7302746000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7304ae5000)
# Save original binary to measure the difference
$ cp a.out.labels a.out.orig.labels
# Rebuild an optimized binary using the propeller profile (perf.propeller) generated from perf.data
$ clang++ -O2 -v main.cc callee.cc -fpropeller-optimize=perf.propeller -fuse-ld=lld -o a.out.labels
$ cat perf.propeller
@a.out.labels
Symbols
1 2b N_start
2 2 N_dl_relocate_static_pie
3 3d Nmain
4 14 3.1
5 9 3.2
6 20 N_GLOBAL__sub_I_main.cc
7 2f N_Z6calleeb
8 11 7.1
9 f 7.2
10 9 7.3
11 20 N_GLOBAL__sub_I_callee.cc
12 65 N__libc_csu_init
13 2 N__libc_csu_fini
Branches
4 4 76738
4 7 77648 C
7 9 77816
10 4 76755 R
Fallthroughs
4 4 151134
7 7 76515
9 10 76695
10 10 75387
!_Z6calleeb
!!2
!!3
!main
!!1
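# Note on reading perf.propeller
The profile above is compact, so a small parser can help when reading it. The sketch below is only my reading of the file as printed here, not a documented format. It assumes that each line under "Symbols" is "ordinal, size in hex, name", that an "N" prefix marks a function name, and that a name like "3.1" means basic block 1 of the function with ordinal 3. It also assumes that the "C" and "R" tags in "Branches" mark calls and returns. The hex-size assumption is at least consistent with the listing: main (3d) equals its 0x20-byte entry block (the distance from main to a.BB.main in the original binary's nm listing) plus blocks 3.1 (14) and 3.2 (9). The helper names `parse_profile` and `describe` are hypothetical.

```python
# Profile text copied from the `cat perf.propeller` output in this gist.
PROFILE = """\
@a.out.labels
Symbols
1 2b N_start
2 2 N_dl_relocate_static_pie
3 3d Nmain
4 14 3.1
5 9 3.2
6 20 N_GLOBAL__sub_I_main.cc
7 2f N_Z6calleeb
8 11 7.1
9 f 7.2
10 9 7.3
11 20 N_GLOBAL__sub_I_callee.cc
12 65 N__libc_csu_init
13 2 N__libc_csu_fini
Branches
4 4 76738
4 7 77648 C
7 9 77816
10 4 76755 R
Fallthroughs
4 4 151134
7 7 76515
9 10 76695
10 10 75387
!_Z6calleeb
!!2
!!3
!main
!!1
"""


def parse_profile(text):
    """Split the profile into symbols, branches and fallthroughs.

    Assumed layout (inferred from the listing, not from documentation):
      Symbols:      <ordinal> <size-hex> <name>
      Branches:     <from> <to> <count> [tag]
      Fallthroughs: <from> <to> <count>
    Lines starting with '@' or '!' are skipped here.
    """
    symbols = {}       # ordinal -> (size_in_bytes, raw_name)
    branches = []      # (src, dst, count, tag)
    fallthroughs = []  # (src, dst, count)
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "@!":
            continue
        if line in ("Symbols", "Branches", "Fallthroughs"):
            section = line
            continue
        fields = line.split()
        if section == "Symbols":
            symbols[int(fields[0])] = (int(fields[1], 16), fields[2])
        elif section == "Branches":
            tag = fields[3] if len(fields) > 3 else ""
            branches.append((int(fields[0]), int(fields[1]), int(fields[2]), tag))
        elif section == "Fallthroughs":
            fallthroughs.append((int(fields[0]), int(fields[1]), int(fields[2])))
    return symbols, branches, fallthroughs


def describe(symbols, ordinal):
    """Turn an ordinal into a readable name such as 'main.bb1' (assumed meaning)."""
    raw = symbols[ordinal][1]
    if raw.startswith("N"):
        return raw[1:]                      # function entry: drop the 'N' prefix
    parent, bb_index = raw.split(".")       # e.g. '3.1' -> function 3, block 1
    return f"{describe(symbols, int(parent))}.bb{bb_index}"


symbols, branches, fallthroughs = parse_profile(PROFILE)
for src, dst, count, tag in branches:
    print(f"{describe(symbols, src)} -> {describe(symbols, dst)}  {count} {tag}".rstrip())
```

Read this way, the sampled branches form a loop through block 1 of main, the call into _Z6calleeb, and the return from block 3 of the callee back into block 1 of main.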
# Binary sizes
$ ls -l a.out.orig.labels a.out.labels
-rwxrwxr-x 1 upawar upawar 7744 Oct 24 13:43 a.out.orig.labels
-rwxrwxr-x 1 upawar upawar 7864 Oct 24 13:55 a.out.labels
# Perf data points
# Original binary
$ perf stat -e cycles,instructions,cache-misses,L1-icache-load-misses,br_misp_retired.all_branches,br_inst_retired.all_branches,icache_64b.iftag_stall,br_inst_retired.not_taken ./a.out.orig.labels 1000000000 1> /dev/null
Performance counter stats for './a.out.orig.labels 1000000000':
80,231,347,233 cycles (66.67%)
243,314,361,618 instructions # 3.03 insn per cycle (83.33%)
22,522 cache-misses (83.33%)
2,644,077 L1-icache-load-misses (83.33%)
20,400,061 br_misp_retired.all_branches (83.33%)
53,442,616,374 br_inst_retired.all_branches (83.34%)
68,554,744 icache_64b.iftag_stall (57.14%)
16,174,496,787 br_inst_retired.not_taken (49.99%)
21.191516400 seconds time elapsed
# Optimized binary
$ perf stat -e cycles,instructions,cache-misses,L1-icache-load-misses,br_misp_retired.all_branches,br_inst_retired.all_branches,icache_64b.iftag_stall,br_inst_retired.not_taken ./a.out.labels 1000000000 1> /dev/null
Performance counter stats for './a.out.labels 1000000000':
81,446,698,907 cycles (66.66%)
243,218,220,681 instructions # 2.99 insn per cycle (83.33%)
14,907 cache-misses (83.34%)
2,533,002 L1-icache-load-misses (83.34%)
20,571,010 br_misp_retired.all_branches (83.34%)
53,455,580,211 br_inst_retired.all_branches (83.33%)
68,847,492 icache_64b.iftag_stall (57.14%)
15,174,109,247 br_inst_retired.not_taken (49.98%)
21.512644234 seconds time elapsed
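# Note on comparing the two runs
The two counter listings are easier to compare as relative changes. The sketch below copies the counts from the perf stat output above and prints the percentage change from the original to the optimized binary. The helper `percent_change` is a hypothetical name. The percentages perf prints in parentheses show how much of the run each counter was actually active (the events were multiplexed), so the counts are scaled estimates and small differences may be noise from this single pair of runs.

```python
# Counter values copied from the two `perf stat` runs in this gist.
ORIGINAL = {
    "cycles": 80_231_347_233,
    "instructions": 243_314_361_618,
    "cache-misses": 22_522,
    "L1-icache-load-misses": 2_644_077,
    "br_misp_retired.all_branches": 20_400_061,
    "br_inst_retired.all_branches": 53_442_616_374,
    "icache_64b.iftag_stall": 68_554_744,
    "br_inst_retired.not_taken": 16_174_496_787,
}
OPTIMIZED = {
    "cycles": 81_446_698_907,
    "instructions": 243_218_220_681,
    "cache-misses": 14_907,
    "L1-icache-load-misses": 2_533_002,
    "br_misp_retired.all_branches": 20_571_010,
    "br_inst_retired.all_branches": 53_455_580_211,
    "icache_64b.iftag_stall": 68_847_492,
    "br_inst_retired.not_taken": 15_174_109_247,
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


changes = {name: percent_change(ORIGINAL[name], OPTIMIZED[name]) for name in ORIGINAL}

for name, delta in changes.items():
    print(f"{name:32s} {ORIGINAL[name]:>18,d} {OPTIMIZED[name]:>18,d} {delta:+8.2f}%")

# Elapsed wall-clock time, also taken from the output above.
print(f"{'seconds elapsed':32s} {percent_change(21.191516400, 21.512644234):+.2f}%")
```

On these numbers, cycles rise by roughly 1.5% while not-taken branches fall by roughly 6%, so the layout change is visible in the branch counters but does not speed up this small test program.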
# Symbols from the binary
Function/symbol addresses:
# Original binary
$ nm --numeric-sort a.out.orig.labels
0000000000201920 T main
0000000000201940 t a.BB.main
0000000000201954 t aa.BB.main
0000000000201960 t _GLOBAL__sub_I_main.cc
0000000000201980 T _Z6calleeb
0000000000201986 t a.BB._Z6calleeb
0000000000201997 t aa.BB._Z6calleeb
00000000002019a6 t aaa.BB._Z6calleeb
00000000002019b0 t _GLOBAL__sub_I_callee.cc
# Optimized binary
$ nm --numeric-sort a.out.labels
00000000002018f0 T main
0000000000201920 T _Z6calleeb
000000000020193e t aa.BB.main
0000000000201947 t a.BB._Z6calleeb
# Optimized binary built with the "-Wl,--propeller-keep-named-symbols" flag
$ nm --numeric-sort a.out.labels
00000000002018f0 T main
0000000000201908 t a.BB.main
0000000000201920 T _Z6calleeb
0000000000201926 t aa.BB._Z6calleeb
0000000000201935 t aaa.BB._Z6calleeb
000000000020193e t aa.BB.main
0000000000201947 t a.BB._Z6calleeb
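# Note on the resulting layout
The last nm listing shows where each basic-block symbol ended up. The sketch below parses that listing and prints every symbol's offset from main and the distance to the next symbol. `parse_nm` is a hypothetical helper. I am assuming that the number of leading "a" characters in a label such as "aa.BB.main" is the basic-block index. Under that assumption, the blocks marked in perf.propeller (main block 1, _Z6calleeb blocks 2 and 3) sit next to their function entries, and the two unlisted blocks (main block 2, _Z6calleeb block 1) come last.

```python
# Listing copied from the nm output of the binary built with
# -Wl,--propeller-keep-named-symbols in this gist.
NM_OPTIMIZED = """\
00000000002018f0 T main
0000000000201908 t a.BB.main
0000000000201920 T _Z6calleeb
0000000000201926 t aa.BB._Z6calleeb
0000000000201935 t aaa.BB._Z6calleeb
000000000020193e t aa.BB.main
0000000000201947 t a.BB._Z6calleeb
"""


def parse_nm(text):
    """Return (address, name) pairs sorted by address from `nm` output lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        address, _kind, name = line.split()
        entries.append((int(address, 16), name))
    return sorted(entries)


layout = parse_nm(NM_OPTIMIZED)
base = layout[0][0]

# Distance from each symbol to the next one (block size plus any padding).
gaps = {
    name: next_address - address
    for (address, name), (next_address, _next_name) in zip(layout, layout[1:])
}

for address, name in layout:
    gap = gaps.get(name)
    gap_text = f"{gap:3d} bytes to next" if gap is not None else "last symbol"
    print(f"+0x{address - base:02x}  {name:20s} {gap_text}")
```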