karimnaaji/valgrind.md

Last active March 12, 2024 09:28

Valgrind steps to get cache misses

http://valgrind.org/docs/manual/cg-manual.html

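valgrind --tool=cachegrind --cache-sim=yes ./a.out
cg_annotate cachegrind.out.* --auto=yes

Recent Valgrind releases turn cache simulation off by default, so --cache-sim=yes is passed explicitly here; without it, Cachegrind may report instruction counts only. Each run writes its results to cachegrind.out.<pid>.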

Instruction misses

Ir: number of instructions executed
I1mr: I1 cache read misses
ILmr: LL (last level) cache instruction read misses

Data cache misses

Dr: number of memory reads
D1mr: D1 cache read misses
DLmr: LL (last level) cache data read misses
Dw: number of memory writes
D1mw: D1 cache write misses
DLmw: LL (last level) cache data write misses
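The counters above are easier to compare as miss rates. Below is a minimal sketch that reads the `events:` and `summary:` lines of a Cachegrind output file and derives I1, D1, and LL miss rates. The function names and the sample totals are hypothetical, and the LL rate is assumed to be taken over all instruction and data references.

```python
def parse_cachegrind_summary(text):
    """Map each event name (Ir, D1mr, ...) to its program-wide total."""
    events, totals = [], []
    for line in text.splitlines():
        if line.startswith("events:"):
            events = line.split(":", 1)[1].split()
        elif line.startswith("summary:"):
            totals = [int(n) for n in line.split(":", 1)[1].split()]
    return dict(zip(events, totals))


def miss_rates(counts):
    """Return I1, D1, and LL miss rates as fractions of the relevant accesses."""
    data_refs = counts["Dr"] + counts["Dw"]
    d1_misses = counts["D1mr"] + counts["D1mw"]
    ll_misses = counts["ILmr"] + counts["DLmr"] + counts["DLmw"]
    return {
        "I1": counts["I1mr"] / counts["Ir"],
        "D1": d1_misses / data_refs,
        # Assumption: LL misses are measured against every reference.
        "LL": ll_misses / (counts["Ir"] + data_refs),
    }


# Made-up totals for illustration, not output from a real run.
sample = """events: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
summary: 1000000 500 400 300000 15000 3000 100000 5000 1000
"""

counts = parse_cachegrind_summary(sample)
rates = miss_rates(counts)
for cache, rate in rates.items():
    print(f"{cache} miss rate: {rate:.2%}")
```

To use it on a real run, pass the contents of a cachegrind.out.<pid> file in place of `sample`.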

On a modern machine, an L1 miss will typically cost around 10 cycles, an LL miss can cost as much as 200 cycles, and a mispredicted branch costs in the region of 10 to 30 cycles. Detailed cache and branch profiling can be very useful for understanding how your program interacts with the machine and thus how to make it faster.
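The per-miss costs quoted above can be turned into a rough stall estimate. This is only a back-of-the-envelope sketch: the constants are the ballpark figures from the paragraph, the counts are made up, and the function name is hypothetical. Real penalties depend on the CPU and on how much latency is hidden by out-of-order execution.

```python
# Ballpark costs taken from the paragraph above (assumptions, not measurements).
L1_MISS_CYCLES = 10
LL_MISS_CYCLES = 200


def estimated_miss_cycles(counts):
    """Estimate the cycles lost to cache misses from Cachegrind totals."""
    l1_misses = counts["I1mr"] + counts["D1mr"] + counts["D1mw"]
    ll_misses = counts["ILmr"] + counts["DLmr"] + counts["DLmw"]
    # An LL miss has already missed L1, so it is charged both penalties here.
    return L1_MISS_CYCLES * l1_misses + LL_MISS_CYCLES * ll_misses


# Made-up totals for illustration, not output from a real run.
example_counts = {
    "I1mr": 500, "ILmr": 400,
    "D1mr": 15000, "DLmr": 3000,
    "D1mw": 5000, "DLmw": 1000,
}

stall_cycles = estimated_miss_cycles(example_counts)
print(f"estimated cycles lost to misses: {stall_cycles}")
```

Even though LL misses are far rarer than L1 misses in this example, their higher cost makes them the larger share of the estimate, which is why they are usually the first thing to look at.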

Visualizing data

brew install qcachegrind graphviz

valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes --cache-sim=yes ./a.out

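Profiling with --dump-instr=yes records costs per instruction, which lets qcachegrind show annotated assembly. Open the resulting callgrind.out.<pid> file in qcachegrind to browse the results.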

https://baptiste-wicht.com/posts/2011/09/profile-c-application-with-callgrind-kcachegrind.html
