Flamegraphing in Rust can now be done with a new cargo subcommand. Please check this out before embarking on the legacy journey below:
https://github.com/flamegraph-rs/flamegraph
- Install perf, using Brendan Gregg's guide: http://www.brendangregg.com/perf.html#Prerequisites
- Install flamegraph from repo:
  - Clone the repo locally:
    git clone https://github.com/brendangregg/FlameGraph
  - Add the main directory with all the *.pl Perl files to the path (single quotes keep $PATH from being expanded before it is written to .profile):
    cd
    echo 'export PATH=/path/to/FlameGraph:$PATH' >> .profile
    source .profile
- If you are running an older version of perf (any Linux kernel version before v4.8-rc1), you should also install rust-unmangle (this will resolve some further mangled names on top of the c++filt unmangling):
  - Clone rust-unmangle:
    git clone https://github.com/Yamakaky/rust-unmangle.git
  - Make rust-unmangle executable:
    cd rust-unmangle
    chmod u+x rust-unmangle
  - Add it to your path:
    cd
    echo 'export PATH=/path/to/rust-unmangle:$PATH' >> .profile
    source .profile
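Once everything is installed, a quick check can confirm that the tools are actually reachable. The following is a minimal sketch and not part of the original setup: the helper names missing_tools and older_than_4_8 are made up for illustration, and it assumes that your perf is as old as the running kernel reported by uname -r. It prints the name of every tool that cannot be found on the path, and only asks for rust-unmangle on kernels before 4.8.

```shell
# Print each of the given tool names that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed if the given version string (e.g. "4.4.0-210-generic") is older than 4.8.
older_than_4_8() {
    major=${1%%.*}           # everything before the first dot
    rest=${1#*.}             # everything after the first dot
    minor=${rest%%[!0-9]*}   # leading digits of the second component
    [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 8 ]; }
}

tools="perf stackcollapse-perf.pl stackcollapse-recursive.pl flamegraph.pl c++filt"
# Assumption: perf is as old as the running kernel.
if older_than_4_8 "$(uname -r)"; then
    tools="$tools rust-unmangle"
fi
# $tools is intentionally unquoted so that it splits into separate names.
missing_tools $tools
```

No output means that everything needed for the steps below was found.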
To get actual function names in the flamegraph output, turn on debugging information in the binary by temporarily adding the following to Cargo.toml (you should remove this for an actual release):
[profile.release]
debug = true
Then compile with the --release flag, to get cargo to optimize the resulting binary. Otherwise, any slowness may be due to a lack of compiler optimisations:
cargo build --release
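If you would rather not edit Cargo.toml at all, cargo also reads profile settings from environment variables, so the debug setting can be switched on for a single build. This is a small sketch of that alternative; the function name build_release_with_debug is made up for illustration.

```shell
# Hypothetical helper: build an optimized binary that still carries debug info,
# without touching Cargo.toml. Extra arguments are passed on to cargo.
build_release_with_debug() {
    # Corresponds to "debug = true" under [profile.release], for this invocation only.
    CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release "$@"
}

# Usage, from the crate's root directory:
# build_release_with_debug
```

Since nothing is written to Cargo.toml, there is nothing to remember to remove before an actual release.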
Run the cpu sampling with:
perf record --call-graph dwarf,16384 -e cpu-clock -F 997 target/release/name-of-binary <command-line-arguments>
Options here are:
- --call-graph dwarf,16384: dwarf ensures correct stack dumps, as the standard frame pointers gave me incorrect stacks; the ,16384 doubles the stack dump size from the standard value, which has helped me avoid split stacks. (I am assuming the smaller stack size did not suffice for deeper stacks, so those stacks were cut off from the bottom; with blocks at the bottom missing, a correct merging on those lower levels was impossible. So try increasing this further in case you get weird split stacks with differences in the amount of lower levels.)
- -e cpu-clock: this selects the event cpu-clock for perf sampling -- without it, the following argument did not really matter in my environment
- -F 997: this ensures a sampling at 997 Hz, the value off from a round 1000 is to avoid lockstep sampling (see e.g. Brendan Gregg's blog post from 2014)
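To see why a slightly odd frequency helps against lockstep sampling, here is a toy calculation (not a measurement). It assumes the profiled program repeats some activity with a fixed period, here a hypothetical 10 ms, and counts at how many distinct points within that period 1000 consecutive samples land. The function name distinct_phases is made up for this sketch.

```shell
# Count the distinct positions within a periodic activity that get sampled.
# $1: sampling frequency in Hz
# $2: period of the (hypothetical) repeating activity in microseconds
distinct_phases() {
    awk -v f="$1" -v p="$2" 'BEGIN {
        for (i = 0; i < 1000; i++) {
            t = int(i * 1000000 / f)   # time of sample i in microseconds
            seen[t % p] = 1            # position of that sample within the period
        }
        n = 0
        for (k in seen) n++
        print n
    }'
}

distinct_phases 1000 10000   # round frequency: samples keep hitting the same spots
distinct_phases 997 10000    # slightly off: samples drift across the whole period
```

At 1000 Hz the samples only ever hit 10 positions of the 10 ms period, so anything happening in between is never seen; at 997 Hz the sample positions drift and cover far more of it.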
Should perf give you errors regarding sysctl settings, you can inspect the current values with, e.g.:
sysctl -n kernel.perf_event_paranoid
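And write new values into them with the command below. Note that sysctl -w only changes the running system; to keep the setting across reboots, also add it to /etc/sysctl.conf: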
sudo sysctl -w kernel.perf_event_paranoid=-1
The resulting perf.data (the default output file of perf record) can be rather large, depending on the length of your example run and the sampling frequency selected, easily going into the GBs -- so make sure you have the space available. Based on this file, generate the flamegraph with:
perf script | stackcollapse-perf.pl | stackcollapse-recursive.pl | c++filt | rust-unmangle | flamegraph.pl > flame.svg
Tools here are:
- stackcollapse-perf.pl: The stackcollapse Perl script by Brendan Gregg, which groups identical levels in stacks together. For installation instructions see above.
- stackcollapse-recursive.pl: This further collapses some recursive calls, improving readability. @tomtung reported this useful addition.
- c++filt: This is a C++ demangler / unmangler that takes care of demangling a lot of the Rust name mangling, as Rust also uses C++ name mangling. It should be available in a standard Linux installation.
- rust-unmangle: This script unmangles some remaining names mangled by Rust. It is optional and should not be necessary in versions of perf from Linux kernel v4.8-rc1 onwards, as these should include Rust unmangling code (and it worked without rust-unmangle for @tomtung) -- but I haven't tested the newer version myself and needed it for my older one, so who knows who else does, as well.
- flamegraph.pl: This Perl script by Brendan Gregg takes the collapsed stacks and renders them into the (interactive) .svg format.
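Since rust-unmangle is optional, the pipeline can be wrapped so that this stage is simply skipped when the script is not installed. This is only a sketch: optional_filter and make_flamegraph are names made up here, and it assumes that perf script finds its input (perf.data by default) in the current directory.

```shell
# Run the given filter on stdin if it is installed;
# otherwise pass stdin through unchanged.
optional_filter() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        cat
    fi
}

# Hypothetical wrapper around the pipeline above.
# $1: name of the output file (defaults to flame.svg)
make_flamegraph() {
    perf script \
        | stackcollapse-perf.pl \
        | stackcollapse-recursive.pl \
        | c++filt \
        | optional_filter rust-unmangle \
        | flamegraph.pl > "${1:-flame.svg}"
}

# Usage, in the directory where perf record was run:
# make_flamegraph flame.svg
```

The same command line then works on both older and newer perf installations.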
Inspect the flame.svg file by opening it in a browser and hovering over individual bars to get the respective function names displayed. You can also search for bars containing certain expressions (top right), click on bars to zoom in on them and reset the view (top left).
Or inspect the non-collapsed report interactively by issuing:
perf report
But really, you want to rather look at the flamegraph... ;)
It is not possible to directly include .svg files in GitHub issues and comments. However, I found a reasonable work-around:
- Create a new GitHub gist while logged in. You'll need to create a new gist for each image, but you can easily drag-and-drop the file in there.
- Use the link to the gist in your GitHub comments, advising users to follow that link and then properly inspect it by right-clicking on the preview and selecting View Image (tested in current Firefox).
For an example, have a look at this Pull Request: varlociraptor/varlociraptor#48
Many thanks to the people behind the following sources upon which I have built this little howto:
- http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
- https://carol-nichols.com/2015/12/09/rust-profiling-on-osx-cpu-time/
- https://blog.anp.lol/rust/2016/07/24/profiling-rust-perf-flamegraph/
- https://gist.github.com/KodrAus/97c92c07a90b1fdd6853654357fd557a
- https://www.reddit.com/r/rust/comments/4snw3k/linux_perf_gets_rust_symbol_demangling_support/
@tomtung, thanks again for your suggestions and comments over at:
llogiq/flame#33 (comment)
I've added them to the gist above.