- What can be traced?
- How can trace events be specified?
- "match specifications": twisty passages, all alike
- WTF, can I just use DTrace and drink my coffee/beer/whisky in peace?
- Trace delivery mechanisms: pick one of two
- Trace event receivers: what can they do?
- Redbug vs. The World
- Examples of event receivers
- See docs: dbg:p/2 or erlang:trace/3
- They're somewhat interchangeable.
- Use whichever one you're comfortable with, API-wise (see the sketch below)
- They are twiddling the same knobs inside the VM
- I talk a lot about the Redbug app
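A minimal sketch of the same request made both ways, "trace all calls to lists:seq/2" (the target function is just an example):

%% dbg flavor: start the default tracer, enable call tracing on all
%% processes, then set a trace pattern on the target function.
dbg:tracer(),
dbg:p(all, call),
dbg:tp(lists, seq, 2, []).

%% erlang:trace flavor: same knobs, lower-level API; trace events arrive
%% as messages in the calling process's mailbox.
erlang:trace(all, true, [call]),
erlang:trace_pattern({lists, seq, 2}, true, []).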
- Which processes?
- A specific pid.
- existing: all processes currently existing
- new: all processes that will be created in the future
- all: existing and new
- What events?
- function calls: call, return
- sending of messages
- receiving of messages
- process-related events: spawn, exit, register, unregister, link, unlink, getting linked, getting unlinked
- scheduling of processes: in, out
- process exiting
- garbage collection activities
- Additional bells & whistles (see the dbg:p/2 sketch below)
- microsecond timestamps on all messages
- report only the arity instead of the full argument list
- set on spawn / first spawn / link / first link
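The process specs and event flags above map onto dbg:p/2 roughly like this (a sketch; Pid stands in for whatever process you care about):

dbg:tracer(),
%% All existing *and* future processes: function calls and message sends,
%% with microsecond timestamps, reporting only the arity of each call.
dbg:p(all, [call, send, timestamp, arity]),
%% One specific pid: process events (spawn/exit/link/...), scheduling,
%% and garbage collection, also following any processes it spawns.
dbg:p(Pid, [procs, running, garbage_collection, set_on_spawn]).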
- See docs in ets:fun2ms/1 and ms_transform:parse_transform/2
- Because regular expressions on Erlang data suck.
- Match specs attempt to be Erlang'ish
- And try to be quick to execute
- But are obtuse to the 7th degree....
- Example match specs:
[{'_',[],[{message,{return_trace}}]}]
[{{'$1','$2'},[{'>','$2',{const,3}},{is_atom,'$1'}],['$1']}]
ets:fun2ms(fun({A, B}) when is_atom(A), B > 3 -> A end).
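For call tracing, dbg:fun2ms/1 is the cousin of ets:fun2ms/1: its fun takes the traced function's argument list as a single list. A quick sketch:

%% Produces a call-trace match spec in the spirit of the examples above,
%% with {return_trace} as the action.
dbg:fun2ms(fun([A, B]) when is_atom(A), B > 3 -> return_trace() end).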
- Mostly useful for restricting tracing on function calls
- After-the-fact tracing filters are good but...
- ... generating exactly the traces that you want is better.
- E.g., match foo:bar/3, but only if the 1st arg is hellooooooo_world (see the sketch below).
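A minimal sketch of exactly that, using dbg and a hand-written match spec (foo:bar/3 and the atom are the hypothetical example above):

dbg:tracer(),
dbg:p(all, call),
%% Head: 1st argument must be the atom hellooooooo_world, the other two
%% are wildcards; no guards; action: also emit a return_trace event.
dbg:tp(foo, bar, 3,
       [{['hellooooooo_world', '_', '_'], [], [{return_trace}]}]).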
- Erlang process receiver
- Send the trace events to any pid in the cluster
- Easy to use
- TCP receiver
- Send binary trace events to a TCP port listener
- Faster & lower overhead: less likely to kill VM in trace event tsunami
- IMPORTANT: All tracing events must go to the same single receiver.
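With dbg, the choice looks roughly like this; pick one (the handler fun and the TCP port number are made up for illustration):

%% 1. Erlang process receiver: a handler fun runs inside a tracer process.
dbg:tracer(process,
           {fun(TraceMsg, Count) ->
                    io:format("~p~n", [TraceMsg]),
                    Count + 1
            end, 0}).

%% 2. TCP receiver: binary trace events are streamed to TCP port 4711,
%%    to be consumed elsewhere via dbg:trace_client(ip, 4711).
dbg:tracer(port, dbg:trace_port(ip, 4711)).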
- DTrace is far more flexible, alas.
- I sooooo wish.......
- ... unfortunate reality hits ....
- DTrace isn't available everywhere that Erlang is
- The VM's tracing infrastructure isn't a good match with DTrace, yet.
- Pull requests are welcome!
It is quite possible to create a tsunami of trace events.
- You can easily perturb performance with trace event receiver @ 100% CPU
- You can easily run your machine out of RAM
- You can deadlock the tracer and/or other processes.
- Some trace patterns (e.g., return tracing) disable tail call optimization ... and can run your VM out of memory even when the # of trace events is very small.
To be safe, do all of these things:
- Make your trace specifications as specific as possible
- Do not use "'_':'_'"
- I.e.: any module + any function name
- If you must use "your_module_name:'_'", run for a very short time, until you are confident that you will not create an event tsunami.
- Simplify your trace receiver as much as possible
- Limit the amount of time that tracing is enabled
- Read the docs!
- Redbug has all of these limits built in; that's one reason we use/recommend it (see below).
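A sketch of Redbug's limit options (the specific numbers are arbitrary; time is in milliseconds, msgs is a trace message count):

%% Stop tracing after 10 seconds or 100 trace messages, whichever comes first.
redbug:start("your_module_name:'_' -> return", [{time, 10000}, {msgs, 100}]).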
- Anything, really
- The events are just data. Fold/spindle/mutilate/....
- In practice, simple is better
- Most tracers just format the event, in the order received
- Simple, stateless
- For example, this is what Redbug does by default
- But Redbug can use a custom event handler, if you wish.... (sketch below)
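Here is a minimal sketch of a stateful receiver (written against dbg's handler-fun API rather than Redbug's) that counts call events per function and prints the most-called functions every 10000 events:

Handler =
    fun({trace, _Pid, call, {M, F, Args}}, {N, Counts0}) ->
            Counts = maps:update_with({M, F, length(Args)},
                                      fun(C) -> C + 1 end, 1, Counts0),
            case (N + 1) rem 10000 of
                0 -> Top = lists:sublist(
                             lists:reverse(lists:keysort(2, maps:to_list(Counts))), 5),
                     io:format("~p events, most-called: ~p~n", [N + 1, Top]);
                _ -> ok
            end,
            {N + 1, Counts};
       (_OtherEvent, State) ->
            State
    end,
dbg:tracer(process, {Handler, {0, #{}}}),
dbg:p(all, call),
dbg:tp(lists, seq, 2, []).    %% example trace pattern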
- Redbug vs. The World
- Redbug's default trace event receiver is stateless, prints to console
- Redbug has an option to write binary events to a file for offline processing
- Redbug can use multiple match specs at "full strength & power"
- So ... when should you not use Redbug?
- When an online & stateful event receiver is needed
Redbug specification:
> redbug:start("erlang:now -> return").
ok
21:31:52 <timer_server> {erlang,now,[]}
21:31:52 <timer_server> {erlang,now,0} -> {1385,555512,322417}
21:31:53 <timer_server> {erlang,now,[]}
21:31:53 <timer_server> {erlang,now,0} -> {1385,555513,322384}
[...]
quitting: msg_count
To limit the formatting depth (note all of the "..." placeholders for omitted data):
([email protected])26> redbug:start("timer:'_' -> return", {print_depth,7}).
ok
21:33:41 <{erlang,apply,2}> {timer,tc,[riak_pipe_stat,produce_stats,[]]}
21:33:41 <<0.3234.1>> {timer,now_diff,[{1385,555621,...},{1385,...}]}
21:33:41 <<0.3234.1>> {timer,now_diff,2} -> 130
21:33:41 <<0.3234.1>> {timer,tc,3} -> {130,
{riak_pipe,[{{...},...},{...}|...]}}
For Redbug help, run redbug:help(). For an explanation of Redbug's print_depth, see the ~P formatting description in the io man page.
- These are not great examples of Erlang code style
- Close your eyes if you are sensitive to ugly code.
^_^
Source: Scott Fritchie: https://gist.github.com/slfritchie/007b399675fb76f2d4f0
Report, every Interval seconds, the count of init() function calls for the following Riak KV FSM modules: riak_kv_buckets_fsm, riak_kv_exchange_fsm, riak_kv_get_fsm, riak_kv_index_fsm, riak_kv_keys_fsm, riak_kv_put_fsm.
Source: Greg Burd: https://gist.github.com/2656289
%% @doc Dynamically add timing to MFA. There are various types of
%% timing.
%%
%% {all, Max} - time latency of all calls to MFA
%%
%% {sample, N, Max} - sample every N calls and stop sampling after Max
%%
%% {threshold, Millis, Max} - count # of calls where latency is > Millis
%% and count # of calls total, thus percentage of calls over threshold
Sample output:
> timeit:timeit(timer, tc, 3, {sample, 10, 5}).
{ok,[{matched,'[email protected]',1},{saved,1}]}
[sample 1/10] <0.10771.1>:timer:tc: 0.152 ms
[sample 2/20] <0.10796.1>:timer:tc: 0.484 ms
[sample 3/30] <0.10825.1>:timer:tc: 0.111 ms
[sample 4/40] <0.10849.1>:timer:tc: 0.101 ms
[sample 5/50] <0.10875.1>:timer:tc: 194.682 ms
Source: Scott Fritchie: https://gist.github.com/slfritchie/7a87269148280e4724ce
Report each riak_kv_put_fsm and riak_kv_get_fsm process that executes in more than LatencyMS milliseconds. Also, for various strategic API functions in the file, bitcask, and eleveldb modules, report latencies above the same limit.
Source: Scott Fritchie: https://gist.github.com/slfritchie/e7c3ad866f67d3fc8935
A more sophisticated version of fsm_latency_tracer. Watches more API functions in the bitcask, eleveldb, file, and riak_kv_fs2_backend modules. Uses the folsom library to generate histograms of latencies of the file:pread() function, to show outliers and multi-modal patterns that simple averages & maximums cannot show.
Source: Scott Fritchie: https://gist.github.com/slfritchie/c01690d9c30e5e7b0100
Demonstrates how much of every second the file_server_2 process is busy. It shows pretty dramatically how Bitcask's highest latencies are frequently caused by serialization by file_server_2.
Source: Joe Blomstedt: https://gist.github.com/jtuple/244c578773962077e215
- For calls to riak_kv_get_fsm:calculate_objsize/2, print the Bucket, Key, and object size.
- For calls to riak_kv_eleveldb_backend:put/5 and riak_kv_bitcask_backend:put/5 and riak_kv_memory_backend:put/5, report if the object size is larger than Size.
- Stop after Count events have been recorded or Time milliseconds have elapsed.
Source: Scott Fritchie: https://gist.github.com/slfritchie/23121803af63d76eeca5
A small example of reading a binary-formatted Erlang trace file and doing something simple with it.
Note that Redbug can write its trace events into a binary file. This technique can be used for off-line processing, e.g., to avoid high processing cost in a production environment.
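A minimal sketch (not the gist's code) of replaying such a binary trace file with dbg's file client, counting events by type; the file path is a placeholder:

CountByType =
    fun(end_of_trace, Acc) ->
            io:format("event counts: ~p~n", [Acc]),
            Acc;
       (Msg, Acc) when element(1, Msg) =:= trace; element(1, Msg) =:= trace_ts ->
            maps:update_with(element(3, Msg), fun(N) -> N + 1 end, 1, Acc);
       (_Other, Acc) ->
            Acc
    end,
dbg:trace_client(file, "/path/to/trace.bin", {CountByType, #{}}).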
Source: Scott Fritchie: https://gist.github.com/slfritchie/159a8ce1f49fc03c77c6
func_args_tracer.erl
%% For example: what ETS tables are being called the most by ets:lookup/2?
%% The 1st arg of ets:lookup/2 is the table name.
%% Watch for 10 seconds.
%%
%% > func_args_tracer:start(ets, lookup, 2, 10, fun(Args) -> hd(Args) end).
%%
%% Tracer pid: <0.16102.15>, use func_args_tracer:stop() to stop
%% Otherwise, tracing stops in 10 seconds
%% Current date & time: {2013,9,19} {18,5,48}
%% {started,<0.16102.15>}
%% Total calls: 373476
%% Call stats:
%% [{folsom_histograms,114065},
%% {ac_tab,69689},
%% {ets_riak_core_ring_manager,67147},
%% {folsom_spirals,57076},
%% {riak_capability_ets,48862},
%% {riak_core_node_watcher,8149},
%% {riak_api_pb_registrations,8144},
%% {folsom,243},
%% {folsom_meters,43},
%% {folsom_durations,20},
%% {timer_tab,18},
%% {folsom_gauges,8},
%% {riak_core_stat_cache,5},
%% {sys_dist,3},
%% {inet_db,1},
%% {21495958,1},
%% {3145765,1},
%% {3407910,1}]
latency_histogram_tracer.erl
%% For example: create a histogram of call latencies for bitcask:put/3.
%% Watch for 10 seconds.
%%
%% > latency_histogram_tracer:start(bitcask, put, 3, 10).
%%
%% Tracer pid: <0.2108.18>, use latency_histogram_tracer:stop() to stop
%% Otherwise, tracing stops in 10 seconds
%% Current date & time: {2013,9,19} {18,14,13}
%% {started,<0.2108.18>}
%% Histogram stats:
%% [{min,0},
%% {max,48},
%% {arithmetic_mean,2.765411819271055},
%% [...]
%% {percentile,[{50,3},{75,4},{90,5},{95,6},{99,8},{999,14}]},
%% {histogram,[{1,13436},
%% {2,12304},
%% {3,10789},
%% {4,7397},
%% {5,4191},
%% {6,1929},
%% [...]
%% {30,0},
%% {31,0},
%% {40,2},
%% {50,1}]},
%% {n,51746}]
Source: Joe Blomstedt: https://gist.github.com/jtuple/161ad7c8c278b45729df
Source: Jordan West: https://gist.github.com/jrwest/2999f8f217f1cbceca83
- Attempts to capture enough data from Erlang tracing to construct a flamegraph.
- Aside/see also: https://github.com/proger/eflame
- On VM #1: ktrace:c()
- On VM #2, on same machine, running Riak: ktrace:s()
- Then get Riak to do some get requests.
- ktrace output should appear in VM #1's console. Save it to a file.
- Run ktrace_fold:go("/path/to/input-file", "/path/to/output-file").
- Run flamegraph.pl /path/to/output-file > /path/to/output.svg
For example: http://www.snookles.com/scotttmp/vnode-get.svg ... which suggests that only about 15% of the wall-clock time for riak_kv_vnode get processing is due to the backend (Bitcask in this case).