Skip to content

Instantly share code, notes, and snippets.

@binarytemple
Last active August 29, 2015 14:22
Show Gist options
  • Save binarytemple/c517da3ed541f9b70f86 to your computer and use it in GitHub Desktop.
Save binarytemple/c517da3ed541f9b70f86 to your computer and use it in GitHub Desktop.

autoscale: true build-lists: true slidenumbers: true

Tracing Riak with Redbug

An example of how to use Redbug to trace Riak internals, taking as an example an inexplicably failing secondary index range query.


  • Enable the leveldb backend
  • Insert data
  • Perform 2i queries using the curl client
  • Perform the same queries using riak_client
  • Exact match succeeds, range query fails
  • Investigate!

Riak 1.4

%% Riak KV config
 {riak_kv, [
            %% Storage_backend specifies the Erlang module defining the storage
            %% mechanism that will be used on this node.
            %% {storage_backend, riak_kv_bitcask_backend},
            {storage_backend, riak_kv_eleveldb_backend},

Riak 2.x

storage_backend = eleveldb

Add indexed content..

curl -XPUT \
-H "x-riak-index-age_int: 25" \
-H "Content-Type: application/json" \
-d '{"name": "Anthony J Dacosta"}' \
http://127.0.0.1:10018/buckets/users/keys/anthony

Query using curl


Exact match

curl http://127.0.0.1:10018/buckets/users/index/age_int/0/25
{"keys":["anthony"]}% 

Range query

curl http://127.0.0.1:10018/buckets/users/index/age_int/0/25
{"keys":["anthony"]}

Lets attach to a running instance and attempt the same operations Using the Riak local client

riak attach

Firstly we instantiate a reference suitable for executing local riak calls by invoking "riak:local_client/0"

(dev1@127.0.0.1)1> {ok, C} = riak:local_client().
{ok,{riak_client,'[email protected]',undefined}}

riak_client uses parameterized modules to support this magic. Here's the module declaration in case you didn't believe me.

-module(riak_client, [Node,ClientId]).

As this feature (parameterized modules) has been removed from more recent Erlang releases, it's now supported at compile time by means of a parse transform.


We then attempt to execute an exact match secondary index query... The query executes correctly and returns the expected value.

(dev1@127.0.0.1)89> C:get_index( <<"users">>, 
                                {eq, <<"age_int">>, 25 },
                                [{r, 3}, {timeout, 60}]).
{ok,[<<"anthony">>]}

Lets try tracing the call using Redbug

(dev1@127.0.0.1)90> redbug:start("riak_client->return", 
                                [{time, 10000 }, {msgs,50}]).
ok
(dev1@127.0.0.1)91> C:get_index( <<"users">>, 
                                {eq, <<"age_int">> , 25 },
                                [{r, 3}, 
                                {timeout, 60}]).

A couple of things happen

10:23:36 <{erlang,apply,2}> {riak_client,get_index,
                                [<<"users">>,
                                 {eq,<<"age_int">>,25},
                                 [{r,3},{timeout,60}],
                                 {riak_client,'[email protected]',undefined}]}

10:23:36 <{erlang,apply,2}> {riak_client,mk_reqid,
                                [{riak_client,'[email protected]',undefined}]}

10:23:36 <{erlang,apply,2}> {riak_client,mk_reqid,1} -> 117674718

10:23:36 <{erlang,apply,2}> {riak_client,wait_for_query_results,
                                [117674718,60,
                                 {riak_client,'[email protected]',undefined}]}

...noise...

10:23:36 <{erlang,apply,2}> {riak_client,wait_for_query_results,4} -> {ok,
                                                                       [<<"anthony">>]}
{ok,[<<"anthony">>]}

...noise...

10:23:36 <{erlang,apply,2}> {riak_client,get_index,4} -> {ok,[<<"anthony">>]}
quitting: timeout

Lets try a range query. It fails with a timeout message.

(dev1@127.0.0.1)93> redbug:start("riak_client->return", [{time, 10000 }, {msgs,50}] ).
ok
(dev1@127.0.0.1)94> C:get_index( <<"users">>, {range, <<"age_int">> , [0,25] },[{r, 1}, {timeout, 60000}]).

10:28:34 <{erlang,apply,2}> {riak_client,get_index,
                                [<<"users">>,
                                 {range,<<"age_int">>,[0,25]},
                                 [{r,1},{timeout,60000}],
                                 {riak_client,'[email protected]',undefined}]}

10:28:34 <{erlang,apply,2}> {riak_client,mk_reqid,
                                [{riak_client,'[email protected]',undefined}]}

...noise...

10:28:34 <{erlang,apply,2}> {riak_client,wait_for_query_results,
                                [48701992,60000,[],
                                 {riak_client,'[email protected]',undefined}]}
quitting: timeout
{error,timeout}

Lets examine the tracing of the successful case

(dev1@127.0.0.1)95> redbug:start("riak_kv_index_fsm_sup:start_index_fsm->return", [{time, 10000 }, {msgs,1}] ).
ok
(dev1@127.0.0.1)96> C:get_index( <<"users">>, {eq, <<"age_int">> , 25 },[{r, 3}, {timeout, 60}]).

10:34:30 <{erlang,apply,2}> {riak_kv_index_fsm_sup,start_index_fsm,
                                ['[email protected]',
                                 [{raw,84775980,<0.31782.0>},
                                  [<<"users">>,none,
                                   {eq,<<"age_int">>,25},
                                   60,all,undefined]]]}
quitting: msg_count
{ok,[<<"anthony">>]}

Further investigation of the failure case

(dev1@127.0.0.1)105> redbug:start("riak_kv_index_fsm_sup:start_index_fsm->return", [{time, 10000 }, {msgs,1}] ).
ok
(dev1@127.0.0.1)106> C:get_index( <<"users">>, {range, <<"age_int">> , [0,25] },[{r, 1}, {timeout, 60000}]).

10:37:16 <{erlang,apply,2}> {riak_kv_index_fsm_sup,start_index_fsm,
                                ['[email protected]',
                                 [{raw,118057949,<0.31782.0>},
                                  [<<"users">>,none,
                                   {range,<<"age_int">>,[0,25]},
                                   60000,all,undefined]]]}
quitting: msg_count

A step closer to identifying the problem

riak_client:wait_for_query_results is never receiving a response, but why?


Lets investigate!

Firstly we examine the crash log file, and encounter the following stack trace:

** Reason for termination =
** {{badmatch,{error,{invalid_v1_query,{range,<<"age_int">>,[0,25]}}}},[{riak_index,upgrade_query,1,[{file,"src/riak_index.erl"},{line,371}]},{riak_k
v_eleveldb_backend,to_first_key,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,869}]},{riak_kv_eleveldb_backend,fold_keys,4,[{file,"src/riak_kv_e
leveldb_backend.erl"},{line,377}]},{riak_kv_vnode,list,7,[{file,"src/riak_kv_vnode.erl"},{line,1393}]},{riak_kv_vnode,handle_coverage_fold,8,[{file,"
src/riak_kv_vnode.erl"},{line,856}]},{riak_core_vnode,vnode_coverage,4,[{file,"src/riak_core_vnode.erl"},{line,324}]},{gen_fsm,handle_msg,7,[{file,"g
en_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
2015-06-04 09:36:28 =ERROR REPORT====
** State machine <0.29718.0> terminating
** Last event in was {riak_coverage_req_v1,1004782375664995756265033322492444576013453623296,[{1370157784997721485815954530671515330927436759040,[137
0157784997721485815954530671515330927436759040]}],{fsm,{110969560,{1004782375664995756265033322492444576013453623296,'[email protected]'}},<0.20245.4>},
{riak_kv_index_req_v2,<<"users">>,none,{range,<<"age_int">>,[0,25]}}}
** When State == active

Lets dig further, using Redbug, and try to discover the cause of this 'invalid_v1_query' error:

(dev1@127.0.0.1)107> redbug:start("riak_index:make_query->return", 
                                  [{time, 10000 }, {msgs,1}] ).
ok
(dev1@127.0.0.1)108> C:get_index(<<"users">>, 
                                 {range, <<"age_int">> , [0,25] },
                                 [{r, 1},
                                 {timeout, 60000}]).

10:43:54 <<0.30392.5>> {riak_index,make_query,
                           [{range,<<"age_int">>,[0,25]},
                            {riak_kv_index_v3,<<>>,undefined,undefined,
                                undefined,true,true,true,false,undefined,
                                undefined}]}
quitting: msg_count

Examining the invocations of riak_index:

(dev1@127.0.0.1)116> redbug:start("riak_index->return", 
                                  [{time, 10000 }, 
                                  {msgs,100}] ).
ok
(dev1@127.0.0.1)117> C:get_index( <<"users">>, 
                                 {range, <<"age_int">> , [0,25] },
                                 [{r, 1}, 
                                 {timeout, 6000}]).

10:53:55 <<0.17963.6>> {riak_index,return_body,[{range,<<"age_int">>,[0,25]}]}

10:53:55 <<0.17963.6>> {riak_index,return_body,1} -> false

10:53:55 <<0.17963.6>> {riak_index,upgrade_query,
                                   [{range,<<"age_int">>,[0,25]}]}

10:53:55 <<0.17963.6>> {riak_index,make_query,
                           [{range,<<"age_int">>,[0,25]},
                            {riak_kv_index_v3,<<>>,undefined,undefined,
                                undefined,true,true,true,false,undefined,
                                undefined}]}

10:53:55 <<0.17963.6>> {riak_index,make_query,2} -> {error,
                                                     {invalid_v1_query,
                                                      {range,<<"age_int">>,
                                                       [0,25]}}}

10:53:55 <<0.17963.6>> {riak_index,upgrade_query,1} -> {error,
                                                        {badmatch,
                                                         {error,
                                                          {invalid_v1_query,
                                                           {range,
                                                            <<"age_int">>,
                                                            [0,25]}}}}}
...noise...
quitting: msg_count
{error,timeout}

And take a look at the implementation of riak_index:make_query/2:

make_query({range, Field, Start, End}, Q) ->
    {ok, Q?KV_INDEX_Q{filter_field=Field, start_term=Start,
                 end_term=End, return_terms=false}};
make_query(V1Q, _) ->

Aha! So the query format was incorrect. Lets try again, with the correct paramters.

(dev1@127.0.0.1)119> C:get_index( <<"users">>, 
                                  {range, <<"age_int">> , 0,25 },
                                  [{r, 1}, {timeout, 6000}]).
{ok,[<<"anthony">>]}

Solved!

Thanks Redbug!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment