autoscale: true build-lists: true slidenumbers: true
An example of how to use Redbug to trace Riak internals, taking as an example an inexplicably failing secondary index range query.
- Enable the leveldb backend
- Insert data
- Perform 2i queries using the curl client
- Perform the same queries using
riak_client
- Exact match succeeds, range query fails
- Investigate!
Riak 1.4
%% Riak KV config
{riak_kv, [
%% Storage_backend specifies the Erlang module defining the storage
%% mechanism that will be used on this node.
%% {storage_backend, riak_kv_bitcask_backend},
{storage_backend, riak_kv_eleveldb_backend},
Riak 2.x
storage_backend = eleveldb
Add indexed content..
curl -XPUT \
-H "x-riak-index-age_int: 25" \
-H "Content-Type: application/json" \
-d '{"name": "Anthony J Dacosta"}' \
http://127.0.0.1:10018/buckets/users/keys/anthony
Query using curl
Exact match
curl http://127.0.0.1:10018/buckets/users/index/age_int/0/25
{"keys":["anthony"]}%
Range query
curl http://127.0.0.1:10018/buckets/users/index/age_int/0/25
{"keys":["anthony"]}
Lets attach to a running instance and attempt the same operations Using the Riak local client
riak attach
Firstly we instantiate a reference suitable for executing local riak calls by invoking "riak:local_client/0"
(dev1@127.0.0.1)1> {ok, C} = riak:local_client().
{ok,{riak_client,'[email protected]',undefined}}
riak_client
uses parameterized modules to support this magic.
Here's the module declaration in case you didn't believe me.
-module(riak_client, [Node,ClientId]).
As this feature (parameterized modules) has been removed from more recent Erlang releases, it's now supported at compile time by means of a parse transform.
We then attempt to execute an exact match secondary index query... The query executes correctly and returns the expected value.
(dev1@127.0.0.1)89> C:get_index( <<"users">>,
{eq, <<"age_int">>, 25 },
[{r, 3}, {timeout, 60}]).
{ok,[<<"anthony">>]}
Lets try tracing the call using Redbug
(dev1@127.0.0.1)90> redbug:start("riak_client->return",
[{time, 10000 }, {msgs,50}]).
ok
(dev1@127.0.0.1)91> C:get_index( <<"users">>,
{eq, <<"age_int">> , 25 },
[{r, 3},
{timeout, 60}]).
A couple of things happen
10:23:36 <{erlang,apply,2}> {riak_client,get_index,
[<<"users">>,
{eq,<<"age_int">>,25},
[{r,3},{timeout,60}],
{riak_client,'[email protected]',undefined}]}
10:23:36 <{erlang,apply,2}> {riak_client,mk_reqid,
[{riak_client,'[email protected]',undefined}]}
10:23:36 <{erlang,apply,2}> {riak_client,mk_reqid,1} -> 117674718
10:23:36 <{erlang,apply,2}> {riak_client,wait_for_query_results,
[117674718,60,
{riak_client,'[email protected]',undefined}]}
...noise...
10:23:36 <{erlang,apply,2}> {riak_client,wait_for_query_results,4} -> {ok,
[<<"anthony">>]}
{ok,[<<"anthony">>]}
...noise...
10:23:36 <{erlang,apply,2}> {riak_client,get_index,4} -> {ok,[<<"anthony">>]}
quitting: timeout
Lets try a range query. It fails with a timeout message.
(dev1@127.0.0.1)93> redbug:start("riak_client->return", [{time, 10000 }, {msgs,50}] ).
ok
(dev1@127.0.0.1)94> C:get_index( <<"users">>, {range, <<"age_int">> , [0,25] },[{r, 1}, {timeout, 60000}]).
10:28:34 <{erlang,apply,2}> {riak_client,get_index,
[<<"users">>,
{range,<<"age_int">>,[0,25]},
[{r,1},{timeout,60000}],
{riak_client,'[email protected]',undefined}]}
10:28:34 <{erlang,apply,2}> {riak_client,mk_reqid,
[{riak_client,'[email protected]',undefined}]}
...noise...
10:28:34 <{erlang,apply,2}> {riak_client,wait_for_query_results,
[48701992,60000,[],
{riak_client,'[email protected]',undefined}]}
quitting: timeout
{error,timeout}
Lets examine the tracing of the successful case
(dev1@127.0.0.1)95> redbug:start("riak_kv_index_fsm_sup:start_index_fsm->return", [{time, 10000 }, {msgs,1}] ).
ok
(dev1@127.0.0.1)96> C:get_index( <<"users">>, {eq, <<"age_int">> , 25 },[{r, 3}, {timeout, 60}]).
10:34:30 <{erlang,apply,2}> {riak_kv_index_fsm_sup,start_index_fsm,
['[email protected]',
[{raw,84775980,<0.31782.0>},
[<<"users">>,none,
{eq,<<"age_int">>,25},
60,all,undefined]]]}
quitting: msg_count
{ok,[<<"anthony">>]}
Further investigation of the failure case
(dev1@127.0.0.1)105> redbug:start("riak_kv_index_fsm_sup:start_index_fsm->return", [{time, 10000 }, {msgs,1}] ).
ok
(dev1@127.0.0.1)106> C:get_index( <<"users">>, {range, <<"age_int">> , [0,25] },[{r, 1}, {timeout, 60000}]).
10:37:16 <{erlang,apply,2}> {riak_kv_index_fsm_sup,start_index_fsm,
['[email protected]',
[{raw,118057949,<0.31782.0>},
[<<"users">>,none,
{range,<<"age_int">>,[0,25]},
60000,all,undefined]]]}
quitting: msg_count
riak_client:wait_for_query_results
is never receiving a response, but why?
Firstly we examine the crash log file, and encounter the following stack trace:
** Reason for termination =
** {{badmatch,{error,{invalid_v1_query,{range,<<"age_int">>,[0,25]}}}},[{riak_index,upgrade_query,1,[{file,"src/riak_index.erl"},{line,371}]},{riak_k
v_eleveldb_backend,to_first_key,1,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,869}]},{riak_kv_eleveldb_backend,fold_keys,4,[{file,"src/riak_kv_e
leveldb_backend.erl"},{line,377}]},{riak_kv_vnode,list,7,[{file,"src/riak_kv_vnode.erl"},{line,1393}]},{riak_kv_vnode,handle_coverage_fold,8,[{file,"
src/riak_kv_vnode.erl"},{line,856}]},{riak_core_vnode,vnode_coverage,4,[{file,"src/riak_core_vnode.erl"},{line,324}]},{gen_fsm,handle_msg,7,[{file,"g
en_fsm.erl"},{line,494}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
2015-06-04 09:36:28 =ERROR REPORT====
** State machine <0.29718.0> terminating
** Last event in was {riak_coverage_req_v1,1004782375664995756265033322492444576013453623296,[{1370157784997721485815954530671515330927436759040,[137
0157784997721485815954530671515330927436759040]}],{fsm,{110969560,{1004782375664995756265033322492444576013453623296,'[email protected]'}},<0.20245.4>},
{riak_kv_index_req_v2,<<"users">>,none,{range,<<"age_int">>,[0,25]}}}
** When State == active
Lets dig further, using Redbug, and try to discover the cause of this 'invalid_v1_query' error:
(dev1@127.0.0.1)107> redbug:start("riak_index:make_query->return",
[{time, 10000 }, {msgs,1}] ).
ok
(dev1@127.0.0.1)108> C:get_index(<<"users">>,
{range, <<"age_int">> , [0,25] },
[{r, 1},
{timeout, 60000}]).
10:43:54 <<0.30392.5>> {riak_index,make_query,
[{range,<<"age_int">>,[0,25]},
{riak_kv_index_v3,<<>>,undefined,undefined,
undefined,true,true,true,false,undefined,
undefined}]}
quitting: msg_count
Examining the invocations of riak_index
:
(dev1@127.0.0.1)116> redbug:start("riak_index->return",
[{time, 10000 },
{msgs,100}] ).
ok
(dev1@127.0.0.1)117> C:get_index( <<"users">>,
{range, <<"age_int">> , [0,25] },
[{r, 1},
{timeout, 6000}]).
10:53:55 <<0.17963.6>> {riak_index,return_body,[{range,<<"age_int">>,[0,25]}]}
10:53:55 <<0.17963.6>> {riak_index,return_body,1} -> false
10:53:55 <<0.17963.6>> {riak_index,upgrade_query,
[{range,<<"age_int">>,[0,25]}]}
10:53:55 <<0.17963.6>> {riak_index,make_query,
[{range,<<"age_int">>,[0,25]},
{riak_kv_index_v3,<<>>,undefined,undefined,
undefined,true,true,true,false,undefined,
undefined}]}
10:53:55 <<0.17963.6>> {riak_index,make_query,2} -> {error,
{invalid_v1_query,
{range,<<"age_int">>,
[0,25]}}}
10:53:55 <<0.17963.6>> {riak_index,upgrade_query,1} -> {error,
{badmatch,
{error,
{invalid_v1_query,
{range,
<<"age_int">>,
[0,25]}}}}}
...noise...
quitting: msg_count
{error,timeout}
And take a look at the implementation of riak_index:make_query/2
:
make_query({range, Field, Start, End}, Q) ->
{ok, Q?KV_INDEX_Q{filter_field=Field, start_term=Start,
end_term=End, return_terms=false}};
make_query(V1Q, _) ->
Aha! So the query format was incorrect. Lets try again, with the correct paramters.
(dev1@127.0.0.1)119> C:get_index( <<"users">>,
{range, <<"age_int">> , 0,25 },
[{r, 1}, {timeout, 6000}]).
{ok,[<<"anthony">>]}
Thanks Redbug!