@mbbroberg
Forked from sdebnath/gist:36c235e042cb35db7d1f
Last active August 29, 2015 14:27
Add field to Riak YZ Schema with CRDTs
This gist captures what needs to be done to add a new field to Riak's Yokozuna
search index.
Sources:
- https://github.com/basho/yokozuna/issues/130
- http://riak-users.197444.n3.nabble.com/How-to-update-existed-schema-td4032143.html
The code below is for illustration purposes only. Use at your own risk.
1. Create or update the schema file
2. Upload the schema to the main node
cat schema/my_bucket.xml | curl -XPUT http://127.0.0.1:49001/search/schema/my_bucket -H 'Content-Type:application/xml' --data-binary @-
3. Reload YZ index on each node
a. individual rpc calls on each node:
rpc:block_call('[email protected]', yz_index, reload, [<<"my_bucket">>]).
rpc:block_call('[email protected]', yz_index, reload, [<<"my_bucket">>]).
rpc:block_call('[email protected]', yz_index, reload, [<<"my_bucket">>]).
b. via multicall
rpc:multicall(['[email protected]','[email protected]','[email protected]'], yz_index, reload, [<<"my_bucket">>]).
If all is well, each reload call returns {ok, Nodes}, where Nodes is the
list of nodes in your Riak cluster. If something goes wrong,
you'll get {error, Errors}, where Errors is a list of errors
for each node that had an error. With rpc:multicall, these per-node
results arrive wrapped in a {Results, BadNodes} tuple.
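Checking multicall replies by eye gets tedious on larger clusters. Below is a minimal sketch of a helper that sorts them, assuming each node's reply has the {ok, Nodes} or {error, Errors} shape described above. The module name reload_check and the function summarize/1 are hypothetical names introduced for illustration, not part of Riak or Yokozuna.

```erlang
-module(reload_check).
-export([summarize/1]).

%% Takes the {Results, BadNodes} tuple returned by rpc:multicall/4 and
%% reports whether every node reloaded the index.
%% Assumes each reply is {ok, Nodes} or {error, Errors}; anything else
%% (for example {badrpc, Reason}) is collected as a failure.
summarize({Results, BadNodes}) ->
    Failures = [R || R <- Results, not is_ok(R)],
    case {Failures, BadNodes} of
        {[], []} -> ok;
        _ -> {error, [{failures, Failures}, {bad_nodes, BadNodes}]}
    end.

%% A reply counts as successful only if it is tagged ok.
is_ok({ok, _Nodes}) -> true;
is_ok(_Other) -> false.
```

With the module compiled and loaded on the node you are attached to, the multicall can be wrapped as reload_check:summarize(rpc:multicall(Nodes, yz_index, reload, [<<"my_bucket">>])), where Nodes is your own list of node names.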
At this point, any newly inserted data is searchable. To get old data re-indexed
with the new field definition, we need to read and rewrite every key in the bucket:
%% Pid is an open riakc_pb_socket connection.
{ok, Keys} = riakc_pb_socket:list_keys(Pid, {<<"my_bucket">>, <<"my_bucket">>}).
lists:foreach(fun(E) ->
    {ok, M1} = riakc_pb_socket:fetch_type(Pid, {<<"my_bucket">>, <<"my_bucket">>}, E),
    %% Add and then remove a dummy element so the map registers an update.
    M2 = riakc_map:update({<<"some_field">>, set},
        fun(S) -> riakc_set:del_element(<<"1">>, riakc_set:add_element(<<"1">>, S)) end, M1),
    %% Writing the map back causes the object to be re-indexed.
    riakc_pb_socket:update_type(Pid, {<<"my_bucket">>, <<"my_bucket">>}, E, riakc_map:to_op(M2))
end, Keys).
WARNING: the code above can wreak havoc on your cluster, especially if you have gazillions
of keys. Think carefully. Unfortunately, this is the only way to achieve what we
need to do as of 06/26/2015.
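One way to soften the write load is to pause between batches of keys instead of rewriting everything back to back. The following is a rough sketch, not a tested production tool: the module name reindex_throttle, the function touch_all/4, and its BatchSize and PauseMs parameters are all hypothetical names introduced here. It only paces the writes and does not make the list_keys call any cheaper.

```erlang
-module(reindex_throttle).
-export([touch_all/4]).

%% Applies TouchFun to every key, pausing PauseMs milliseconds after each
%% batch of BatchSize keys so the cluster is not flooded with writes.
%% TouchFun is expected to do the fetch/update round trip for one key.
%% Returns the number of keys processed.
touch_all(Keys, TouchFun, BatchSize, PauseMs)
  when is_integer(BatchSize), BatchSize > 0,
       is_integer(PauseMs), PauseMs >= 0 ->
    touch_all(Keys, TouchFun, BatchSize, PauseMs, 0).

touch_all([], _TouchFun, _BatchSize, _PauseMs, Count) ->
    Count;
touch_all([Key | Rest], TouchFun, BatchSize, PauseMs, Count0) ->
    TouchFun(Key),
    Count = Count0 + 1,
    %% Sleep only when a full batch has just completed and keys remain.
    case Count rem BatchSize =:= 0 andalso Rest =/= [] of
        true -> timer:sleep(PauseMs);
        false -> ok
    end,
    touch_all(Rest, TouchFun, BatchSize, PauseMs, Count).
```

To use it, bind the fun(E) from the loop above to a variable such as Touch and call reindex_throttle:touch_all(Keys, Touch, 100, 1000). The batch size of 100 and the one-second pause are arbitrary placeholders, so tune them against what your cluster can absorb.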