behaviour_info(callbacks) ->
[
{api_version,0},
{capabilities, 1}, % (State)
{capabilities, 2}, % (Bucket, State)
{start,2}, % (Partition, Config)
{stop,1}, % (State)
{get,3}, % (Bucket, Key, State)
{put,5}, % (Bucket, Key, IndexSpecs, Val, State)
{delete,4}, % (Bucket, Key, IndexSpecs, State)
{drop,1}, % (State)
{fold_buckets,4}, % (FoldBucketsFun, Acc, Opts, State),
% FoldBucketsFun(Bucket, Acc)
{fold_keys,4}, % (FoldKeysFun, Acc, Opts, State),
% FoldKeysFun(Bucket, Key, Acc)
{fold_objects,4}, % (FoldObjectsFun, Acc, Opts, State),
% FoldObjectsFun(Bucket, Key, Object, Acc)
{is_empty,1}, % (State)
{status,1}, % (State)
{callback,3}]; % (Ref, Msg, State) ->
behaviour_info(_Other) ->
undefined.
First, look at this simple version of the 'yessir' back end.
-
The 'yessir' back end is only used for debugging and for performance analysis.
-
The 'yessir' back end does not store anything. It always lies.
-
The intent is to be very simple and fast.
Early version of riak_kv_yessir_backend.erl, June, 2012
- Being fast /= being short & simple code.
- Today's version of riak_kv_yessir_backend.erl is much more complicated.
- ... but it is faster and it has more features.
This code is more complicated than the 'yessir' backend. Why?
- It actually has a real memory.
- It actually supports Riak KV vnode fold operations
- fold_buckets, fold_keys, fold_objects
- syncronous fold, asyncronous fold
- The fold API looks very simple & easy to use...
- The fold API internal plumbing is ... not pretty.
- It supports Riak KV's 2I ("secondary indexes")
- It supports age limits ("time to live" or "TTL")
- It supports memory size accounting (instead of using ETS counters)
- It supports maximum memory limit:
- If limit is exceeded, then delete oldest object(s)
Here is the version of today's develop
branch of the 'memory' backend:
https://github.com/basho/riak_kv/blob/8ebfeb6700b451772e779c5f3c5601ca1ab4ec92/src/riak_kv_memory_backend.erl
- Very nice, pure, and easy:
lists:foldl(fun(_X, Count) -> Count + 1 end,
0,
[a, b, c, d, e, f, 42.0, "yo, kewl"]).
- The backend and vnode folding funcs are not pure and easy. (sadpanda)
- General pattern: filter
filter_wrapper_fun(Item, InnerFun, Acc) ->
case is_this_item_invisible(Item) of
true ->
Acc;
false ->
InnerFun(Item, Acc)
end.
- General pattern: transform
transform_wrapper_fun(Item, InnerFun, Acc) ->
RealItem = transform_somehow(Item),
InnerFun(RealItem, Acc).
- General pattern: transform from riak_kv_vnode.erl
handle_command(?FOLD_REQ{foldfun=FoldFun, acc0=Acc0,
forwardable=_Forwardable, opts=Opts}, Sender, State) ->
FoldWrapper = fun(Bucket, Key, Value, Acc) ->
FoldFun({Bucket, Key}, Value, Acc)
end,
Now, let's look at the 'memory' backend's fold functions: buckets, keys, objects, ...
-
A backend "folder" is a function that implements a backend's fold.
- ... Or like
lists:foldl()
implements a lists fold. - ... Just like
ets:fold()
implements the ETS fold.
- ... Or like
-
General folder pattern: stop the fold early
perhaps_stop_early_wrapper_fun(Item, InnerFun, Acc) ->
try
InnerFun(RealItem, Acc)
catch
stop_fold ->
Acc
end.
%%% ...
my_fold_function({Key, _Val}=BKey, InnerFun, Acc) ->
if
Key >= <<"n">> -> % Avoid keys >= "n..."
throw stop_fold;
true ->
InnerFun(BKey, Acc)
end.
Source code: https://github.com/basho/riak_kv/blob/develop/src/riak_kv_bitcask_backend.erl
This is the glue code between Riak KV and the Bitcask app.
Some additional things that the Bitcask backend does that the memory backend does not:
- Merge operations, to reclaim unused/"dead" disk space.
- Check for data upgrade/file format conversions
See also: the original Bitcask design summary, April 2010.
This design doc is still mostly correct ... many small details have changed in the last 4 years, though.
- Part 1: the "keydir" is stored in RAM always.
- When a bitcask is opened, the keydir is read from disk
->
RAM. - You must have enough RAM to store all keys.
- When a bitcask is opened, the keydir is read from disk
- Part 2: the data on disk.
- Key + value data is stored on disk
- The keydir is a key/value store
- key: same as the Bitcask key
- value: where on disk to find the Bitcask value
- file_id #, value size, value offset, timestamp
- The keydir is a hash table
- Key order is not preserved
- Key iteration is done via fold
- Severe limits on the # of simultaneous folds
- The keydir is implemented as a NIF
- NIF <-> C code
- Bitcask files are append-only.
- To delete or replace data, new records are appended to the active file.
- "Dead data" lives on disk for a period of time...
- ... until "merge" can copy "alive data" to a new data file.
- For each copied old key + value, update the RAM keydir
- When all "alive data" is copied, then delete the old data file.
- Only two files may be open at one time:
- The active file: new writes only
- (maybe) The merge file: old data copied from old files