- https://dancres.github.io/Pages/
- https://ferd.ca/a-distributed-systems-reading-list.html
- http://the-paper-trail.org/blog/distributed-systems-theory-for-the-distributed-systems-engineer/
- https://github.com/palvaro/CMPS290S-Winter16/blob/master/readings.md
- http://muratbuffalo.blogspot.com/2015/12/my-distributed-systems-seminars-reading.html
- http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html
- http://michaelrbernste.in/2013/11/06/distributed-systems-archaeology-works-cited.html
- http://rxin.github.io/db-readings/
- http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html
- http://pdos.csail.mit.edu/dsrg/papers/
Now that we live in the Big Data, Web 3.14159 era, lots of people want to build databases that are too big to fit on a single machine. But there's a problem in the form of the CAP theorem, which states that if your network ever partitions (a machine goes down, or part of the network loses its connection to the rest) then you can keep consistency (all machines return the same answer to a given query) or availability (every request gets a response), but not both.
`riak-admin force-remove` should not exist.
It's Friday evening, almost time to head out of the office for a nice long weekend. Your Riak cluster has been running along, everything fine. All of a sudden, the SSD in one of your Riak nodes decides to go up in a ball of flames. So you, being the good sysadmin that you are, immediately hop on the phone with your hardware vendor and order a new SSD. They tell you that you'll have it on Monday morning. Clearly you can't leave a broken node in your Riak environment, so you'll want to remove it from the cluster. You sit down at your terminal, hop on to a working Riak node and type
riak-admin force-remove [email protected]
NOOOOOOOOOOOOOOOOOOOOOOOOO!!!!
Here's where I understand the state of the art to be:
- In this INRIA tech report, Shapiro, Preguiça, Baquero and Zawirski (SPBZ) prove, amongst other things, that a sufficient condition for CRDTs to achieve eventual consistency on networks which may reorder and duplicate packets (which I'll call flaky networks henceforth) is that:
  1. the underlying datatype forms a semilattice,
  2. messages are full states,
  3. incoming messages are combined with the node's current state using the least-upper-bound operation in the semilattice.
- It's possible to relax condition 2 and still achieve eventual consistency over flaky networks by fragmenting the state into independent parts and transmitting updates to each part separately. For instance, in the G-Set CRDT (an add-only bitset) one can transmit only the index of the element to be added. (A G-Set sketch follows this list.)
- In [these slides from a talk at Dagstuhl](http://www.dagstuhl.de/mat/Files/13/13081/13081.BaqueroCarlos.Sl
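To make the semilattice framing concrete, here is a minimal sketch of a state-based G-Set in Erlang. This is my own illustration of the conditions above, not Riak's implementation, and the module and function names (`gset`, `new/0`, `add/2`, `merge/2`, `value/1`) are made up for the example: the state is a set, merge is set union (the least upper bound), and because merge is commutative, associative and idempotent, reordered or duplicated full-state messages cannot break convergence.

```erlang
-module(gset).
-export([new/0, add/2, merge/2, value/1]).

%% State-based grow-only set (G-Set). States form a semilattice under
%% union, so merging reordered or duplicated states is harmless.

new() -> ordsets:new().

%% Local update: add an element to this replica's state.
add(Element, Set) -> ordsets:add_element(Element, Set).

%% Least upper bound of two states: set union.
%% Commutative, associative and idempotent.
merge(SetA, SetB) -> ordsets:union(SetA, SetB).

%% Read the converged value.
value(Set) -> ordsets:to_list(Set).
```

Condition 2 corresponds to shipping the whole state to peers on every update; the relaxation in the second bullet corresponds to shipping only the newly added element and folding it in with `add/2` on the receiving side.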
- Starting: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_wm_object.erl#L619
- We create a new `riak_object` and populate the various fields with the headers and metadata supplied by the client.
- Big surprise: we eventually call `riak_client:put` (https://github.com/basho/riak_kv/blob/1.4.2/src/riak_client.erl#L143).
- If/when the client returns any errors, these are handled in `handle_common_errors`, and it's nice to return human-readable errors to the client :)
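For reference, the same path can be poked at by hand from a console attached to a Riak node of that era. A rough sketch, assuming the `riak:local_client/0` helper and the 1.4.x parameterized `riak_client` API (the bucket, key and value below are made up):

```erlang
%% Rough console sketch of the put path (Riak 1.4.x era); assumes
%% riak:local_client/0 and the parameterized riak_client module.
{ok, C} = riak:local_client(),
Obj = riak_object:new(<<"groceries">>, <<"mine">>, <<"eggs">>),
C:put(Obj, 1).   %% W = 1: ack once a single vnode has accepted the write
```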
Entry point for all object operations: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_wm_object.erl
`delete_resource/2` takes RequestData (request headers, e.g. the vclock) and Context (a record containing the Bucket, Key and Client): https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_wm_object.erl#L888
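For orientation, a Webmachine `delete_resource/2` callback generally has the shape below. This is a schematic sketch, not `riak_kv_wm_object`'s actual code, and `do_delete/2` is a hypothetical stand-in for the real delete logic:

```erlang
%% Schematic Webmachine callback shape; do_delete/2 is a hypothetical helper.
delete_resource(RD, Ctx) ->
    case do_delete(RD, Ctx) of
        ok               -> {true, RD, Ctx};    %% delete was enacted
        {error, _Reason} -> {false, RD, Ctx}    %% delete was not enacted
    end.
```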
```erlang
-module(riak_metrics).
-compile(export_all).

main([NodeName0, Cookie, Length, Command]) ->
    LocalName = '[email protected]',
    NodeName = list_to_atom(NodeName0),
    case net_kernel:start([LocalName]) of
        {ok, _} ->
            erlang:set_cookie(node(), list_to_atom(Cookie)),
            case net_kernel:hidden_connect_node(NodeName) of
                true ->
                    %% connected as a hidden node; the rest of the original
                    %% script (running Command against the node for Length)
                    %% is elided here
                    ok;
                false -> io:format("could not connect to ~p~n", [NodeName])
            end;
        {error, Reason} -> io:format("net_kernel:start failed: ~p~n", [Reason])
    end.
```
I've had one question about subpar performance in Riak 2.0's CRDTs. I thought I'd write this so that people can more easily diagnose these issues, without the CRDT team having to step in every time.
An example: a client was having problems with the performance of fetching and updating sets. The issue manifested itself as poor fetch performance.
So, how do you go about debugging/diagnosing this?
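A hypothetical first step, assuming the `riakc` Erlang client and a set bucket type named `<<"sets">>` (the bucket and key below are made up): time a raw fetch of the datatype and look at how many elements it holds, since a set that has grown very large is a common culprit for slow fetches.

```erlang
%% Hypothetical diagnostic sketch using the riakc client: time one fetch of
%% the set and report how many elements it contains.
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
{Micros, {ok, Set}} =
    timer:tc(riakc_pb_socket, fetch_type,
             [Pid, {<<"sets">>, <<"mybucket">>}, <<"mykey">>]),
io:format("fetch took ~p ms for ~p elements~n",
          [Micros div 1000, length(riakc_set:value(Set))]).
```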
- Note: always set `umask 022` for system-shared libraries. See http://blog.equanimity.nl/blog/2014/02/09/erlang-r17-rc1-on-osx-with-wx-and-a-working-observer/ for the details.
- wxWidgets 3.0.0 works the same in R16B03-1 and 17.0-rc2.
- Note well: `wx:demo()` on OS X 10.9.2 with wxWidgets 3.0.0 is still unstable, though `observer:start()` is more stable.
- If you really don't have time, try Erlang Solutions' 32-bit (not 64-bit) distribution at https://www.erlang-solutions.com/downloads/download-erlang-otp and use it as a debugging console.
- Update 28-FEB-2014 0230UTC: Leo Liu reports `brew install wxmac --disable-monolithic` will do. See http://erlang.org/pipermail/erlang-questions/2014-February/077952.html.
```erlang
-module(fj).
-export([parallel/2]).
%%%-----------------------------------------------------------------------------
%%% @doc Executes the given function on every task in Tasks in parallel.
%%% @spec parallel(Function, Tasks) -> Results
%%%   where Results is a list matching the length of the input list Tasks,
%%%   but containing the result of invoking Function on each task
%%% @end
%% Minimal sketch of an implementation (the original body was not included
%% here): fork one process per task, then join the results in input order.
parallel(Function, Tasks) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), Function(Task)} end) || Task <- Tasks],
    [receive {Pid, Result} -> Result end || Pid <- Pids].
```
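A quick usage sketch of the function described above:

```erlang
%% Squares each element in a separate process; results come back in input order.
[1, 4, 9] = fj:parallel(fun(X) -> X * X end, [1, 2, 3]).
```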