
Charlie Voiselle (angrycub)

@angrycub
angrycub / bitcaskhintcheck.erl
Created March 20, 2014 15:52
escript bitcask hintcheck
#!/usr/bin/env escript -f
%% -*- erlang -*-
-define(OFFSETFIELD, 64).
-define(TSTAMPFIELD, 32).
-define(KEYSIZEFIELD, 16).
-define(TOTALSIZEFIELD, 32).
-define(HINTHEADSIZE, 18). %% (?TSTAMPFIELD + ?KEYSIZEFIELD + ?TOTALSIZEFIELD + ?OFFSETFIELD) / 8

#!/usr/bin/env escript
%% -*- erlang -*-
-include_lib("kernel/include/file.hrl").
-compile(export_all).
-define(LOG(S), io:format(S)).
-define(LOG(S,A), io:format(S,A)).
main(Dirs) ->
CodePath = case os:getenv("RIAK_LIB") of
@angrycub
angrycub / LoweringRiakNVal.md
Last active September 21, 2024 19:19
Using KV Repair to Change a Cluster's n-Val

Using KV Repair to Change a Cluster's n-Val or "Sometimes Three's a Crowd"

The shipping default for Riak is to build a cluster that stores three replicas of every value written to it. While this is awesome for fault tolerance, in cases where storage is tight you might need to loosen up and store fewer replicas.

"Nothing to it, I just go in and change the default n-val, right? Easy peasy."

Well, not so fast! Coverage queries are going to break, you're going to leave crufty, orphaned replicas in your data files, and, let's be honest, you're probably doing this to save space. So what to do?

If you've been using Riak for any time at all, you are probably familiar with the Rolling Upgrade procedure. Using that same pattern, we can proceed through the cluster, safely wipe the data from each node in turn, and then repopulate it from replica data. Using Riak KV Repair lets us replicate the data back onto only the nodes that should be primary for it.
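To make the mechanics concrete, here is a rough sketch of the two pieces involved, not the full procedure from this gist: lowering the default n_val in app.config, and then, node by node, repairing the partitions a node owns from an attached console. The two-replica value and the use of riak attach are illustrative assumptions.

    %% app.config fragment (illustrative): lower the default n_val used by
    %% buckets that have not overridden their properties.
    {riak_core, [
        {default_bucket_props, [{n_val, 2}]}
    ]}

    %% From `riak attach` on the node whose data was wiped, repair every
    %% partition that node owns so only the proper primaries are repopulated.
    {ok, Ring} = riak_core_ring_manager:get_my_ring().
    [riak_kv_vnode:repair(Idx) || Idx <- riak_core_ring:my_indices(Ring)].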

@angrycub
angrycub / force-remove.md
Created September 12, 2013 13:08
`riak-admin force-remove` should not exist.

riak-admin force-remove should not exist.

It's Friday evening, almost time to head out of the office for a nice long weekend. Your Riak cluster has been running along, everything fine. All of a sudden, the SSD in one of your Riak nodes decides to go up in a ball of flames. So you, being the good sysadmin that you are, immediately hop on the phone with your hardware vendor and order a new SSD. They tell you that you'll have it on Monday morning. Clearly you can't leave a broken node in your Riak environment, so you'll want to remove it from the cluster. You sit down at your terminal, hop onto a working Riak node, and type:

riak-admin force-remove riak@<broken-node>

NOOOOOOOOOOOOOOOOOOOOOOOOO!!!!

@angrycub
angrycub / App.java
Created June 20, 2013 17:51
Using the Riak Java Client 1.1.1 over Protocol Buffers to retrieve the "HEAD" of a Riak Object
package basho;
import com.basho.riak.client.IRiakClient;
import com.basho.riak.client.IRiakObject;
import com.basho.riak.client.RiakFactory;
import com.basho.riak.client.bucket.Bucket;
import com.basho.riak.client.convert.RiakKey;
public class App
{
public static class SimpleObject

Summary

LevelDB can become corrupted when bad things happen on the filesystem or in the hardware. We push I/O to its limits on heavily loaded Riak nodes, so it is not uncommon to experience such failures. This one shows up as the message "Compaction error: Corruption: corrupted compressed block contents" in the «data_root»/leveldb/«vnode»/LOG file.

Diagnosis

Steps that pinpoint this issue:

[root@prod-2163 /var/db/riak/leveldb]# find . -name "LOG" -exec grep -l 'Compaction error' {} \; 
./442446784738847563128068650529343492278651453440/LOG 
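The remediation step is not shown in this excerpt. A minimal sketch of the usual approach, assuming the node has been stopped and that eleveldb is on the code path of the console or escript running it, is to repair the vnode directory flagged above:

    %% Repair the corrupted vnode's LevelDB directory in place. The path is
    %% the one reported by the `find` above; substitute your own data_root
    %% and vnode ID.
    eleveldb:repair("/var/db/riak/leveldb/442446784738847563128068650529343492278651453440", []).
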
%% Check the worker pool (poolboy) status of every riak_kv vnode on this
%% node; run from a console attached to a running Riak node.
PBStatusFun = fun() ->
    VnodePids = [Pid || {_, Pid} <- riak_core_vnode_manager:all_index_pid(riak_kv_vnode)],
    Links = [process_info(Pid, [links]) || Pid <- VnodePids],
    WorkerPoolPids = [WPPid || [{links, [_, WPPid]}] <- Links],
    WorkerPoolLinks = [process_info(Pid, [links]) || Pid <- WorkerPoolPids],
    PoolboyPids = [PoolboyPid || [{links, [_, PoolboyPid]}] <- WorkerPoolLinks],
    [poolboy:status(Pid) || Pid <- PoolboyPids]
end.
PBStatusFun = fun(Index) ->
% Purpose: I use this pre-commit hook to mark objects in a bucket as "dirty" with secondary indexing.
% I then use a script to scrape out all dirty objects, do some processing, then save them with
% "dirty_bin = false" as an index and the pre-commit hook erases the "dirty_bin" index.
% So in essence it works as: `if dirty_bin = false; del dirty_bin; else dirty_bin = true; end`
%
% To install this pre-commit hook (just like any Riak pre-commit hook in Erlang), you need to create an Erlang file and
% put it in your "basho-patches" directory. For me, on Ubuntu, this was "/usr/lib/riak/lib/basho-patches".
% Once there, you need to compile it to a .beam file. This was helped by using the Riak provided erlc compiler,
% which, on my Ubuntu system, was at "/usr/lib/riak/erts-5.8.5/bin/erlc"
%
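The hook body itself is cut off in this excerpt. Below is a minimal sketch of the behavior described above; the module and function names (dirty_hook:mark_dirty/1) are assumptions made here, not taken from the gist.

    -module(dirty_hook).
    -export([mark_dirty/1]).

    %% A Riak pre-commit hook receives the riak_object and returns the
    %% (possibly modified) object. Secondary indexes live in the object
    %% metadata under the <<"index">> key as {IndexName, IndexValue} pairs.
    mark_dirty(Object) ->
        MD = riak_object:get_update_metadata(Object),
        Indexes = case dict:find(<<"index">>, MD) of
                      {ok, I} -> I;
                      error   -> []
                  end,
        NewIndexes =
            case lists:member({<<"dirty_bin">>, <<"false">>}, Indexes) of
                %% dirty_bin = false: the cleanup script has processed this
                %% object, so erase the dirty_bin index entirely.
                true  -> lists:keydelete(<<"dirty_bin">>, 1, Indexes);
                %% Otherwise mark (or re-mark) the object as dirty.
                false -> lists:keystore(<<"dirty_bin">>, 1, Indexes,
                                        {<<"dirty_bin">>, <<"true">>})
            end,
        riak_object:update_metadata(Object, dict:store(<<"index">>, NewIndexes, MD)).

Once compiled to a .beam in basho-patches as described above, the hook would be registered in the target bucket's precommit property, e.g. {"mod":"dirty_hook","fun":"mark_dirty"}, again using this sketch's assumed names.
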
@angrycub
angrycub / scratchpad.md
Last active December 14, 2015 12:18
All the phone things....

Using ADB to forward VNC

  • Plug in phone to USB

  • In command prompt run:

    adb forward tcp:5801 tcp:5801
    adb forward tcp:5901 tcp:5901
    
  • Make sure Droid VNC Server is started on the phone

@angrycub
angrycub / delete_mode.md
Created February 8, 2013 18:49
Mailing List Post from Jon Meredith about Delete Mode.

Summary/tl;dr: Riak 1.0.0 introduced more control over deletion with the delete_mode setting.

If you plan to delete and recreate objects under the same key rapidly, and there is enough disk available to store tombstones, it is safest to set delete_mode to keep.

The default three-second delay for removing tombstones keeps the tombstone around long enough to cover any rapid delete/recreate cycles, but unlike keep mode it does eventually remove the data.

Riak keeps your objects available during failures by storing multiple copies of the data. This redundancy makes deletion more complex than in a single-node database. For example, Riak needs to ensure that deletes issued while nodes are down get applied when the nodes recover, and to resolve what happens if the network is partitioned and an object is deleted on one side but updated on the other.

Deletes in Riak are a two-step process: first a tombstone object is written to the N replicas, and only once all replicas have stored the tombstone are the tombstones removed.
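
For reference, a minimal sketch of where the setting lives, using the app.config format of the Riak 1.x era this post describes; the value shown is illustrative:

    %% riak_kv section of app.config
    {riak_kv, [
        %% delete_mode accepts:
        %%   3000      -- reap tombstones after N milliseconds (3 seconds is the default)
        %%   keep      -- keep tombstones forever (safest for rapid delete/recreate)
        %%   immediate -- reap the tombstone as soon as the delete is acknowledged
        {delete_mode, keep}
    ]}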