API

###HTTP

Entry point for all object operations: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_wm_object.erl

delete_resource/2 takes RequestData (the request headers, e.g. the vclock) and Context (a record containing Bucket, Key, and Client): https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_wm_object.erl#L888

###PB

Magic! Not really but I have honestly never gone deep into this API code. However, this API eventually calls riak_client:delete in the same manner as HTTP.

The VNode

The API then calls either riak_client:delete or riak_client:delete_vclock depending on the presence of a vector clock (vclock) in the DELETE request header. Both of these functions eventually call riak_kv_delete_sup:start_delete, which spawns a riak_kv_delete worker to handle the request with the riak_kv_delete:delete function: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_delete.erl#L56

If a vclock was not defined in the request header, a riak_client:get is performed to extract the vclock from the returned RiakObject, and riak_kv_delete:delete is then called with that vclock defined: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_delete.erl#L69
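The vclock handling above can be sketched roughly as follows. This is a minimal Python model, not the Erlang code: `StoredObject`, `resolve_delete_vclock`, and the dict-based `store` are hypothetical names invented for illustration, and returning `None` for a missing key is an assumption about the not-found case.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StoredObject:
    """Hypothetical stand-in for a RiakObject: a value plus its vclock."""
    value: bytes
    vclock: str


def resolve_delete_vclock(store: Dict[str, StoredObject],
                          key: str,
                          request_vclock: Optional[str] = None) -> Optional[str]:
    """Pick the vclock a delete should carry.

    If the request supplied one, use it as-is. Otherwise read the current
    object and borrow its vclock. Returns None when the key does not exist
    (assumed here to mean "nothing to delete").
    """
    if request_vclock is not None:
        return request_vclock
    obj = store.get(key)
    return None if obj is None else obj.vclock
```

The point of the extra read is that the tombstone written in the next step needs a vclock that descends from the value it is replacing.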

###Put the KV Tombstone

Within riak_kv_delete:delete a tombstone object is created and then written to the DB with riak_client:put: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_delete.erl#L90

The original client is then sent a response indicating success or failure: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_delete.erl#L91

It is possible that the request has already timed out, in which case the response is dropped. This timeout is specified by the client: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_client.erl#L255

This ENDS the client facing operations of a delete!!! but...but...but...WE NEVER ACTUALLY DELETED ANYTHING!!!

A tombstone object will return a 404 (with a vclock) when queried, but there is still an object in the backend for this key, which means edge cases (repl, expiry, keylists, etc.) can still show this key.
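To make the "deleted but still there" state concrete, here is a small Python sketch. `TombstoneStore` and its `deleted` flag are hypothetical; the model only illustrates the behaviour described above, where a read reports not found while a key listing still sees the key.

```python
class TombstoneStore:
    """Toy key/value store where delete writes a tombstone instead of removing."""

    def __init__(self):
        self._data = {}

    def put(self, key, value, vclock):
        self._data[key] = {"value": value, "vclock": vclock, "deleted": False}

    def delete(self, key, vclock):
        # The client-facing delete is really a put of a tombstone object.
        self._data[key] = {"value": b"", "vclock": vclock, "deleted": True}

    def get(self, key):
        """Return (status, vclock, value), loosely mimicking an HTTP read."""
        obj = self._data.get(key)
        if obj is None:
            return (404, None, None)
        if obj["deleted"]:
            # Not found, but the tombstone's vclock is still reported.
            return (404, obj["vclock"], None)
        return (200, obj["vclock"], obj["value"])

    def list_keys(self):
        # A raw backend listing does not know about tombstones.
        return sorted(self._data)
```

After `delete`, `get` answers 404 with a vclock, yet `list_keys` still includes the key until the tombstone is reaped.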

###Backend delete setup

To see the final step of a delete we need to go back to riak_kv_delete:delete: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_delete.erl#L93. We can see that after the response is sent back to the original client, we do another GET with a hard-coded 60s timeout.

Tracing this to the riak_client:get function: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_client.erl#L77 we can see that a riak_kv_get_fsm is spawned. We enter at init after gen_fsm:start_link is called: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L147. We then call the prepare function: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L175 which does the consistent hash and identifies the preflist for this operation. This ends up calling validate: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L211 which confirms we have valid n, r, pr, and other request-specific values.

This then calls execute: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L272 which calls riak_kv_vnode:get with the preflist attached: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L280. This function eventually calls riak_core_vnode_master:command: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L169 which sends the GET request to all vnodes specified in the preflist, with their responses going back to the Sender (the get FSM): https://github.com/basho/riak_core/blob/1.4.2/src/riak_core_vnode_master.erl#L80.

This request is handled by the vnode's handle_command clause for ?KV_GET_REQ: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L406. This calls do_get, which eventually returns to the original get FSM, which is waiting in waiting_vnode_r: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L295. If all partitions have returned and they all returned tombstones, riak_kv_get_core:final_action will return delete and the backend delete logic will be started.

This logic begins in the finalize steps of the riak_kv_get_fsm. The finalize logic is started if and only if all vnodes in the preflist have responded: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L366. In this logic, riak_kv_get_core:final_action returns delete if and only if all the partitions that responded returned the exact same tombstone object. This then flows to maybe_delete, which confirms that all the vnodes in the preflist are primaries: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L394. If not all of them are primaries, we exit now and will retry the backend delete on a later GET operation, once all primaries are up for this preflist.
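The three conditions above (everyone answered, everyone returned the same tombstone, everyone is a primary) can be written as one predicate. This is a rough Python sketch; `Reply` and `should_reap` are hypothetical names, and comparing vclocks stands in for "the exact same tombstone object".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reply:
    """Hypothetical summary of one vnode's answer to the follow-up GET."""
    vnode: str
    is_primary: bool
    is_tombstone: bool
    vclock: str


def should_reap(preflist_size: int, replies: List[Reply]) -> bool:
    """Decide whether the backend delete may start."""
    if len(replies) < preflist_size:
        return False  # some vnode in the preflist has not responded
    if not all(r.is_tombstone for r in replies):
        return False  # at least one replica still holds a live value
    if len({r.vclock for r in replies}) != 1:
        return False  # tombstones differ between replicas
    # Fallback vnodes in the preflist block the reap until a later GET.
    return all(r.is_primary for r in replies)
```

For example, with a preflist of three, a single fallback reply is enough to postpone the reap even when all three replies are identical tombstones.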

As the delete progresses we call riak_kv_vnode:del, which sends a ?KV_DELETE_REQ to the riak_kv_vnode_master, which forwards the request to the vnodes in the preflist. The vnodes run their handle_command clause for the delete request: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L467 and this calls the do_delete function: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L1320

###Actual backend delete

do_delete first calls do_get_term, which eventually calls Mod:get, where Mod is the backend configured in riak_kv. If the object is returned, it is checked to be a tombstone and then delete_mode is checked: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L1332. Depending on delete_mode, do_backend_delete is eventually called: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L932. This function simply calls Mod:delete, which actually reaps the tombstone from the backend. Then the object is removed from the index list (AAE) so it will not be resurrected (hopefully).
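A rough Python model of this vnode-side decision follows. It assumes the three delete_mode settings described in Riak's configuration documentation (`keep`, `immediate`, or a delay in milliseconds); the function name, the dict backend, the `deleted` flag, and the `schedule` callback are all hypothetical.

```python
def do_delete(backend, key, delete_mode, schedule):
    """Sketch of the reap decision for one vnode.

    backend     -- dict of key -> {"deleted": bool, ...} (stand-in for Mod)
    delete_mode -- "keep", "immediate", or an int delay in milliseconds
    schedule    -- callable(delay_ms, key) that arranges a later reap
    Returns a short status string describing what happened.
    """
    obj = backend.get(key)
    if obj is None or not obj.get("deleted"):
        return "not_reaped"      # missing, or not a tombstone
    if delete_mode == "keep":
        return "kept"            # tombstones are retained indefinitely
    if delete_mode == "immediate":
        del backend[key]         # stand-in for Mod:delete
        return "reaped"
    schedule(delete_mode, key)   # reap later, after the configured delay
    return "scheduled"
```

The tombstone check is the safety net: if a new value was written between the follow-up GET and this call, the object is no longer a tombstone and nothing is removed.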

###Backend specific delete paths

The call to riak_kv_*_backend:delete triggers a call to the backend's delete function. This path is dependent on the backend used. I'll go over all 3 backends.

#####Bitcask

The bitcask backend delete simply puts a ?TOMBSTONE into the backend and then removes the entry from the keydir: https://github.com/basho/bitcask/blob/develop/src/bitcask.erl#L274

This tombstone macro is different from the riak_kv tombstone and is specific to bitcask. The entry is removed during merge to reclaim disk space.
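The append-then-forget pattern described here can be sketched with a toy log-structured store. `MiniCask` is a hypothetical in-memory model, not bitcask's API: the list plays the role of the data file and the dict plays the role of the keydir.

```python
TOMBSTONE = object()  # hypothetical sentinel standing in for bitcask's ?TOMBSTONE


class MiniCask:
    """Toy append-only log plus an in-memory key directory."""

    def __init__(self):
        self.log = []     # append-only list of (key, value) records
        self.keydir = {}  # key -> offset of the latest live record

    def put(self, key, value):
        self.keydir[key] = len(self.log)
        self.log.append((key, value))

    def get(self, key):
        offset = self.keydir.get(key)
        return None if offset is None else self.log[offset][1]

    def delete(self, key):
        # Deleting appends a tombstone record, then drops the keydir entry.
        self.log.append((key, TOMBSTONE))
        self.keydir.pop(key, None)

    def merge(self):
        # Rewrite the log with only the records the keydir still points at.
        live = [(k, self.log[off][1]) for k, off in self.keydir.items()]
        self.log, self.keydir = [], {}
        for k, v in live:
            self.put(k, v)
```

Note that `delete` makes the log longer, not shorter; space only comes back when `merge` rewrites the log without the dead records and the tombstone.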

#####LevelDB

LevelDB similarly writes a special [{delete, Key}] entry to the DB: https://github.com/basho/eleveldb/blob/master/src/eleveldb.erl#L151

This entry can trigger a low level compaction if the most recent value for the key is in the young level. The entry triggers old values in lower levels to be removed on compaction and this entry is only removed once all those entries are purged.
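A heavily simplified Python model of how a delete entry moves through levels is shown below. Everything in it is an assumption made for illustration: `levels` is a list of dicts with the newest level first, `DELETE_MARKER` stands in for the delete entry, and `compact` merges one level into the next. Real LevelDB compaction is considerably more involved.

```python
DELETE_MARKER = object()  # hypothetical stand-in for the delete entry


def lookup(levels, key):
    """Search newest to oldest; the first level holding the key wins."""
    for level in levels:
        if key in level:
            value = level[key]
            return None if value is DELETE_MARKER else value
    return None


def compact(levels, i):
    """Merge level i into level i+1, letting the newer entries win.

    A delete marker overwrites (and so purges) older values for the same
    key. The marker itself is dropped only when the target is the oldest
    level, because only then can no older value remain for it to hide.
    Returns a new list of levels; the input is left unmodified.
    """
    merged = {**levels[i + 1], **levels[i]}
    if i + 1 == len(levels) - 1:
        merged = {k: v for k, v in merged.items() if v is not DELETE_MARKER}
    return levels[:i] + [{}, merged] + levels[i + 2:]
```

Starting from a marker in the youngest level and values in two older levels, the first compaction removes one old value but keeps the marker, and the second removes both the last old value and the marker.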

#####Memory

The memory backend is entirely coded within riak_kv_memory_backend because it's a hack using ets tables. The delete logic is very straightforward: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_memory_backend.erl#L230

The logic simply removes this key from the main ets table via ets:delete and removes all associated indexes from the index table.
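As a rough Python analogue of that two-table cleanup, with a dict standing in for the main ets table and a set for the index table (the names and the tuple layout are hypothetical, not the backend's actual schema):

```python
def memory_delete(data, index, bucket, key, index_specs):
    """Remove an object and its secondary index entries.

    data        -- dict of (bucket, key) -> value, the main table
    index       -- set of (bucket, field, value, key) tuples, the index table
    index_specs -- iterable of (field, value) pairs to remove for this key
    """
    data.pop((bucket, key), None)  # analogous to ets:delete on the main table
    for field, value in index_specs:
        index.discard((bucket, field, value, key))
```

Unlike the two disk backends, nothing is written here: the entry is simply gone once the call returns.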
