###HTTP
Entry point for all object operations: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_wm_object.erl
`delete_resource/2` takes RequestData (the request, including headers such as the vclock) and Context (a record containing Bucket, Key, and Client): https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_wm_object.erl#L888
###PB
Magic! Not really, but I have honestly never gone deep into this API code. However, this API eventually calls `riak_client:delete` in the same manner as HTTP.
The API then calls either `riak_client:delete` or `riak_client:delete_vclock`, depending on the presence of a vector clock in the DELETE request header. Both of these functions eventually call `riak_kv_delete_sup:start_delete`, which spawns a `riak_kv_delete` worker to handle the request with the `riak_kv_delete:delete` function: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_delete.erl#L56
If a vclock was not supplied in the request header, a `riak_client:get` is performed to extract the vclock from the returned RiakObject, and `riak_kv_delete:delete` is then called again with that vclock: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_delete.erl#L69
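The vclock branch above can be modeled roughly like this. It is a Python sketch of the control flow only, not the Erlang code: `FakeClient`, `StoredObject`, and the return tuples are made up for illustration, and forwarding `notfound` when the GET finds nothing is my assumption about the failure path.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class StoredObject:
    value: bytes
    vclock: str  # opaque stand-in for a real vector clock


class FakeClient:
    """Hypothetical stand-in for riak_client; not the real API."""

    def __init__(self) -> None:
        self.store: Dict[Tuple[str, str], StoredObject] = {}
        self.get_calls = 0

    def get(self, bucket: str, key: str) -> Optional[StoredObject]:
        self.get_calls += 1
        return self.store.get((bucket, key))


def delete(client: FakeClient, bucket: str, key: str, vclock: Optional[str] = None):
    """Model of the two entry paths: with and without a vclock."""
    if vclock is None:
        # No vclock in the request: fetch the object to learn its vclock.
        obj = client.get(bucket, key)
        if obj is None:
            return ("error", "notfound")  # assumed failure path
        # Re-enter the same function, this time with a vclock defined.
        return delete(client, bucket, key, obj.vclock)
    # With a vclock in hand, the tombstone put would happen here.
    return ("ok", "tombstone_put", vclock)
```

The point of the recursion is that both paths converge on a single "delete with vclock" step, so the extra GET is only paid when the caller did not send a vclock.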
###Put the KV Tombstone
Within `riak_kv_delete:delete`, a tombstone object is created and then written to the DB with `riak_client:put`: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_delete.erl#L90
The original client is then sent a response indicating success or failure: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_delete.erl#L91
It is possible that the request has already timed out, in which case the response is dropped. This timeout is specified by the client: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_client.erl#L255
This ENDS the client facing operations of a delete!!! but...but...but...WE NEVER ACTUALLY DELETED ANYTHING!!!
A tombstone object will return a 404 (with a vclock) when queried, but there is still an object in the backend for this key, which means edge cases (repl, expiry, keylists, etc.) can still show this key.
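To make the "404 but still there" situation concrete, here is a toy Python model. `TinyKV` is hypothetical, and the flag name mirrors the `X-Riak-Deleted` metadata that, as far as I know, Riak sets on tombstones; treat that name as an assumption.

```python
DELETED_FLAG = "X-Riak-Deleted"  # assumed metadata key marking a tombstone


class TinyKV:
    """Toy store where a client-facing delete only writes a tombstone."""

    def __init__(self):
        self.backend = {}  # key -> (metadata dict, value)

    def put(self, key, value, metadata=None):
        self.backend[key] = (dict(metadata or {}), value)

    def delete(self, key):
        # Nothing is removed here: the value is replaced by a tombstone.
        self.put(key, b"", {DELETED_FLAG: "true"})

    def get(self, key):
        entry = self.backend.get(key)
        if entry is None or entry[0].get(DELETED_FLAG) == "true":
            return (404, None)  # tombstones read as "not found"
        return (200, entry[1])

    def list_keys(self):
        # Anything that walks the backend directly still sees the key.
        return sorted(self.backend)
```

After `kv.put("k", b"v")` and `kv.delete("k")`, `kv.get("k")` reports 404 while `kv.list_keys()` still contains `"k"`, which is the gap the rest of this walkthrough closes.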
###Backend delete setup
To see the final step of a delete we need to go back to `riak_kv_delete:delete`: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_delete.erl#L93. After the response is sent back to the original client, we do another GET with a hard-coded 60s timeout.
Tracing this to the `riak_client:get` function (https://github.com/basho/riak_kv/blob/1.4.2/src/riak_client.erl#L77), we can see that a `riak_kv_get_fsm` is spawned. We enter at `init` after `gen_fsm:start_link` is called: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L147. We then call the `prepare` function (https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L175), which does the consistent hash and identifies the preflist for this operation. This ends up calling `validate` (https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L211), which confirms we have valid n, r, pr, and other request-specific values. This then calls `execute` (https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L272), which calls `riak_kv_vnode:get` with a preflist attached: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L280. This function eventually calls `riak_core_vnode_master:command` (https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L169), which sends the GET request to all vnodes specified in the preflist, with their responses going back to the Sender (the get FSM): https://github.com/basho/riak_core/blob/1.4.2/src/riak_core_vnode_master.erl#L80. This request is handled by the vnode's `handle_command` clause for `?KV_GET_REQ`: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L406. This calls `do_get`, which eventually returns to the original get FSM, which is waiting in `waiting_vnode_r`: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L295. If all partitions have returned and they all returned tombstones, `riak_kv_get_core:final_action` will return `delete` and the backend delete logic will be started.
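For readers unfamiliar with the "consistent hash and preflist" step in `prepare`, here is a deliberately simplified Python sketch of the idea. It is not Riak's implementation: the `bucket/key` hash input, the default ring size of 64, and returning bare partition indexes are all assumptions made for illustration.

```python
import hashlib

RING_TOP = 2 ** 160  # size of the SHA-1 output space


def preflist(bucket: str, key: str, ring_size: int = 64, n_val: int = 3):
    """Return the n_val partition indexes that follow the key's hash."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    point = int.from_bytes(digest, "big")
    partition_width = RING_TOP // ring_size
    # The first owner is the partition that follows the hash point,
    # wrapping around at the top of the ring.
    first = (point // partition_width + 1) % ring_size
    return [(first + i) % ring_size for i in range(n_val)]
```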
This logic begins in the finalize steps of the `riak_kv_get_fsm`. If and only if all vnodes in the preflist have responded (https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L366), the finalize logic is started. In this logic, `riak_kv_get_core:final_action` returns `delete` if and only if all the partitions that responded returned the exact same tombstone object. This then flows to `maybe_delete`, which confirms that all the vnodes in the preflist are primaries: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_get_fsm.erl#L394. If not all partitions are primaries, we exit now and will retry the backend delete on a later GET operation, when all primaries are up for this preflist.
As the delete progresses we call `riak_kv_vnode:del`, which sends a `?KV_DELETE_REQ` to the `riak_kv_vnode_master`, which forwards the request to the vnodes in the preflist. Each vnode calls its `handle_command` clause for the delete request (https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L467), which calls the `do_delete` function: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L1320
###Actual backend delete
`do_delete` first calls `do_get_term`, which eventually calls `Mod:get`, where Mod is the backend configured in riak_kv. If the object is returned, it is checked to be a tombstone and then `delete_mode` is checked: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L1332. Depending on `delete_mode`, `do_backend_delete` is eventually called: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_vnode.erl#L932. This function simply calls `Mod:delete`, which actually reaps the tombstone from the backend. Then the object is removed from the index list (AAE) so it will not be resurrected (hopefully).
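A rough Python model of the vnode-side decision follows. The three modes (`keep`, `immediate`, or a delay in milliseconds) reflect how I understand the `delete_mode` setting; the result strings, the dict-based backend, and the `schedule` callback are hypothetical, and the AAE index update is left out.

```python
def do_delete(backend: dict, key, delete_mode, schedule):
    """Decide what happens to a tombstone on one vnode.

    backend     -- dict of key -> {"deleted": bool}, a stand-in for Mod
    delete_mode -- "keep", "immediate", or an integer delay in ms
    schedule    -- callback(delay_ms, key) that defers the final reap
    """
    obj = backend.get(key)  # models do_get_term / Mod:get
    if obj is None:
        return "not_found"
    if not obj.get("deleted"):
        return "not_tombstone"  # never reap a live object
    if delete_mode == "keep":
        return "not_deleted"  # tombstone stays in the backend
    if delete_mode == "immediate":
        del backend[key]  # models do_backend_delete / Mod:delete
        return "del_mode_immediate"
    schedule(delete_mode, key)  # reap later, after the delay
    return "del_mode_delayed"
```

Checking the tombstone flag again on the vnode matters because a new write may have landed between the GET and the delete request; in that case the sketch returns `not_tombstone` and leaves the object alone.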
###Backend specific delete paths
The call to `riak_kv_*_backend:delete` triggers a call to the backend's delete function. This path is dependent on the backend used. I'll go over all 3 backends.
#####Bitcask
The bitcask backend delete simply puts a `?TOMBSTONE` into the backend and then removes the entry from the keydir: https://github.com/basho/bitcask/blob/develop/src/bitcask.erl#L274
This tombstone macro is different from the riak_kv tombstone and is specific to bitcask. This entry is removed during merge to clear disk space.
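The append-then-forget behaviour can be pictured with a toy log-structured store. This is a Python illustration of the idea only; `TinyBitcask`, the marker value, and the in-memory list standing in for data files are all hypothetical.

```python
TOMBSTONE = b"__tombstone__"  # hypothetical marker, not bitcask's real value


class TinyBitcask:
    """Toy append-only log plus an in-memory keydir."""

    def __init__(self):
        self.log = []     # append-only list of (key, value) entries
        self.keydir = {}  # key -> offset of the latest live entry

    def put(self, key, value):
        self.log.append((key, value))
        self.keydir[key] = len(self.log) - 1

    def get(self, key):
        offset = self.keydir.get(key)
        return None if offset is None else self.log[offset][1]

    def delete(self, key):
        # Append a tombstone entry, then drop the key from the keydir.
        self.log.append((key, TOMBSTONE))
        self.keydir.pop(key, None)

    def merge(self):
        # Rewrite only entries the keydir still points at; old values
        # and tombstones disappear, which is where space is reclaimed.
        live = sorted(self.keydir.items(), key=lambda item: item[1])
        entries = [(key, self.log[offset][1]) for key, offset in live]
        self.log, self.keydir = [], {}
        for key, value in entries:
            self.put(key, value)
```

Note that `delete` makes the log longer, not shorter: space only comes back at `merge`.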
#####LevelDB
LevelDB similarly writes a special `[{delete, Key}]` entry to the DB: https://github.com/basho/eleveldb/blob/master/src/eleveldb.erl#L151
This entry can trigger a low-level compaction if the most recent value for the key is in the young level. The entry causes old values in lower levels to be removed on compaction, and the entry itself is only removed once all those values are purged.
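Why the delete entry has to outlive the values it shadows is easier to see in a toy leveled store. The Python below is a sketch under heavy assumptions (dicts instead of sorted files, manual compaction, a hypothetical `TinyLSM` class) and says nothing about how LevelDB schedules compactions.

```python
DELETED = object()  # sentinel standing in for the delete entry


class TinyLSM:
    """Toy leveled store: levels[0] is the youngest level."""

    def __init__(self, depth=3):
        self.levels = [dict() for _ in range(depth)]

    def put(self, key, value):
        self.levels[0][key] = value

    def delete(self, key):
        self.levels[0][key] = DELETED  # a write, not a removal

    def get(self, key):
        for level in self.levels:  # youngest entry wins
            if key in level:
                value = level[key]
                return None if value is DELETED else value
        return None

    def compact(self, i):
        """Merge level i down into level i + 1."""
        src, dst = self.levels[i], self.levels[i + 1]
        deeper = self.levels[i + 2:]
        for key, value in src.items():
            if value is DELETED and not any(key in lvl for lvl in deeper):
                # Nothing older remains below dst, so the shadowed value
                # and the delete entry can both go.
                dst.pop(key, None)
            else:
                dst[key] = value  # marker must keep shadowing older data
        src.clear()
```

If the marker were dropped at the first compaction, the old value sitting in a deeper level would become visible again, which is exactly the resurrection this design avoids.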
#####Memory
The memory backend is entirely coded within `riak_kv_memory_backend` because it's a hack using ets tables. The delete logic is very straightforward: https://github.com/basho/riak_kv/blob/1.4.2/src/riak_kv_memory_backend.erl#L230
The logic simply removes this key from the main ets table via `ets:delete` and removes all associated indexes from the index table.
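As a small illustration of the two-table cleanup, here is a Python stand-in with a dict for the main table and a set for the index table. `TinyMemoryBackend` and its method signatures are hypothetical; in particular, scanning the index for matching entries is a simplification of however the real backend locates them.

```python
class TinyMemoryBackend:
    """Toy model of a data table plus a secondary index table."""

    def __init__(self):
        self.data = {}      # (bucket, key) -> value
        self.index = set()  # (bucket, field, term, key) entries

    def put(self, bucket, key, value, index_specs=()):
        self.data[(bucket, key)] = value
        for field, term in index_specs:
            self.index.add((bucket, field, term, key))

    def delete(self, bucket, key):
        # Remove the object itself (the ets:delete step).
        self.data.pop((bucket, key), None)
        # Then remove every index entry that pointed at it.
        self.index = {
            entry for entry in self.index
            if not (entry[0] == bucket and entry[3] == key)
        }
```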