Here is an example of several different kinds of C*
tombstones. For reference, they are:
- Partition
- Row
- Cell
- Range
The obvious culprit of tombstone creation is DELETE
, but there are other - less obvious - sources of the tombstone. Let’s see exacly what happens on disk when a tombstone is created. It's funny to say that a tombstone is created... aren't we deleting things? Remember, everything in C*
is a write! A DELETE
operation writes another sstable
entry with a newer timestamp than all the other entries... and the most recent timestamp wins!
I’ve created a DataStax Studio notebook that complements this gist. You should be able to download it and run it for yourself :-) nodetool
and sstabledump
commands need to be run from a terminal on the node(s) you're working with.
Let's Go!
CREATE KEYSPACE IF NOT EXISTS tombstones WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1};
CREATE TABLE IF NOT EXISTS tombstones.test (
partitionkey int,
clusteringcol text,
string text,
number int,
PRIMARY KEY (partitionkey, clusteringcol)
) WITH CLUSTERING ORDER BY (clusteringcol DESC);
INSERT INTO tombstones.test (partitionkey, clusteringcol, string, number) VALUES (1,'clustering1','some text',123);
INSERT INTO tombstones.test (partitionkey, clusteringcol, string, number) VALUES (1,'clustering2','some other text',456);
INSERT INTO tombstones.test (partitionkey, clusteringcol, string, number) VALUES (1,'clustering3','what to write?',789);
INSERT INTO tombstones.test (partitionkey, clusteringcol, string, number) VALUES (1,'clustering4','to be, || ! to be.',0);
INSERT INTO tombstones.test (partitionkey, clusteringcol, string, number) VALUES (2,'clustering1','I am another partition',123);
INSERT INTO tombstones.test (partitionkey, clusteringcol, string, number) VALUES (3,'clustering1','yes, a third partition',456);
INSERT INTO tombstones.test (partitionkey, clusteringcol, string, number) VALUES (2,'clustering2','&&&&&',789);
INSERT INTO tombstones.test (partitionkey, clusteringcol, string, number) VALUES (4,'clustering1','I wonder what a tombstone looks like',0);
Now I have a keyspace, table, and some test data… let’s get to deleting!
But hold on a second, what does this data look like on disk in its un-deleted form? For that, we go to sstabledump
Take a look at the parameters for sstabledump
here
We run nodetool flush
followed by sstabledump -d /path/to/*Data.db
in order to get the following:
[1]@0 Row[info=[ts=1571267799116790] ]: clustering4 | [number=0 ts=1571267799116790], [string=to be, || ! to be. ts=1571267799116790]
[1]@61 Row[info=[ts=1571267799115484] ]: clustering3 | [number=789 ts=1571267799115484], [string=what to write? ts=1571267799115484]
[1]@100 Row[info=[ts=1571267799114163] ]: clustering2 | [number=456 ts=1571267799114163], [string=some other text ts=1571267799114163]
[1]@140 Row[info=[ts=1571267799112078] ]: clustering1 | [number=123 ts=1571267799112078], [string=some text ts=1571267799112078]
[2]@174 Row[info=[ts=1571267799123760] ]: clustering2 | [number=789 ts=1571267799123760], [string=&&&&& ts=1571267799123760]
[2]@222 Row[info=[ts=1571267799118192] ]: clustering1 | [number=123 ts=1571267799118192], [string=I am another partition ts=1571267799118192]
[4]@270 Row[info=[ts=1571267799125237] ]: clustering1 | [number=0 ts=1571267799125237], [string=I wonder what a tombstone looks like ts=1571267799125237]
[3]@350 Row[info=[ts=1571267799121652] ]: clustering1 | [number=456 ts=1571267799121652], [string=yes, a third partition ts=1571267799121652]
Let’s start off with a row tombstone. We just want to delete one particular row. Now, this could also be a partition delete if the PRIMARY KEY
doesn’t contain any CLUSTERING COLUMN
, but our PK does contain a clustering column so we have to be specific.
DELETE FROM tombstones.test WHERE partitionkey = 1 AND clusteringcol = 'clustering1'; //this is a row tombstone
Now we run nodetool flush
again (to flush the memtable to sstable so we can get at it), followed by sstabledump /path/to/*Data.db
. I’m running sstabledump
without the -d
switch this time so you can see the json
format of the sstable
. This will be a recurring theme: I'll use the -d
switch when we look at the whole table, and the json
format when we want to look closer. In order to save time and be pithy, assume that I run nodetool flush
, sstabledump
, nodetool compact <ks> <tbl>
, then another sstabledump
after every operation.
[
{
"partition" : {
"key" : [ "1" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 37,
"clustering" : [ "clustering1" ],
"deletion_info" : { "marked_deleted" : "2019-10-16T23:22:25.951117Z", "local_delete_time" : "2019-10-16T23:22:25Z" },
"cells" : [ ]
}
]
}
]
Notice that you don't see any of the other cell values here, like number
or string
. Those values have been removed from this entry, leaving only the empty cells
array. Here's the whole table after compaction: everything merged back into a single entry. The json
output above and was in its own sstable
before we compacted.
[1]@0 Row[info=[ts=1571267799116790] ]: clustering4 | [number=0 ts=1571267799116790], [string=to be, || ! to be. ts=1571267799116790]
[1]@61 Row[info=[ts=1571267799115484] ]: clustering3 | [number=789 ts=1571267799115484], [string=what to write? ts=1571267799115484]
[1]@100 Row[info=[ts=1571267799114163] ]: clustering2 | [number=456 ts=1571267799114163], [string=some other text ts=1571267799114163]
[1]@140 Row[info=[ts=-9223372036854775808] del=deletedAt=1571268145951117, localDeletion=1571268145 ]: clustering1 | #row_tombstone
[2]@164 Row[info=[ts=1571267799123760] ]: clustering2 | [number=789 ts=1571267799123760], [string=&&&&& ts=1571267799123760]
[2]@212 Row[info=[ts=1571267799118192] ]: clustering1 | [number=123 ts=1571267799118192], [string=I am another partition ts=1571267799118192]
[4]@260 Row[info=[ts=1571267799125237] ]: clustering1 | [number=0 ts=1571267799125237], [string=I wonder what a tombstone looks like ts=1571267799125237]
[3]@340 Row[info=[ts=1571267799121652] ]: clustering1 | [number=456 ts=1571267799121652], [string=yes, a third partition ts=1571267799121652]
OK, so that’s a row tombstone. Now let’s see what a partition tombstone looks like…
DELETE FROM tombstones.test WHERE partitionkey = 4; //this is a partition tombstone
Here's the sstable
after the flush:
[
{
"partition" : {
"key" : [ "4" ],
"position" : 0,
"deletion_info" : { "marked_deleted" : "2019-10-17T01:30:31.259830Z", "local_delete_time" : "2019-10-17T01:30:31Z" }
},
"rows" : [ ]
}
]
Very much like the empty cells
array in the row tombstone example, we now have an empty rows
array. sstabledump
again after compaction:
[1]@0 Row[info=[ts=1571267799116790] ]: clustering4 | [number=0 ts=1571267799116790], [string=to be, || ! to be. ts=1571267799116790]
[1]@61 Row[info=[ts=1571267799115484] ]: clustering3 | [number=789 ts=1571267799115484], [string=what to write? ts=1571267799115484]
[1]@100 Row[info=[ts=1571267799114163] ]: clustering2 | [number=456 ts=1571267799114163], [string=some other text ts=1571267799114163]
[1]@139 Row[info=[ts=-9223372036854775808] del=deletedAt=1571268145951117, localDeletion=1571268145 ]: clustering1 | #row_tombstone
[2]@163 Row[info=[ts=1571267799123760] ]: clustering2 | [number=789 ts=1571267799123760], [string=&&&&& ts=1571267799123760]
[2]@211 Row[info=[ts=1571267799118192] ]: clustering1 | [number=123 ts=1571267799118192], [string=I am another partition ts=1571267799118192]
[4]@259 deletedAt=1571275831259830, localDeletion=1571275831 #partition_tombstone
[3]@278 Row[info=[ts=1571267799121652] ]: clustering1 | [number=456 ts=1571267799121652], [string=yes, a third partition ts=1571267799121652]
Now what are we left with
SELECT * FROM tombstones.test;
partitionkey | clusteringcol | number | string |
---|---|---|---|
1 | clustering4 | 0 | to be, or ! to be. |
1 | clustering3 | 789 | what to write? |
1 | clustering2 | 456 | some other text |
2 | clustering2 | 789 | &&&&& |
2 | clustering1 | 123 | I am another partition |
3 | clustering1 | 456 | yes, a third partition |
Moving right along to a cell tombstone. This is where things get a little more interesting… Let’s say I want to remove a specific cell’s value. (For reference, a cell is the value of a column in a specific row)
DELETE string FROM tombstones.test WHERE partitionkey = 3 AND clusteringcol = 'clustering1'; //this is a cell tombstone
Notice that we had to be very specific again in our DELETE
statement, since the PRIMARY KEY
contains not only the PARTITION KEY
but a CLUSTERING COLUMN
. Here's the sstabledump
where we can see that the pre-compaction sstable
doesn't contain the cell value for number
, just string
, which has now been tombstone'd.
[
{
"partition" : {
"key" : [ "3" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 38,
"clustering" : [ "clustering1" ],
"cells" : [
{ "name" : "string", "deletion_info" : { "local_delete_time" : "2019-10-17T01:39:37Z" },
"tstamp" : "2019-10-17T01:39:37.357675Z"
}
]
}
]
}
]
And after compaction, all the values are merged together again:
[1]@0 Row[info=[ts=1571267799116790] ]: clustering4 | [number=0 ts=1571267799116790], [string=to be, || ! to be. ts=1571267799116790]
[1]@61 Row[info=[ts=1571267799115484] ]: clustering3 | [number=789 ts=1571267799115484], [string=what to write? ts=1571267799115484]
[1]@100 Row[info=[ts=1571267799114163] ]: clustering2 | [number=456 ts=1571267799114163], [string=some other text ts=1571267799114163]
[1]@139 Row[info=[ts=-9223372036854775808] del=deletedAt=1571268145951117, localDeletion=1571268145 ]: clustering1 | #row_tombstone
[2]@163 Row[info=[ts=1571267799123760] ]: clustering2 | [number=789 ts=1571267799123760], [string=&&&&& ts=1571267799123760]
[2]@211 Row[info=[ts=1571267799118192] ]: clustering1 | [number=123 ts=1571267799118192], [string=I am another partition ts=1571267799118192]
[4]@259 deletedAt=1571275831259830, localDeletion=1571275831 #partition_tombstone
[3]@278 Row[info=[ts=1571267799121652] ]: clustering1 | [number=456 ts=1571267799121652], [string=<tombstone> ts=1571276377357675 ldt=1571276377] #cell_tombstone
Where are we now?
SELECT * FROM tombstones.test;
partitionkey | clusteringcol | number | string |
---|---|---|---|
1 | clustering4 | 0 | to be, or ! to be. |
1 | clustering3 | 789 | what to write? |
1 | clustering2 | 456 | some other text |
2 | clustering2 | 789 | &&&&& |
2 | clustering1 | 123 | I am another partition |
3 | clustering1 | 456 |
So now let’s take a look at some other ways we could end up with a cell tombstone. First, let’s INSERT
a new row and call it partition 5
INSERT INTO tombstones.test (partitionkey, clusteringcol, string, number) VALUES (5,'clustering1','I have been inserted',9000);
Here's the sstable
in json
[
{
"partition" : {
"key" : [ "5" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 62,
"clustering" : [ "clustering1" ],
"liveness_info" : { "tstamp" : "2019-10-17T01:53:19.088984Z" },
"cells" : [
{ "name" : "number", "value" : 9000 },
{ "name" : "string", "value" : "I have been inserted" }
]
}
]
}
]
And again the whole table after flush
and compaction
[5]@0 Row[info=[ts=1571277199088984] ]: clustering1 | [number=9000 ts=1571277199088984], [string=I have been inserted ts=1571277199088984]
[1]@67 Row[info=[ts=1571267799116790] ]: clustering4 | [number=0 ts=1571267799116790], [string=to be, || ! to be. ts=1571267799116790]
[1]@128 Row[info=[ts=1571267799115484] ]: clustering3 | [number=789 ts=1571267799115484], [string=what to write? ts=1571267799115484]
[1]@167 Row[info=[ts=1571267799114163] ]: clustering2 | [number=456 ts=1571267799114163], [string=some other text ts=1571267799114163]
[1]@206 Row[info=[ts=-9223372036854775808] del=deletedAt=1571268145951117, localDeletion=1571268145 ]: clustering1 | #row_tombstone
[2]@230 Row[info=[ts=1571267799123760] ]: clustering2 | [number=789 ts=1571267799123760], [string=&&&&& ts=1571267799123760]
[2]@278 Row[info=[ts=1571267799118192] ]: clustering1 | [number=123 ts=1571267799118192], [string=I am another partition ts=1571267799118192]
[4]@326 deletedAt=1571275831259830, localDeletion=1571275831 #partition_tombstone
[3]@345 Row[info=[ts=1571267799121652] ]: clustering1 | [number=456 ts=1571267799121652], [string=<tombstone> ts=1571276377357675 ldt=1571276377] #cell_tombstone
What about if I want to UPDATE
one of the values in the row I just INSERT
'd? We have to be very specific:
UPDATE tombstones.test SET string = 'I have been upserted' WHERE partitionkey = 5 AND clusteringcol = 'clustering1';
sstable
[
{
"partition" : {
"key" : [ "5" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 58,
"clustering" : [ "clustering1" ],
"cells" : [
{ "name" : "string", "value" : "I have been upserted", "tstamp" : "2019-10-17T01:57:41.012018Z" }
]
}
]
}
]
Note that the only value that was written in this sstable
was "I have been upserted"
. The number
column value is absent. But when we SELECT
:
SELECT * FROM tombstones.test WHERE partitionkey = 5;
partitionkey | clusteringcol | number | string |
---|---|---|---|
5 | clustering1 | 9000 | I have been upserted |
...the new string
value is returned with the old number
value!
Again, after compaction
[5]@0 Row[info=[ts=1571277199088984] ]: clustering1 | [number=9000 ts=1571277199088984], [string=I have been upserted ts=1571277461012018] #merged back together!
[1]@72 Row[info=[ts=1571267799116790] ]: clustering4 | [number=0 ts=1571267799116790], [string=to be, || ! to be. ts=1571267799116790]
[1]@133 Row[info=[ts=1571267799115484] ]: clustering3 | [number=789 ts=1571267799115484], [string=what to write? ts=1571267799115484]
[1]@172 Row[info=[ts=1571267799114163] ]: clustering2 | [number=456 ts=1571267799114163], [string=some other text ts=1571267799114163]
[1]@211 Row[info=[ts=-9223372036854775808] del=deletedAt=1571268145951117, localDeletion=1571268145 ]: clustering1 |
[2]@235 Row[info=[ts=1571267799123760] ]: clustering2 | [number=789 ts=1571267799123760], [string=&&&&& ts=1571267799123760]
[2]@283 Row[info=[ts=1571267799118192] ]: clustering1 | [number=123 ts=1571267799118192], [string=I am another partition ts=1571267799118192]
[4]@331 deletedAt=1571275831259830, localDeletion=1571275831
[3]@350 Row[info=[ts=1571267799121652] ]: clustering1 | [number=456 ts=1571267799121652], [string=<tombstone> ts=1571276377357675 ldt=1571276377]
The same works for INSERT
when we leave a column unset
. Let's use string
in this case:
INSERT INTO tombstones.test (partitionkey, clusteringcol, number) VALUES (5,'clustering1',8888);
sstable
[
{
"partition" : {
"key" : [ "5" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 41,
"clustering" : [ "clustering1" ],
"liveness_info" : { "tstamp" : "2019-10-17T02:03:47.755945Z" },
"cells" : [
{ "name" : "number", "value" : 8888 }
]
}
]
}
]
Similarly to the UPDATE
, the only value that was written to the sstable
was 8888
. The string
column value is absent this time. SELECT
before compaction:
SELECT * FROM tombstones.test WHERE partitionkey = 5;
partitionkey | clusteringcol | number | string |
---|---|---|---|
5 | clustering1 | 8888 | I have been upserted |
Now the string
column is the same, but the number
is the new value!
[5]@0 Row[info=[ts=1571277827755945] ]: clustering1 | [number=8888 ts=1571277827755945], [string=I have been upserted ts=1571277461012018]
[1]@72 Row[info=[ts=1571267799116790] ]: clustering4 | [number=0 ts=1571267799116790], [string=to be, || ! to be. ts=1571267799116790]
[1]@133 Row[info=[ts=1571267799115484] ]: clustering3 | [number=789 ts=1571267799115484], [string=what to write? ts=1571267799115484]
[1]@172 Row[info=[ts=1571267799114163] ]: clustering2 | [number=456 ts=1571267799114163], [string=some other text ts=1571267799114163]
[1]@211 Row[info=[ts=-9223372036854775808] del=deletedAt=1571268145951117, localDeletion=1571268145 ]: clustering1 |
[2]@235 Row[info=[ts=1571267799123760] ]: clustering2 | [number=789 ts=1571267799123760], [string=&&&&& ts=1571267799123760]
[2]@283 Row[info=[ts=1571267799118192] ]: clustering1 | [number=123 ts=1571267799118192], [string=I am another partition ts=1571267799118192]
[4]@331 deletedAt=1571275831259830, localDeletion=1571275831
[3]@350 Row[info=[ts=1571267799121652] ]: clustering1 | [number=456 ts=1571267799121652], [string=<tombstone> ts=1571276377357675 ldt=1571276377]
Let's say I explicitly want to remove the number
value, and at the same time change the string
using an INSERT
:
INSERT INTO tombstones.test (partitionkey, clusteringcol, string) VALUES (5,'clustering1','did the number get deleted?');
sstabledump
[
{
"partition" : {
"key" : [ "5" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 65,
"clustering" : [ "clustering1" ],
"liveness_info" : { "tstamp" : "2019-10-17T02:10:03.024349Z" },
"cells" : [
{ "name" : "string", "value" : "did the number get deleted?" }
]
}
]
}
]
I don't see the value for number
anywhere after a flush...
SELECT * FROM tombstones.test WHERE partitionkey = 5; //uh oh
partitionkey | clusteringcol | number | string |
---|---|---|---|
5 | clustering1 | 8888 | did the number get deleted? |
Our previous number
value is still there!
How can we ensure that we're DELETE
on the number
value, UPDATE
'ing the string
, all while using an INSERT
statement? (have we all gone mad?!?!?)
INSERT INTO tombstones.test (partitionkey, clusteringcol, string, number) VALUES (5,'clustering1','now the number is gone...',null); //this is a cell tombstone
sstable
after nodetool flush
[
{
"partition" : {
"key" : [ "5" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 64,
"clustering" : [ "clustering1" ],
"liveness_info" : { "tstamp" : "2019-10-17T02:12:27.380989Z" },
"cells" : [
{ "name" : "number", "deletion_info" : { "local_delete_time" : "2019-10-17T02:12:27Z" }
},
{ "name" : "string", "value" : "now the number is gone..." }
]
}
]
}
]
sstable
after compaction:
[5]@0 Row[info=[ts=1571278347380989] ]: clustering1 | [number=<tombstone> ts=1571278347380989 ldt=1571278347], [string=now the number is gone... ts=1571278347380989] #cell_tombstone
[1]@70 Row[info=[ts=1571267799116790] ]: clustering4 | [number=0 ts=1571267799116790], [string=to be, || ! to be. ts=1571267799116790]
[1]@131 Row[info=[ts=1571267799115484] ]: clustering3 | [number=789 ts=1571267799115484], [string=what to write? ts=1571267799115484]
[1]@170 Row[info=[ts=1571267799114163] ]: clustering2 | [number=456 ts=1571267799114163], [string=some other text ts=1571267799114163]
[1]@209 Row[info=[ts=-9223372036854775808] del=deletedAt=1571268145951117, localDeletion=1571268145 ]: clustering1 | #row_tombstone
[2]@233 Row[info=[ts=1571267799123760] ]: clustering2 | [number=789 ts=1571267799123760], [string=&&&&& ts=1571267799123760]
[2]@281 Row[info=[ts=1571267799118192] ]: clustering1 | [number=123 ts=1571267799118192], [string=I am another partition ts=1571267799118192]
[4]@329 deletedAt=1571275831259830, localDeletion=1571275831 #partition_tombstone
[3]@348 Row[info=[ts=1571267799121652] ]: clustering1 | [number=456 ts=1571267799121652], [string=<tombstone> ts=1571276377357675 ldt=1571276377] #cell_tombstone
SELECT * FROM tombstones.test WHERE partitionkey = 5; //now it's really gone
partitionkey | clusteringcol | number | string |
---|---|---|---|
5 | clustering1 | now the number is gone... |
The last type of tombstone that we’ll look at is the range tombstone. This can be very useful when deleting multiple rows from within the same partition. Instead of deleting each row individually, you will have one tombstone at the upper bound and another at the lower. Let’s watch...
CREATE TABLE IF NOT EXISTS tombstones.range (
partitionkey int,
clusteringcol int,
string text,
PRIMARY KEY (partitionkey, clusteringcol)
) WITH CLUSTERING ORDER BY (clusteringcol DESC);
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (1,1,'first');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (1,2,'second');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (1,3,'third');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (1,4,'fourth');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (1,5,'fifth');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (1,6,'sixth');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (1,7,'seventh');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (1,8,'eighth');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (1,9,'ninth');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (1,10,'tenth');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (2,1,'first');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (2,2,'second');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (2,3,'third');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (2,4,'fourth');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (2,5,'fifth');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (2,6,'sixth');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (2,7,'seventh');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (2,8,'eighth');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (2,9,'ninth');
INSERT INTO tombstones.range (partitionkey,clusteringcol,string) VALUES (2,10,'tenth');
sstable
[1]@0 Row[info=[ts=1571278763051320] ]: 10 | [string=tenth ts=1571278763051320]
[1]@35 Row[info=[ts=1571278763050489] ]: 9 | [string=ninth ts=1571278763050489]
[1]@52 Row[info=[ts=1571278763049568] ]: 8 | [string=eighth ts=1571278763049568]
[1]@70 Row[info=[ts=1571278763048579] ]: 7 | [string=seventh ts=1571278763048579]
[1]@89 Row[info=[ts=1571278763047440] ]: 6 | [string=sixth ts=1571278763047440]
[1]@106 Row[info=[ts=1571278763046122] ]: 5 | [string=fifth ts=1571278763046122]
[1]@123 Row[info=[ts=1571278763044830] ]: 4 | [string=fourth ts=1571278763044830]
[1]@141 Row[info=[ts=1571278763043442] ]: 3 | [string=third ts=1571278763043442]
[1]@158 Row[info=[ts=1571278763042178] ]: 2 | [string=second ts=1571278763042178]
[1]@176 Row[info=[ts=1571278763040463] ]: 1 | [string=first ts=1571278763040463]
[2]@193 Row[info=[ts=1571278763061688] ]: 10 | [string=tenth ts=1571278763061688]
[2]@229 Row[info=[ts=1571278763060188] ]: 9 | [string=ninth ts=1571278763060188]
[2]@247 Row[info=[ts=1571278763058940] ]: 8 | [string=eighth ts=1571278763058940]
[2]@266 Row[info=[ts=1571278763058001] ]: 7 | [string=seventh ts=1571278763058001]
[2]@286 Row[info=[ts=1571278763057052] ]: 6 | [string=sixth ts=1571278763057052]
[2]@304 Row[info=[ts=1571278763056174] ]: 5 | [string=fifth ts=1571278763056174]
[2]@321 Row[info=[ts=1571278763055216] ]: 4 | [string=fourth ts=1571278763055216]
[2]@339 Row[info=[ts=1571278763054278] ]: 3 | [string=third ts=1571278763054278]
[2]@356 Row[info=[ts=1571278763053360] ]: 2 | [string=second ts=1571278763053360]
[2]@374 Row[info=[ts=1571278763052304] ]: 1 | [string=first ts=1571278763052304]
The syntax for a range DELETE
specifies the PARTITION KEY
, and uses conditional operators for the CLUSTERING COLUMN
(s)
DELETE FROM tombstones.range WHERE partitionkey = 1 AND clusteringcol > 3 AND clusteringcol < 8;
sstable
[
{
"partition" : {
"key" : [ "1" ],
"position" : 0
},
"rows" : [
{
"type" : "range_tombstone_bound",
"start" : {
"type" : "exclusive",
"clustering" : [ 8 ],
"deletion_info" : { "marked_deleted" : "2019-10-17T02:24:18.898466Z", "local_delete_time" : "2019-10-17T02:24:18Z" }
}
},
{
"type" : "range_tombstone_bound",
"end" : {
"type" : "exclusive",
"clustering" : [ 3 ],
"deletion_info" : { "marked_deleted" : "2019-10-17T02:24:18.898466Z", "local_delete_time" : "2019-10-17T02:24:18Z" }
}
}
]
}
]
SELECT * FROM tombstones.range WHERE partitionkey = 1;
partitionkey | clusteringcol | string |
---|---|---|
1 | 10 | tenth |
1 | 9 | ninth |
1 | 8 | eighth |
1 | 3 | third |
1 | 2 | second |
1 | 1 | first |
After compaction... Viola!
[1]@0 Row[info=[ts=1571278763051320] ]: 10 | [string=tenth ts=1571278763051320]
[1]@35 Row[info=[ts=1571278763050489] ]: 9 | [string=ninth ts=1571278763050489]
[1]@52 Row[info=[ts=1571278763049568] ]: 8 | [string=eighth ts=1571278763049568]
[1]@70 Marker EXCL_START_BOUND(8)@1571279058898466/1571279058
[1]@87 Marker EXCL_END_BOUND(3)@1571279058898466/1571279058
[1]@104 Row[info=[ts=1571278763043442] ]: 3 | [string=third ts=1571278763043442]
[1]@121 Row[info=[ts=1571278763042178] ]: 2 | [string=second ts=1571278763042178]
[1]@139 Row[info=[ts=1571278763040463] ]: 1 | [string=first ts=1571278763040463]
[2]@156 Row[info=[ts=1571278763061688] ]: 10 | [string=tenth ts=1571278763061688]
[2]@192 Row[info=[ts=1571278763060188] ]: 9 | [string=ninth ts=1571278763060188]
[2]@210 Row[info=[ts=1571278763058940] ]: 8 | [string=eighth ts=1571278763058940]
[2]@229 Row[info=[ts=1571278763058001] ]: 7 | [string=seventh ts=1571278763058001]
[2]@249 Row[info=[ts=1571278763057052] ]: 6 | [string=sixth ts=1571278763057052]
[2]@267 Row[info=[ts=1571278763056174] ]: 5 | [string=fifth ts=1571278763056174]
[2]@284 Row[info=[ts=1571278763055216] ]: 4 | [string=fourth ts=1571278763055216]
[2]@302 Row[info=[ts=1571278763054278] ]: 3 | [string=third ts=1571278763054278]
[2]@319 Row[info=[ts=1571278763053360] ]: 2 | [string=second ts=1571278763053360]
[2]@337 Row[info=[ts=1571278763052304] ]: 1 | [string=first ts=1571278763052304]
How about just one more partition tombstone for the road? Here's the full json
output of sstabledump
of our range example's partition 2 before the DELETE
:
{
"partition" : {
"key" : [ "2" ],
"position" : 156
},
"rows" : [
{
"type" : "row",
"position" : 192,
"clustering" : [ 10 ],
"liveness_info" : { "tstamp" : "2019-10-17T02:19:23.061688Z" },
"cells" : [
{ "name" : "string", "value" : "tenth" }
]
},
{
"type" : "row",
"position" : 192,
"clustering" : [ 9 ],
"liveness_info" : { "tstamp" : "2019-10-17T02:19:23.060188Z" },
"cells" : [
{ "name" : "string", "value" : "ninth" }
]
},
{
"type" : "row",
"position" : 210,
"clustering" : [ 8 ],
"liveness_info" : { "tstamp" : "2019-10-17T02:19:23.058940Z" },
"cells" : [
{ "name" : "string", "value" : "eighth" }
]
},
{
"type" : "row",
"position" : 229,
"clustering" : [ 7 ],
"liveness_info" : { "tstamp" : "2019-10-17T02:19:23.058001Z" },
"cells" : [
{ "name" : "string", "value" : "seventh" }
]
},
{
"type" : "row",
"position" : 249,
"clustering" : [ 6 ],
"liveness_info" : { "tstamp" : "2019-10-17T02:19:23.057052Z" },
"cells" : [
{ "name" : "string", "value" : "sixth" }
]
},
{
"type" : "row",
"position" : 267,
"clustering" : [ 5 ],
"liveness_info" : { "tstamp" : "2019-10-17T02:19:23.056174Z" },
"cells" : [
{ "name" : "string", "value" : "fifth" }
]
},
{
"type" : "row",
"position" : 284,
"clustering" : [ 4 ],
"liveness_info" : { "tstamp" : "2019-10-17T02:19:23.055216Z" },
"cells" : [
{ "name" : "string", "value" : "fourth" }
]
},
{
"type" : "row",
"position" : 302,
"clustering" : [ 3 ],
"liveness_info" : { "tstamp" : "2019-10-17T02:19:23.054278Z" },
"cells" : [
{ "name" : "string", "value" : "third" }
]
},
{
"type" : "row",
"position" : 319,
"clustering" : [ 2 ],
"liveness_info" : { "tstamp" : "2019-10-17T02:19:23.053360Z" },
"cells" : [
{ "name" : "string", "value" : "second" }
]
},
{
"type" : "row",
"position" : 337,
"clustering" : [ 1 ],
"liveness_info" : { "tstamp" : "2019-10-17T02:19:23.052304Z" },
"cells" : [
{ "name" : "string", "value" : "first" }
]
}
]
}
buh bye...
DELETE FROM tombstones.range WHERE partitionkey = 2; //this is a partition tombstone, just for fun :-)
sstabledump
after:
[
{
"partition" : {
"key" : [ "2" ],
"position" : 0,
"deletion_info" : { "marked_deleted" : "2019-10-17T02:27:46.045375Z", "local_delete_time" : "2019-10-17T02:27:46Z" }
},
"rows" : [ ]
}
]
Somehow that's very satisfying. If you've made it this far down my soliloquy, thanks! Hopefully it's been helpful. E-mail me @ [email protected] if you have any questions or feedback.