Last active
March 31, 2021 18:04
How to count the number of tombstones per partition key in one or more sstables.
#!/bin/bash
#
# Counts the number of tombstones per partition key in one or more sstables.
#
# Usage: ./tombstone-count.sh /var/lib/cassandra/data/mykeyspace/mytable/*-Data.db
#
# Sample output:
# "40e6a9839bf44bdaa624cc53e96733fe" 8
# "8e177ab222c14f868bcb6d2922b18d2b" 8
# "28aaa9db0dad4ae78cabe8bcc25d14a3" 9
# "8367c6c14d8e4ccdbd14e85d4a7d3b1f" 9
# "ecaf2f2409b24fa990a18e79f05b4b30" 12
# "3294ffc4dad44853b675dfdb34911576" 13
# (partition keys without any tombstones are not printed).
# Get `jq` here: http://stedolan.github.io/jq/download/
# ltrim taken from http://stackoverflow.com/a/27158086/260805
# The various stages below:
# 1. Choose which file(s) you'd like to check tombstones for here.
# 2. Convert to JSON.
# 3. Count tombstones per partition key.
# 4. Convert from JSON to CSV.
# 5. Sum the counts of partition keys that appear more than once.
# 6. Sort by tombstone count, so the partition keys with the most tombstones come last.
ls "$@" \
| xargs --verbose -L 1 sstable2json \
| jq '.[] | {key: .key, length: [.columns[] | select(.[3]=="t")] | length }' \
| awk -F: 'function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s } /"key"/ {key=$2;} /"length"/ && $2>0 {print ltrim(key), ltrim($2);}' \
| awk -F, '!($1 in myarr) { myarr[$1]=0 } {myarr[$1] += $2;} END {for(i in myarr) print i, myarr[i];}' \
| sort -n -k 2
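To see what stages 4 through 6 do without a Cassandra node at hand, the sketch below feeds hand-written lines, shaped like the jq output of stage 3, through the same two awk programs and the final sort. The keys and counts are made up for illustration, and the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical stage-3 output: key "aaa" shows up in two sstables,
# key "bbb" has no tombstones and should be dropped.
sample_jq_output() {
  cat <<'EOF'
{
  "key": "aaa",
  "length": 2
}
{
  "key": "bbb",
  "length": 0
}
{
  "key": "aaa",
  "length": 3
}
{
  "key": "ccc",
  "length": 1
}
EOF
}

# Stages 4-6, copied unchanged from the script above.
sum_tombstones() {
  awk -F: 'function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s } /"key"/ {key=$2;} /"length"/ && $2>0 {print ltrim(key), ltrim($2);}' \
  | awk -F, '!($1 in myarr) { myarr[$1]=0 } {myarr[$1] += $2;} END {for(i in myarr) print i, myarr[i];}' \
  | sort -n -k 2
}

sample_jq_output | sum_tombstones
# prints:
# "ccc" 1
# "aaa" 5
```

The first awk keeps the trailing comma that jq prints after the key, which is why the second awk can split on `,`.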
I've got the same issue as AlexisWilke. It returns nothing, but cqlsh shows tombstones.
Actually, it worked for me. The script is perfect. The issue I was facing was that all my tombstones were still sitting in the memtable. Once it was flushed, I could read the tombstones. @AlexisWilke: you might be facing the same issue.
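Because the script only reads sstable files on disk, tombstones that are still in the memtable never reach it. A minimal sketch of flushing first, assuming `nodetool` is on the PATH and the data directory layout from the usage line in the script; the function name and the `NODETOOL`, `COUNT_SCRIPT` and `DATA_DIR` overrides are hypothetical additions for illustration, not part of the gist.

```shell
#!/bin/bash
# Flush one table's memtable to disk, then count tombstones in its sstables.
# NODETOOL, COUNT_SCRIPT and DATA_DIR are hypothetical overrides (assumptions).
flush_and_count() {
  local keyspace="$1" table="$2"
  local nodetool="${NODETOOL:-nodetool}"
  local script="${COUNT_SCRIPT:-./tombstone-count.sh}"
  local data_dir="${DATA_DIR:-/var/lib/cassandra/data}"

  # `nodetool flush <keyspace> <table>` writes the memtable out as a new sstable.
  "$nodetool" flush "$keyspace" "$table" || return 1

  # Run the counting script on every data file of that table.
  "$script" "$data_dir/$keyspace/$table"/*-Data.db
}

# Example (needs a running node):
# flush_and_count mykeyspace mytable
```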
This looks great. Unfortunately, it doesn't work with Cassandra 3.x, because sstable2json does not exist in that version. I changed the code to use sstabledump instead, but I'm getting the following error:
tombstone_count ~/.ccm/test/node1/data0/tk/tt-5b2a97e06fb211e8a1cbed77bfd182ed/*Data*
/home/pedro/cassandra/tools/bin/sstabledump /home/pedro/.ccm/test/node1/data0/tk/tt-5b2a97e06fb211e8a1cbed77bfd182ed/mc-30-big-Data.db
jq: error (at <stdin>:54): Cannot iterate over null (null)
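jq reports "Cannot iterate over null" when `.columns[]` is applied to an object that has no `columns` key, so the error is consistent with sstabledump using a different JSON layout. A rough sketch of an adapted filter follows. It assumes each entry has a `partition.key` array and a `rows` array whose rows and cells carry a `deletion_info` object when deleted; check this against your own sstabledump output before relying on it. It does not count range tombstones or partition-level deletions, and it breaks on keys that contain spaces. The function names and sample data are hypothetical.

```shell
#!/bin/bash
# Hand-written sample in the ASSUMED sstabledump layout (not real output).
sample_sstabledump() {
  cat <<'EOF'
[
  {"partition": {"key": ["k1"], "position": 0},
   "rows": [
     {"type": "row", "cells": [
        {"name": "a", "deletion_info": {"local_delete_time": "2018-06-14T10:00:00Z"}},
        {"name": "b", "value": "x"}]},
     {"type": "row",
      "deletion_info": {"marked_deleted": "2018-06-14T10:00:00Z",
                        "local_delete_time": "2018-06-14T10:00:00Z"},
      "cells": []}
   ]},
  {"partition": {"key": ["k2"], "position": 50},
   "rows": [{"type": "row", "cells": [{"name": "a", "value": "y"}]}]}
]
EOF
}

# Count row-level and cell-level deletion markers per partition key,
# then sum per key across sstables and sort by count.
count_tombstones_3x() {
  jq -r '.[]
    | {key: (.partition.key | join(":")),
       n: ([.rows[]?
            | (select(.deletion_info != null) | 1),
              (.cells[]? | select(.deletion_info != null) | 1)]
           | length)}
    | select(.n > 0)
    | "\(.key) \(.n)"' \
  | awk '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}' \
  | sort -n -k 2
}

sample_sstabledump | count_tombstones_3x
```

The `?` after `[]` makes jq skip missing `rows` or `cells` instead of raising the error shown above.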
@sedulam Any luck with the tombstone count on 3.x?
Find an updated version for Cassandra 3.0.x at https://gist.github.com/fholzer/d6b7f1ce98906b5730cae67c179e0dd2
I can see that I have tombstones in various tables, for example, a CQL command with TRACE ON gives me a line such as:
Yet, your code returns nothing. Looking at the data output by sstable2json, I can see some 3rd parameter set to "d", but none are equal to "t". Could that be a version change?
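One way to test that hypothesis is to widen the jq filter so it counts entries flagged either "t" or "d" at index 3, which is the position `.[3]` selects in the script. The sketch below assumes, based only on the comment above, that deleted cells carry "d" there. The sample JSON is hand-written rather than real sstable2json output, and the function names are hypothetical.

```shell
#!/bin/bash
# Hand-written sample: "aaa" has one entry flagged "d" and one flagged "t",
# "bbb" has only a live column.
sample_sstable2json() {
  cat <<'EOF'
[
  {"key": "aaa", "columns": [
     ["c1", "5a2f", 1400000000000000, "d"],
     ["c2", "76616c", 1400000000000001],
     ["c3", "c4", 1400000000000002, "t", 1400000000]]},
  {"key": "bbb", "columns": [
     ["c1", "76616c", 1400000000000003]]}
]
EOF
}

# Same shape as stage 3 of the script, but matching "t" or "d" (an assumption).
count_t_or_d() {
  jq -c '.[] | {key: .key, length: ([.columns[] | select(.[3]=="t" or .[3]=="d")] | length)}'
}

sample_sstable2json | count_t_or_d
# prints:
# {"key":"aaa","length":2}
# {"key":"bbb","length":0}
```

Dropping `-c` restores the multi-line output that the awk stages of the script expect.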