#!/bin/bash
#
# Counts number of tombstones per partition key in one or multiple sstables.
#
# Usage: ./tombstone-count.sh /var/lib/cassandra/data/mykeyspace/mytable/*-Data.db
#
# Sample output:
# "40e6a9839bf44bdaa624cc53e96733fe" 8
# "8e177ab222c14f868bcb6d2922b18d2b" 8
# "28aaa9db0dad4ae78cabe8bcc25d14a3" 9
# "8367c6c14d8e4ccdbd14e85d4a7d3b1f" 9
# "ecaf2f2409b24fa990a18e79f05b4b30" 12
# "3294ffc4dad44853b675dfdb34911576" 13
# (partition keys without any tombstone(s) are not printed).
# Get `jq` here: http://stedolan.github.io/jq/download/
# ltrim taken from http://stackoverflow.com/a/27158086/260805
# The stages of the pipeline below:
# 1. List the sstable file(s) given on the command line.
# 2. Convert each sstable to JSON with sstable2json.
# 3. Count tombstones per partition key.
# 4. Convert from JSON to CSV (key, count).
# 5. Sum the counts of partition keys that appear more than once.
# 6. Sort by tombstone count, ascending, so the partition keys with the most tombstones come last.
ls "$@" \
| xargs --verbose -L 1 sstable2json \
| jq '.[] | {key: .key, length: [.columns[] | select(.[3]=="t")] | length }' \
| awk -F: 'function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s } /"key"/ {key=$2;} /"length"/ && $2>0 {print ltrim(key), ltrim($2);}' \
| awk -F, '!($1 in myarr) { myarr[$1]=0 } {myarr[$1] += $2;} END {for(i in myarr) print i, myarr[i];}' \
| sort -n -k 2