dkeightley/object-count-size.md

Last active July 27, 2024 06:00

Star (6) You must be signed in to star a gist
Fork (4) You must be signed in to fork a gist

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/dkeightley/8f2211d6e93a0d5bc294242248ca8fbf.js"></script>
Save dkeightley/8f2211d6e93a0d5bc294242248ca8fbf to your computer and use it in GitHub Desktop.

Download ZIP

etcd object counts and sizes

Raw

object-count-size.md

Exec into the etcd container

RKE1

docker exec -it etcd sh

RKE2

export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
etcdcontainer=$(/var/lib/rancher/rke2/bin/crictl ps --label io.kubernetes.container.name=etcd --quiet)
/var/lib/rancher/rke2/bin/crictl exec -it $etcdcontainer sh -c "ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 bash"

Count all objects

for key in `etcdctl get --prefix --keys-only /`
do
  size=`etcdctl get $key --print-value-only | wc -c`
  count=`etcdctl get $key --write-out=fields | grep \"Count\" | cut -f2 -d':'`
  if [ $count -ne 0 ]; then
    versions=`etcdctl get $key --write-out=fields | grep \"Version\" | cut -f2 -d':'`
  else
    versions=0
  fi
  total=$(($size * $versions))
  echo $total $size $versions $count $key >> /tmp/etcdkeys.txt
done

Optional: only specific object, eg: secrets

for key in `etcdctl get --prefix --keys-only /registry/secrets`
do
  size=`etcdctl get $key --print-value-only | wc -c`
  count=`etcdctl get $key --write-out=fields | grep \"Count\" | cut -f2 -d':'`
  if [ $count -ne 0 ]; then
    versions=`etcdctl get $key --write-out=fields | grep \"Version\" | cut -f2 -d':'`
  else
    versions=0
  fi
  total=$(($size * $versions))
  echo $total $size $versions $count $key >> /tmp/etcdkeys-secrets.txt
done

Sort the output

sort -n /tmp/etcdkeys.txt

Copy the file if needed

RKE1

docker cp etcd:/tmp/etcdkeys.txt .

RKE2

kubectl cp -n kube-system <etcd pod name>:/tmp/etcdkeys.txt ./etcdkeys.txt

laddp commented Jul 21, 2022

The object size computations fail if the object count is zero. Here's a chunk that accounts for that:

for key in `etcdctl get --prefix --keys-only /`
do
  size=`etcdctl get $key --print-value-only | wc -c`
  count=`etcdctl get $key --write-out=fields | grep \"Count\" | cut -f2 -d':'`
  if [ $count -ne 0 ]; then
    versions=`etcdctl get $key --write-out=fields | grep \"Version\" | cut -f2 -d':'`
  else
    versions=0
  fi
  total=$(( $size * $versions))
  echo $total $size $versions $count $key >> /tmp/etcdkeys.txt
done

This calls etcdctl multiple times for each key, which seems kinda inefficient, so I'll look at refactoring to be better.

teejaded commented Nov 14, 2023

There's just... no way this is correct.

api	group	namespace	resource	size	versions	total	count
kubernetes.io	operators.coreos.com	operators	ocs-operator.openshift-storage	21457	123415661	2648129838077	1

That's 2.6 TB which is nonsense because the whole db is less than 4gigs.
I believe the old versions get deleted by default every 5 minutes. The space they occupied gets reclaimed on every defrag.

Author

dkeightley commented Jan 17, 2024

There's just... no way this is correct.

This seems impossibly high I agree, this is a worst-case approach as it assumes all object versions are equal - this is most likely never the case. Some object types, like those managed by an operator may increase the object size incrementally over time.

However it can still be useful with the inclusion of the object count and # of versions, to help spot potential object abuse in a cluster. If you have a better way I'd be happy to adopt it.

teejaded commented Jan 17, 2024

The methods above don't actually give you an object count. The only reason it is sometimes 0 is because the key was deleted sometime between when you listed the keys and when the loop got around to asking for it.

To actually figure out how many revisions are being stored you have to recursively get --rev mod_revision to check how many previous versions are stored.

etcdctl get --write-out=json "/kubernetes.io/operators.coreos.com/operators/ocs-operator.openshift-storage" | jq 'del(.kvs[].value)'
{
  "header": {
    "cluster_id": 14841639068965180000,
    "member_id": 10276657743932975000,
    "revision": 1102830774,
    "raft_term": 2
  },
  "kvs": [
    {
      "key": "L2t1YmVybmV0ZXMuaW8vb3BlcmF0b3JzLmNvcmVvcy5jb20vb3BlcmF0b3JzL29jcy1vcGVyYXRvci5vcGVuc2hpZnQtc3RvcmFnZQ==",
      "create_revision": 136141,
      "mod_revision": 1100941289,
      "version": 123416317
    }
  ],
  "count": 1
}

etcdctl get --write-out=json --rev 1100941289 "/kubernetes.io/operators.coreos.com/operators/ocs-operator.openshift-storage" | jq 'del(.kvs[].value)'
{"level":"warn","ts":"2024-01-17T12:42:32.904797-0800","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00025c000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = OutOfRange desc = etcdserver: mvcc: required revision has been compacted"}
Error: etcdserver: mvcc: required revision has been compacted

I kinda think it's not worth the time/effort to do this because Kubernetes does a compaction every 5 minutes anyway by default.

We can also speed the whole thing up by using the get --from-key feature to iterate through the keys. This iterates through my 150k key etcd db in about 9s.

Note: you will want to reduce LIMIT to 50 or so if you're running this with an in-use etcd server. I found that restoring a snapshot locally and running reports against that is much safer and more reliable way to do analysis of a production server.

#!/bin/bash -euo pipefail

LIMIT=500
TMPFILE=$(mktemp)
NEXT_KEY=$(etcdctl get --limit 1 --keys-only --prefix / )
while true; do
	etcdctl get --limit $LIMIT --write-out=json --from-key "$NEXT_KEY" |
	tee >(jq -r '(.kvs[-1].key | @base64d),(.count)' > $TMPFILE) |
	jq -c '.kvs[1:][] |
		(
			(.key | @base64d) as $key |
			($key | split("/")) as $keya |
			{
			  "fullkey": $key,
			  "api": $keya[1],
			  "group": $keya[2],
			  "namespace": $keya[3],
			  "resource": $keya[4],
			  "size": (.value | @base64d | length),
			  "versions": (.version),
			}
		)' |
	tee -a keys_raw.json |
	jq -r '.fullkey'

	if [ "$(sed -n -e 2p <$TMPFILE)" == "1" ]; then
		break
	fi
	NEXT_KEY="$(sed -n -e 1p <$TMPFILE)"

	echo "Remaining keys: $(sed -n -e 2p <$TMPFILE)"
done
rm $TMPFILE

Heres some useful reports you can generate from this data.

largest 10 groups by size:

jq -s 'group_by(.group) | map({ group: (.[0].group), total: ([.[] | .size] | reduce .[] as $num (0; .+$num)) }) | sort_by(.total) | reverse | .[0:10]' keys_raw.json

largest 10 namespaces by size

jq -s 'group_by(.namespace) | map({ namespace: (.[0].namespace), total: ([.[] | .size] | reduce .[] as $num (0; .+$num)) }) | sort_by(.total) | reverse | .[0:10]' keys_raw.json

largest 10 namepaces by count

jq -s 'group_by(.namespace) | map({ namespace: (.[0].namespace), count: (. | length)}) | sort_by(.count) | reverse | .[0:10]' keys_raw.json

largest 10 groups by count

jq -s 'group_by(.group) | map({ group: (.[0].group), count: (. | length)}) | sort_by(.count) | reverse | .[0:10]' keys_raw.json

highest 10 versions

jq -s 'sort_by(.versions) | reverse | .[0:10]' keys_raw.json

Author

dkeightley commented Jan 17, 2024

Thanks, I'll take a look into this approach 👍

The methods above don't actually give you an object count

Correct, I mistakenly conflated this gist with another one-liner to gather the object counts:

etcdctl get /registry --prefix=true --keys-only | grep -v ^$ | awk -F'/' '{ if ($3 ~ /cattle.io/) {h[$3"/"$4]++} else { h[$3]++ }} END { for(k in h) print h[k], k }' | sort -n

Note the cattle.io portion is just to split out the different CRDs we work with.

dkeightley/object-count-size.md

Exec into the etcd container

RKE1

RKE2

Count all objects

Optional: only specific object, eg: secrets

Sort the output

Copy the file if needed

RKE1

RKE2

laddp commented Jul 21, 2022

Uh oh!

teejaded commented Nov 14, 2023

Uh oh!

dkeightley commented Jan 17, 2024

Uh oh!

teejaded commented Jan 17, 2024

Uh oh!

dkeightley commented Jan 17, 2024

Uh oh!