@mh0w
Created July 8, 2024 10:30
Identifying large files in a git repo
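# List every blob of at least 1 MiB (2^20 bytes) in the repo's entire history, smallest first.
# Output columns: object ID cut to 12 characters (assumes 40-character SHA-1 IDs), size, path.
# Uses gnumfmt when it is installed (e.g. GNU coreutils on macOS), otherwise numfmt.
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
awk '$2 >= 2^20' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest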
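
# Total uncompressed size, in bytes, of every unique blob in the history of HEAD.
# Runs 8 jobs in parallel and leaves temporary logfile-N files in the current directory.
git rev-list HEAD | nl |
xargs -n 2 -P 8 sh -c 'git ls-tree -rl "$1" | perl -p -e "\$_ =~ s/[^ ]*+ [^ ]*+ ([^ ]*+) ++([^\t]*+)\t.*+/\1 \2/" | sort > logfile-$0'
sort -m -u logfile-* | awk '{ sum += $2 } END { print sum }'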
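
The first pipeline hard-codes its 1 MiB threshold in the awk step. As a minimal sketch, the same steps can be wrapped in a function that takes the threshold as an argument. The function name `large_blobs` and its `min_bytes` parameter are hypothetical and not part of the original snippet. The sketch also skips the `cut` and `numfmt` steps, so it prints the full object ID and the raw size in bytes.

```shell
# Hypothetical helper, not part of the original snippet.
# Usage: large_blobs [MIN_BYTES]   (default: 1048576 bytes = 1 MiB)
# Prints "<object id> <size in bytes> <path>" for each matching blob,
# sorted by size in ascending order.
large_blobs() {
    min_bytes="${1:-1048576}"
    git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    awk -v min="$min_bytes" '$2 >= min' |
    sort -n -k 2
}
```

Run from inside a repository, `large_blobs 10485760` would list only blobs of 10 MiB or more, and `large_blobs | tail -n 5` would show the five largest blobs above the default threshold.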