@EpocSquadron
Created January 15, 2014 22:48
Script to find the largest files in a Git history, so that the resulting list can be fed to git filter-branch to remove those files from the history permanently.
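#!/bin/bash
# Adapted from http://blog.jessitron.com/2013/08/finding-and-removing-large-files-in-git.html
#
# Pass one or more branch names (or other revisions) as arguments.
# Prints "<total bytes> <path>" for large files, biggest first.

# Branches whose history should be searched
BRANCH_LIST=("$@")

filesAndSizes() {
    # Get all files recursively in this revision
    git ls-tree -lr "$1" |
        # Drop the mode, type, and 40-character (SHA-1) object hash,
        # keeping the padded size, a tab, and the path
        cut -c54- |
        # Sizes are right-aligned to 7 columns, so a leading space
        # means the file is smaller than 1,000,000 bytes; drop those
        grep -v '^ '
}

getAllFilesAndSizes() {
    # Loop over every revision passed in, not just the first argument
    for REVISION in "$@"; do
        filesAndSizes "$REVISION"
    done
}

REVISIONS=$(git rev-list "${BRANCH_LIST[@]}")

# $REVISIONS is unquoted on purpose so each hash becomes its own argument
getAllFilesAndSizes $REVISIONS |
    # Count each distinct (size, path) pair only once
    sort -u |
    # Sum the sizes of all distinct versions of each path
    perl -e '
        while (<>) {
            chomp;
            @stuff = split("\t");
            $sums{$stuff[1]} += $stuff[0];
        }
        print "$sums{$_} $_\n" for (keys %sums);
    ' |
    sort -rn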
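
The size filter in filesAndSizes depends on fixed column positions: it assumes a 4-character object type and a 40-character hash, so submodule entries (type "commit", size "-") or a repository with longer hashes would shift the columns. A minimal sketch of a field-based alternative follows. The function name largeBlobs and its threshold argument are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): reads `git ls-tree -l -r`
# output on stdin and prints "<size><TAB><path>" for every blob whose size
# is at least the given number of bytes (default 1000000).
largeBlobs() {
    local min_bytes="${1:-1000000}"
    awk -v min="$min_bytes" '
        {
            # Before the first tab: "<mode> <type> <hash> <size>".
            # After it: the path, which may itself contain spaces.
            tab = index($0, "\t")
            if (tab == 0) next
            split(substr($0, 1, tab - 1), meta, " ")
            path = substr($0, tab + 1)
            # Skip anything that is not a regular blob, e.g. submodules
            if (meta[2] != "blob") next
            if (meta[4] + 0 >= min) printf "%s\t%s\n", meta[4], path
        }
    '
}
```

Assuming it is defined in the same shell, it could stand in for the cut and grep stages, for example as `git ls-tree -lr HEAD | largeBlobs 1000000`. The output keeps the size, tab, path layout that the perl stage expects.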
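
The description mentions using the list with git filter-branch, but the script stops at printing it. The following is a rough sketch of that removal step, not the author's method. The helper names buildIndexFilter and purgeFromHistory are hypothetical. The quoting relies on bash's printf '%q', which should be fine for paths containing spaces but may not survive more exotic file names.

```shell
#!/bin/bash
# Hypothetical helper: print the command to pass as --index-filter,
# with each path shell-quoted so that spaces survive.
buildIndexFilter() {
    local filter='git rm --cached --ignore-unmatch --quiet --'
    local path
    for path in "$@"; do
        filter+=" $(printf '%q' "$path")"
    done
    printf '%s\n' "$filter"
}

# Hypothetical helper: rewrite all refs so the given paths are removed
# from every commit. This is destructive; try it on a fresh clone first.
purgeFromHistory() {
    if [ "$#" -eq 0 ]; then
        echo "purgeFromHistory: no paths given" >&2
        return 1
    fi
    git filter-branch --index-filter "$(buildIndexFilter "$@")" -- --all
}
```

With both functions defined, a call such as `purgeFromHistory assets/video.mov 'big file.bin'` would rewrite every branch and tag. The current git filter-branch documentation carries a warning and suggests git filter-repo instead, so treat this as an illustration of the original intent rather than a recommendation.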