This solution combines two sources:
- http://naleid.com/blog/2012/01/17/finding-and-purging-big-files-from-git-history
- http://stackoverflow.com/questions/24693360/git-large-pack-file-delete-and-rewrite
First, create a list of every object reachable in the repository history, with its path, sorted by path:
git rev-list --objects --all | sort -k 2 > allfileshas.txt
Next, get the SHA and size of every blob in the pack files and sort them from biggest to smallest:
git gc && git verify-pack -v .git/objects/pack/pack-*.idx | \
grep -E "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | \
sort -k 3 -n -r > bigobjects.txt
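The filter-and-sort step above can also be sketched with awk, which avoids the `\w` and `\W` regex extensions. This is only an illustration: `largest_blobs` is a hypothetical helper name, and it assumes the usual `git verify-pack -v` layout in which non-deltified objects have exactly five fields (SHA, type, size, size in pack, offset).

```shell
# Hypothetical helper: read `git verify-pack -v` output on stdin and print
# the N largest non-deltified blobs (default 20), biggest first.
largest_blobs() {
  # Keep only blob lines with exactly five fields, then sort on the size column.
  awk '$2 == "blob" && NF == 5' | sort -k 3,3nr | head -n "${1:-20}"
}

# Usage inside a repository (not run here):
#   git verify-pack -v .git/objects/pack/pack-*.idx | largest_blobs 10
```

Limiting the output to the top N keeps the later lookup step short on repositories with many objects.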
Look up the file name of each large object in bigobjects.txt, and write the SHA, size, and path to bigtosmall.txt:
# Each output line: SHA, size in bytes, path
for SHA in $(cut -f 1 -d ' ' < bigobjects.txt); do
  echo $(grep "$SHA" bigobjects.txt) $(grep "$SHA" allfileshas.txt) | \
    awk '{print $1,$3,$7}' >> bigtosmall.txt
done
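The loop above runs grep twice per SHA, and `$7` keeps only the first word of a path that contains spaces. A possible alternative is a single awk pass over both files. This is a sketch under assumptions: `join_big_objects` is a hypothetical helper name, and the two input files are assumed to have the layouts produced by the earlier steps.

```shell
# Hypothetical helper: join blob sizes with paths in one awk pass.
# $1: file in the bigobjects.txt layout  (SHA blob size size-in-pack offset)
# $2: file in the allfileshas.txt layout (SHA path)
# Prints "SHA size path", biggest first.
join_big_objects() {
  awk '
    # First file: remember the size of each blob, keyed by SHA.
    NR == FNR { bytes[$1] = $3; next }
    # Second file: for SHAs seen above that have a path, print the joined line.
    ($1 in bytes) && NF > 1 {
      sha = $1
      sub(/^[^ ]+ /, "")   # drop the SHA, keep the whole path including spaces
      print sha, bytes[sha], $0
    }
  ' "$1" "$2" | sort -k 2,2nr
}

# Usage (not run here):
#   join_big_objects bigobjects.txt allfileshas.txt > bigtosmall.txt
```

Because the output is written with `>` rather than `>>`, rerunning it does not append duplicate lines.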
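Now review bigtosmall.txt and edit it so that it contains only the paths that should be deleted from the history, one per line. Remove the SHA and size columns as well: the loop below treats every whitespace-separated word in the file as a path.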
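One way to get a first, path-only candidate list is to filter by size and strip the other columns. This is a minimal sketch: `paths_over` and `candidates.txt` are hypothetical names, the threshold is an arbitrary example, and the input is assumed to be in the "SHA size path" layout. The result still needs a manual review before anything is deleted.

```shell
# Hypothetical helper: print only the paths of entries at or above a size threshold.
# $1: minimum size in bytes
# $2: file in the "SHA size path" layout
paths_over() {
  awk -v min="$1" '
    $2 + 0 >= min + 0 {
      sub(/^[^ ]+ [^ ]+ /, "")   # strip SHA and size, keep the path
      print
    }
  ' "$2"
}

# Usage (not run here): list everything of 1 MiB or more, review it by hand,
# then use the reviewed list as bigtosmall.txt.
#   paths_over 1048576 bigtosmall.txt > candidates.txt
```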
Then remove every path listed in bigtosmall.txt from the repository history. This is a destructive change: it rewrites the commits on every branch and tag!
for MY_BIG in $(cat bigtosmall.txt); do
  echo "$MY_BIG"
  # Rewrite every ref, dropping $MY_BIG from the index of each commit
  git filter-branch -f \
    --prune-empty \
    --index-filter "git rm -rf --cached --ignore-unmatch $MY_BIG" \
    --tag-name-filter cat -- --all
done
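Each iteration of the loop above rewrites the whole history once, which can be slow when many paths are listed. A possible alternative is to build one `git rm` command that names every path and run `git filter-branch` a single time. The sketch below only builds the command string. `build_index_filter` and `paths.txt` are hypothetical names, and the input is assumed to hold one path per line.

```shell
# Hypothetical helper: build a single index-filter command from a path list.
# $1: file with one path per line
# Prints: git rm -rf --cached --ignore-unmatch -- 'path1' 'path2' ...
build_index_filter() {
  printf 'git rm -rf --cached --ignore-unmatch --'
  while IFS= read -r f; do
    [ -n "$f" ] || continue   # skip blank lines
    # Single-quote each path and escape embedded single quotes, because
    # filter-branch passes the filter string through eval.
    printf " '%s'" "$(printf '%s' "$f" | sed "s/'/'\\\\''/g")"
  done < "$1"
  printf '\n'
}

# Usage (not run here; just as destructive as the loop):
#   git filter-branch -f --prune-empty \
#     --index-filter "$(build_index_filter paths.txt)" \
#     --tag-name-filter cat -- --all
```

Quoting each path also lets paths with spaces survive, which the word-splitting `for` loop cannot handle.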
The procedure above can leave blobs in the repository that are no longer reachable from the history. Delete them with the following commands:
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git fsck --full --unreachable
git repack -A -d
git gc --aggressive --prune=now
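To check that the cleanup actually shrank the repository, the pack size can be compared before and after. A small sketch, assuming the `key: value` output of `git count-objects -v`, where `size-pack` is reported in KiB. `pack_size_kib` is a hypothetical helper name.

```shell
# Hypothetical helper: read `git count-objects -v` output on stdin and
# print the size-pack value (KiB of disk space used by pack files).
pack_size_kib() {
  awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Usage inside a repository, once before and once after the cleanup (not run here):
#   git count-objects -v | pack_size_kib
```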