This solution is a combination of two sources:
- http://naleid.com/blog/2012/01/17/finding-and-purging-big-files-from-git-history
- http://stackoverflow.com/questions/24693360/git-large-pack-file-delete-and-rewrite
First, create a list of all files ever committed in the repository:
git rev-list --objects --all | sort -k 2 > allfileshas.txt
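Each line of allfileshas.txt holds an object SHA-1 followed by the path it was found at (some objects, such as commits, carry no path). An optional sanity check:
head allfileshas.txt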
Next, repack the repository and list every packed blob with its SHA and size, sorted from biggest to smallest:
git gc && git verify-pack -v .git/objects/pack/pack-*.idx | \
egrep "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | \
sort -k 3 -n -r > bigobjects.txt
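The blob lines kept by the egrep come out of git verify-pack -v as SHA-1, type, size, size in the pack, and offset, so the top of bigobjects.txt now lists the largest objects. To eyeball them (optional):
head -n 10 bigobjects.txt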
Now take the SHA of each large object in bigobjects.txt, look up its file name in allfileshas.txt, and write the SHA, size, and file name to bigtosmall.txt:
for SHA in `cut -f 1 -d\ < bigobjects.txt`; do
echo $(grep $SHA bigobjects.txt) $(grep $SHA allfileshas.txt) | \
awk '{print $1,$3,$7}' >> bigtosmall.txt
done;
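Each line of bigtosmall.txt now holds the blob SHA, the object size in bytes, and the path, biggest first. Viewing it as an aligned table makes the hand-editing in the next step easier:
column -t bigtosmall.txt | less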
Now edit bigtosmall.txt and keep only the file names of the files that should be deleted from the history (strip the SHA and size columns, and drop any files you want to keep).
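As an alternative to hand-editing, here is a minimal sketch that keeps only the path column of the 20 largest entries (20 is an arbitrary cutoff, filestodelete.txt is just a temporary name, and it assumes paths without spaces, as does the loop below). Review the result before continuing:
head -n 20 bigtosmall.txt | cut -d' ' -f3 | sort -u > filestodelete.txt
mv filestodelete.txt bigtosmall.txt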
Then remove each of the files listed in bigtosmall.txt from the repository history. This is a destructive change!
for MY_BIG in $(cat bigtosmall.txt) ; do
echo $MY_BIG
git filter-branch -f \
--prune-empty \
--index-filter "git rm -rf --cached --ignore-unmatch $MY_BIG" \
--tag-name-filter cat -- --all
done
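To spot-check that a rewritten path is really gone from the branches and tags (the path below is only a placeholder), the following should print nothing:
git log --branches --tags --oneline -- path/to/big-file.bin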
Sometimes the above procedure will leave blobs in the repository that are no longer reachable from the history. Delete them by removing the filter-branch backup refs, expiring the reflog, and repacking:
rm -rf .git/refs/original/              # drop the backup refs left behind by filter-branch
git reflog expire --expire=now --all    # expire every reflog entry so old commits are no longer referenced
git fsck --full --unreachable           # list the objects that are now unreachable (informational)
git repack -A -d                        # repack and remove redundant pack files
git gc --aggressive --prune=now         # garbage-collect and prune the unreachable objects immediately
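To confirm that the repository actually shrank, and to publish the rewritten history (assuming the remote is named origin; anyone else with a clone will need to re-clone or rebase their work onto the new history):
git count-objects -vH
git push origin --force --all
git push origin --force --tags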