I recently had to remove all traces of a file from a Git repository.
First clone the repository:
git clone ssh://[email protected]/home/repositories/codebase codebase
Then change directory to the top level of the clone:
cd ./codebase
Remove the file using filter-branch:
git filter-branch --prune-empty -f --index-filter \
'git rm -f --cached --ignore-unmatch tools/java/foo.jar' \
--tag-name-filter cat -- --all
You can also remove entire directories (note that 'rm -f' changes to 'rm -rf'):
git filter-branch --prune-empty -f --index-filter \
'git rm -rf --cached --ignore-unmatch tools/java/' \
--tag-name-filter cat -- --all
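To check that the rewrite did what was intended, one can ask Git whether any commit on a branch or tag still touches the path. The following is a minimal sketch run in a throwaway repository, not part of the original procedure. The helper name path_in_history and the demo file contents are hypothetical. Note that it deliberately ignores refs/original, where filter-branch keeps backups of the old history.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if any commit on a branch or tag touches the path.
# refs/original is not consulted, since filter-branch stores backups there.
path_in_history() {
    [ -n "$(git log --branches --tags --format=%H -- "$1")" ]
}

# Throwaway repository for the demonstration; every name below is illustrative.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir -p tools/java
echo "pretend this is a jar" > tools/java/foo.jar
echo "hello" > README
git add .
git commit -q -m "add jar and README"

if path_in_history tools/java/foo.jar; then before=present; else before=absent; fi

# Same filter-branch call as in the text. The environment variable only
# silences the deprecation notice that newer Git versions print.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty -f --index-filter \
    'git rm -f --cached --ignore-unmatch tools/java/foo.jar' \
    --tag-name-filter cat -- --all >/dev/null 2>&1

if path_in_history tools/java/foo.jar; then after=present; else after=absent; fi
echo "before: $before, after: $after"
```

In a real clone the same check is simply the git log line inside the helper, run with the path that was removed.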
Then delete the backup references that filter-branch saved under refs/original:
git for-each-ref --format='delete %(refname)' refs/original \
| git update-ref --stdin
The --stdin option of update-ref was only added in Git 1.8.5, so on older versions delete the references one at a time instead:
git for-each-ref --format='%(refname)' refs/original \
| xargs --verbose --max-lines=1 git update-ref -d
Expire the reference log:
git reflog expire --expire=now --all
Collect garbage and prune:
git gc --prune=now
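Whether the data is really gone from the local clone can be checked by recording the blob's object ID before the rewrite and asking Git for it afterwards. This is a sketch under assumptions: it runs the steps above in a throwaway repository, and the variable names and demo file contents are hypothetical.

```shell
#!/bin/sh
# Throwaway repository for the demonstration; every name below is illustrative.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
mkdir -p tools/java
echo "pretend this is a jar" > tools/java/foo.jar
echo "hello" > README
git add .
git commit -q -m "add jar and README"

# Remember the object ID of the blob that should disappear.
blob=$(git rev-parse HEAD:tools/java/foo.jar)

FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty -f --index-filter \
    'git rm -f --cached --ignore-unmatch tools/java/foo.jar' \
    --tag-name-filter cat -- --all >/dev/null 2>&1

# After the rewrite alone the blob is still stored: the backup refs and
# the reflog keep the old commits reachable.
if git cat-file -e "$blob" 2>/dev/null; then after_rewrite=stored; else after_rewrite=gone; fi

# Drop the backups, expire the reflog, then collect garbage.
git for-each-ref --format='delete %(refname)' refs/original \
    | git update-ref --stdin
git reflog expire --expire=now --all
git gc --quiet --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after_gc=stored; else after_gc=gone; fi
echo "after rewrite: $after_rewrite, after gc: $after_gc"

# Summary of object counts and on-disk size, useful for a before/after comparison.
git count-objects -v
```

The check also shows why the intermediate steps matter: the rewrite alone does not free anything until the backup references and reflog entries are removed.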
Push the update and the tags:
git push --force --all
git push --force --tags
Finally, on the server that hosts it, you will need to collect garbage and prune the original repository itself:
cd /home/repositories/codebase
git gc --prune=now
Matthias Sohn has a very useful script, git-big-objects.sh, that shows the n largest objects in a Git repository's pack files.
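For a rough idea of what such a listing looks like, here is a minimal sketch using only stock Git plumbing. It is not Matthias Sohn's script: the helper name largest_blobs is hypothetical, and it reports uncompressed blob sizes for everything reachable from any ref rather than sizes inside the pack files.

```shell
#!/bin/sh
# Hypothetical helper: print the N largest blobs reachable from any ref as
# "<size in bytes> <object id> <path>", largest first (N defaults to 10).
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        sed -n 's/^blob //p' |
        sort -k1,1nr |
        head -n "${1:-10}"
}

# Throwaway repository for the demonstration; every name below is illustrative.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
echo "tiny" > small.txt
printf '%4096s' x > big.bin    # 4095 spaces followed by "x": 4096 bytes
git add .
git commit -q -m "add files of different sizes"

top=$(largest_blobs 1)
echo "$top"
```

Swapping %(objectsize) for %(objectsize:disk) in the format string would rank by on-disk size instead, which is closer to what a pack-file based listing shows.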