Anonymise Git history
pozorvlak/8784840 (last active August 8, 2024 11:53)
#!/bin/sh
# Suppose you want to do blind reviewing of code (e.g. for job interview
# purposes). Unfortunately, the candidates' names and email addresses are
# stored on every commit! You probably want to assess each candidate's version
# control practices, so just `rm -rf .git` throws away too much information.
# Here's what you can do instead.
# Rewrite all commits to hide the author's name and email.
# (`git for-each-ref` also lists packed branches and nested names such as
# feature/x, which `ls .git/refs/heads` misses.)
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/); do
# We may be doing multiple rewrites, so we must force subsequent ones.
# We're throwing away the backups anyway.
git filter-branch -f --env-filter '
export GIT_AUTHOR_NAME="Anonymous Candidate"
export GIT_AUTHOR_EMAIL="[email protected]"' "$branch"
done
# Delete the backup refs that filter-branch keeps under refs/original/
rm -rf .git/refs/original/
# Delete remotes, which might point to the old commits
for r in $(git remote); do git remote rm "$r"; done
# Your old commits will now no longer show up in gitk or `git log`, but
# `git reflog` still lists them and `git show $commit-id` can still find them.
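This version rewrites only the author fields. Every commit therefore still carries the committer's name and email, which in a candidate's repository is typically the same person. One way to audit any rewrite is to list every distinct identity still reachable from the repository's refs. The following is a minimal sketch, not part of the original gist. `list_identities` is a hypothetical helper name. The demo builds a throwaway repository under `mktemp` using a made-up identity (Jane Doe) and a placeholder anonymous address.

```shell
#!/bin/sh
# Hypothetical helper: print every distinct author and committer identity
# reachable from any ref of the current repository.
list_identities() {
    git log --all --format='author %an <%ae>%ncommitter %cn <%ce>' | sort -u
}

# Demo on a throwaway repository with a made-up candidate identity.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane@example.com"
export GIT_COMMITTER_NAME="Jane Doe" GIT_COMMITTER_EMAIL="jane@example.com"
cd "$(mktemp -d)"
git init -q
git commit -q --allow-empty --no-gpg-sign -m "first commit"

# Author-only rewrite, as in the script above (placeholder address).
git filter-branch -f --env-filter '
export GIT_AUTHOR_NAME="Anonymous Candidate"
export GIT_AUTHOR_EMAIL="anonymous@example.com"' HEAD >/dev/null 2>&1
# Drop the backup refs, as the script does, so --all sees only new commits.
rm -rf .git/refs/original/

identities=$(list_identities)
printf '%s\n' "$identities"
```

In the demo, the committer line still shows Jane Doe. In a real review repository, run only the function. Any line other than the anonymous identity points to history the rewrite missed.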
Publishing a summarized version:
#!/bin/sh
# Suppose you want to do blind reviewing of code (e.g. for job interview
# purposes). Unfortunately, the candidates' names and email addresses are
# stored on every commit! You probably want to assess each candidate's version
# control practices, so just `rm -rf .git` throws away too much information.
# Here's what you can do instead.
# Rewrite all commits to hide the author's and committer's name and email
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/); do
# We may be doing multiple rewrites, so we must force subsequent ones.
# We're throwing away the backups anyway.
git filter-branch -f --env-filter '
export GIT_AUTHOR_NAME="Anonymous Candidate"
export GIT_AUTHOR_EMAIL="[email protected]"
export GIT_COMMITTER_NAME="Anonymous Candidate"
export GIT_COMMITTER_EMAIL="[email protected]"
' "$branch"
done
# Delete the backup refs that filter-branch keeps under refs/original/
rm -rf .git/refs/original/
# Delete remotes, which might point to the old commits
for r in $(git remote); do git remote rm "$r"; done
# Expire the reflog entries that still point to the old commits
git reflog expire --expire=90.days.ago --expire-unreachable=now --all
# Your old commits will now no longer show up in gitk, `git log` or `git
# reflog`, but can still be found using `git show $commit-id`.
# Be aware that merge commit messages often include URLs hinting at the original author
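As the closing comment says, expiring the reflog hides the old commits without deleting them. Their objects stay in .git/objects until garbage collection prunes them, and by default that happens only once they are older than the gc.pruneExpire grace period (two weeks). The following is a minimal sketch of the extra step. It assumes a throwaway repository with made-up names and a placeholder address. It checks whether the pre-rewrite commit object exists before and after `git gc --prune=now`.

```shell
#!/bin/sh
# Sketch: anonymise a throwaway repository, then delete the old commit objects.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause
# Made-up candidate identity for the demo repository.
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane@example.com"
export GIT_COMMITTER_NAME="Jane Doe" GIT_COMMITTER_EMAIL="jane@example.com"
cd "$(mktemp -d)"
git init -q
git commit -q --allow-empty --no-gpg-sign -m "first commit"
old_commit=$(git rev-parse HEAD)

# Same steps as the summarized script (placeholder anonymous address).
git filter-branch -f --env-filter '
export GIT_AUTHOR_NAME="Anonymous Candidate"
export GIT_AUTHOR_EMAIL="anonymous@example.com"
export GIT_COMMITTER_NAME="Anonymous Candidate"
export GIT_COMMITTER_EMAIL="anonymous@example.com"
' HEAD >/dev/null 2>&1
rm -rf .git/refs/original/
git reflog expire --expire=90.days.ago --expire-unreachable=now --all

# Hidden from `git log` and `git reflog`, but the object is still on disk.
if git cat-file -e "$old_commit" 2>/dev/null; then before_gc=present; else before_gc=deleted; fi

# Prune every unreachable object now instead of after the grace period.
git gc -q --prune=now
if git cat-file -e "$old_commit" 2>/dev/null; then after_gc=present; else after_gc=deleted; fi

echo "old commit before gc: $before_gc, after gc: $after_gc"
```

In a real review copy, only the final `git gc --prune=now` is new. Note that it discards all unreachable objects, not just the rewritten commits, so run it on a copy made for reviewing.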
Almost perfect! But this script doesn't anonymize tags: they keep pointing at the original commits, author details and all. To fix that, replace the whole for loop around git filter-branch with a single call, as in https://github.com/adamdehaven/change-git-author/blob/master/changeauthor.sh#L536, that is:
git filter-branch -f --env-filter '... exports ...' --tag-name-filter cat -- --branches --tags
Now it covers tags too!
(correction script taken from https://www.adamdehaven.com/blog/update-commit-history-author-information-for-git-repository/#instructions)
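The single-pass form rewrites branches and tags together. According to the git-filter-branch documentation, the tag name filter is called for every tag ref that points to a rewritten object, and `cat` keeps each tag's name unchanged. The following is a minimal sketch that fills in the exports from the summarized script, using a placeholder address. It then checks that a lightweight tag ends up on an anonymised commit. The throwaway repository, the v1.0 tag and both identities are made up for the demo.

```shell
#!/bin/sh
# Sketch: anonymise every branch and tag in one filter-branch pass.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause
# Made-up candidate identity for the demo repository.
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane@example.com"
export GIT_COMMITTER_NAME="Jane Doe" GIT_COMMITTER_EMAIL="jane@example.com"
cd "$(mktemp -d)"
git init -q
git commit -q --allow-empty --no-gpg-sign -m "release"
git tag v1.0                               # lightweight tag on that commit
git commit -q --allow-empty --no-gpg-sign -m "later work"

# Exports taken from the summarized script, with a placeholder address.
# --tag-name-filter cat moves each tag onto its rewritten commit, same name.
git filter-branch -f --env-filter '
export GIT_AUTHOR_NAME="Anonymous Candidate"
export GIT_AUTHOR_EMAIL="anonymous@example.com"
export GIT_COMMITTER_NAME="Anonymous Candidate"
export GIT_COMMITTER_EMAIL="anonymous@example.com"
' --tag-name-filter cat -- --branches --tags >/dev/null 2>&1

tag_identity=$(git log -1 --format='%an <%ae> / %cn <%ce>' v1.0)
echo "v1.0 now points at a commit by: $tag_identity"
```

Running the command once over `--branches --tags` also avoids rewriting shared history once per branch, as the loop does.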
To clean up the merge commit messages, I've used:
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/); do
git filter-branch -f --msg-filter 'sed "s/Merge pull request.*$/Merge pull request #xxx from anonymous_repo/g"' "$branch"
done
Also watch out for "Merge branch" commits, since branch names sometimes include people's usernames:
for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/); do
git filter-branch -f --msg-filter 'sed "s/Merge branch.*$/Merge branch anon into anon/g"' "$branch"
done
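The two loops above each rewrite every branch again, and like the original script they leave tags alone. The sketch below is an assumed variant, not the commenter's exact command. It applies both substitutions in one --msg-filter pass over branches and tags. Unlike the loops, it anchors the patterns with ^, so a line that merely mentions "Merge branch" mid-sentence is left intact. Subjects such as "Merge remote-tracking branch ..." match neither pattern. The throwaway repository and the username-like branch jane/feature are made up for the demo.

```shell
#!/bin/sh
# Sketch: scrub merge-commit subjects on all branches and tags in one pass.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause
# Made-up identity for the throwaway demo repository.
export GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane@example.com"
export GIT_COMMITTER_NAME="Jane Doe" GIT_COMMITTER_EMAIL="jane@example.com"
cd "$(mktemp -d)"
git init -q
git commit -q --allow-empty --no-gpg-sign -m "initial"
git checkout -q -b jane/feature          # branch name that leaks a username
git commit -q --allow-empty --no-gpg-sign -m "feature work"
git checkout -q -
git merge -q --no-ff --no-edit --no-gpg-sign jane/feature

# Both expressions are anchored with ^ so only lines starting with
# "Merge ..." are replaced; \$ keeps the shell from expanding the $.
git filter-branch -f --msg-filter '
sed -e "s/^Merge pull request.*\$/Merge pull request #xxx from anonymous_repo/" \
    -e "s/^Merge branch.*\$/Merge branch anon into anon/"
' --tag-name-filter cat -- --branches --tags >/dev/null 2>&1

merge_subject=$(git log -1 --format=%s)
echo "$merge_subject"
```

A message filter does not rename refs, so the jane/feature branch still exists under that name afterwards. Rename it with git branch -m or delete it before handing the repository over.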
The repository name still appears in 'git reflog' when the repository was obtained with git clone.
This can solve the issue: