💡 Public Service Announcement: Editing a git repo's history causes the commit hashes to be recomputed, so your local branches diverge from the remote refs/branches. A fetch followed by a force push will be required to update remotes.
This can wreak havoc on a project if contributors and maintainers are not "in the know".
Please think carefully about the implications for public repos and repos that you collaborate on with others.
I strongly encourage you to announce, in good time, that you intend to proceed with making such changes.
The anatomy of a git repo has various facets, including:
- The file content
- The byte length of file content
- Commit details, including:
  - Authorship + timestamp
  - Committer + timestamp
  - The commit message
As a side note, git uses some of these facets to derive hashes, but that is beyond the scope of this gist.
You could ask a chat prompt like https://duck.ai "How does git derive hashes?" for more details.
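As a small taste of why the byte length matters, here is a minimal sketch of how a file's hash can be reproduced by hand. It assumes git's default SHA-1 object format and that the coreutils sha1sum tool is available; the "hello" content is just an illustration.

```shell
# Hash a 6-byte blob the way git does: SHA-1 over "blob <byte length>\0<content>".
# Assumes the default SHA-1 object format and the coreutils sha1sum tool.
git_hash=$(printf 'hello\n' | git hash-object --stdin)
manual_hash=$(printf 'blob 6\000hello\n' | sha1sum | cut -d' ' -f1)
echo "git hash-object: $git_hash"
echo "manual sha1:     $manual_hash"
```

Both lines should print the same hash, which shows that the content and its byte length both feed into it.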
Back on topic, there are times when it is necessary to edit any of these facets and effectively rewrite the repo's history and contents. For example:
- You have a repo you are publishing but you want to make some edits beforehand
- You've spotted something in the history/content that is just wrong and justifies an edit
- You want to easily bulk modify author or committer details
- You want to easily bulk modify some timestamps
Modern git has some great tools in its belt for this. Introducing git fast-export. From its man page:
This program dumps the given revisions in a form suitable to be piped into git fast-import. You can use it as a human-readable bundle replacement (see git-bundle), or as a format that can be edited before being fed to git fast-import in order to do history rewrites (an ability relied on by tools like git filter-repo).
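To get a feel for what such a dump looks like, here is a minimal sketch that builds a throwaway repo and exports it. The repo name, file name and author identity are made up for the illustration.

```shell
set -eu
# Build a throwaway repo (hypothetical names and paths) with a single commit.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Example Author"
git config user.email "author@example.com"
printf 'hello\n' > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# Dump the whole history as a fast-import stream and look at it.
git fast-export --all > ../demo.export
cat ../demo.export
# The stream contains, among other lines:
#   blob / mark / "data 6" followed by the 6 bytes of greeting.txt
#   commit <ref>, then author and committer lines ending in "<epoch> <tz>"
#   "data 13" followed by the commit message, then "M 100644 :1 greeting.txt"
```

The "data <count>" lines are the byte counts that matter when you start editing the stream.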
The man page synopsis of git fast-import is a bit daunting, so I'll skip it.
What we need to know is that if you modify an export file correctly, it can be imported with git fast-import.
AFAIK, there are no constraints on changing the author and committer details and their respective timestamps. The commit message is different: like file contents, it is stored in the export as a data block prefixed with a byte count (a "data <count>" line).
So if you change a commit message or file contents and the number of bytes changes, then you need to update the corresponding byte count in the export. Otherwise the import will fail or produce unexpected results.
Modifying byte counts is tedious and error prone. For more complex changes to file content, you'll want to look at git-filter-repo, which is covered later in this gist.
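For a single small change, the byte count update can be sketched like this. It is a toy example under stated assumptions: the repo, file name and contents are hypothetical, and the sed patterns only work because "data 6" and "hello" each appear exactly once in this tiny export.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git config user.name "Example Author"
git config user.email "author@example.com"
printf 'hello\n' > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
branch=$(git symbolic-ref --short HEAD)
git fast-export --all > ../src.export

# Change the blob from "hello\n" (6 bytes) to "hello world\n" and fix its count.
new_len=$(printf 'hello world\n' | wc -c | tr -d ' ')
sed -e "s/^data 6\$/data $new_len/" -e 's/^hello$/hello world/' \
    ../src.export > ../src.export.edited

# Import into a fresh repo so the original is left untouched.
git init -q ../dst
git -C ../dst fast-import --quiet < ../src.export.edited
git -C ../dst show "$branch:greeting.txt"
```

In a real export the same count or text is likely to appear in many places, so a blanket sed like this is not safe there; edit the specific block by hand instead.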
It is always a good idea to copy the repo before making changes, and then use a diff tool like Beyond-Compare to check that the diffs are as intended after an import.
Keep in mind that an export is effectively a byte stream of part of, or all of a repo.
The export could contain file contents with Windows or Unix line endings. So it is possible for an export file to be a mixed line-endings file.
Some text editors don't handle this very well, a surprising example being Sublime, which I posted about on their issue tracker here. In Sublime's case, it normalises the line-endings and this effectively corrupts the export when the file is saved.
vim, on the other hand, works as one would expect, modifying only the bytes that the user edits.
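If you are unsure how your editor behaves, one rough check is to count the carriage-return bytes in the export before and after saving. A minimal sketch, where count_cr is a hypothetical helper name and the sample file stands in for an export:

```shell
# Count carriage-return bytes in a file; compare the count before and after editing.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Demo on a hypothetical file with one CRLF line and one LF line.
sample=$(mktemp)
printf 'windows line\r\nunix line\n' > "$sample"
count_cr "$sample"
```

If the count changes across an edit that did not touch any line endings, the editor has probably normalised them.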
In this example, the full repo is exported, edited and re-imported:
- Take a copy of the repo, which acts as a backup and diff reference
  Git repos are just a collection of filesystem objects.
  A simple copy is typically enough, for example:
  cp -a repo repo~before-edits
- cd into the repo
- git fast-export --all > ../repo.git.export
  Best to write the file outside of the repo
- Make the required changes, observing the constraints and gotchas mentioned above
- git fast-import < ../repo.git.export
  If fast-import refuses to update an existing branch because the new tip does not contain the old one, re-run it with --force
Voilà, the repo has been re-written based on your edits. Time to run a diff and check everything went as planned.
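The whole round trip can be sketched end to end on a throwaway repo. The names, email addresses and the sed edit are hypothetical; the sed only touches author and committer lines, but a commit message or file line that happens to start with "author " would match too, so check the result. I'm using --force here because the import goes back into the same repo, where the rewritten branch tip no longer contains the old one.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q repo
printf 'hello\n' > repo/file.txt
git -C repo add file.txt
git -C repo -c user.name="Old Name" -c user.email="old@example.com" \
    commit -q -m "Initial commit"

cp -a repo repo~before-edits            # backup and diff reference
cd repo
git fast-export --all > ../repo.git.export

# The "edit": rewrite the email on author/committer lines only.
sed -e '/^author /s/<old@example.com>/<new@example.com>/' \
    -e '/^committer /s/<old@example.com>/<new@example.com>/' \
    ../repo.git.export > ../repo.git.export.edited

# --force lets fast-import move a branch to a tip that does not contain the old one.
git fast-import --force --quiet < ../repo.git.export.edited
git log -1 --format='%an <%ae>'
```

fast-import only writes objects and refs. If your edits changed file contents, the working tree and index will still show the old files until you sync them, for example with git reset --hard.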
A useful git log command, whose output can also be saved to a file and diffed before and after, is:
git log --pretty=format:"%H %an <%ae> %cn <%ce> %ad %cd %s" --date=iso | less -S
Another diff that can be insightful: if you make a backup of the export before editing (or export from the repo backup), you can perform another export after the import, and diff the before and after export files.
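Both comparisons can be wrapped in a small helper. This is a sketch only: snapshot is a hypothetical function name, and the paths in the usage comments assume the backup copy from the steps above.

```shell
# Write a one-line-per-commit log and a full export for the repo at $1,
# using $2 as the output file prefix (both arguments are hypothetical paths).
snapshot() {
    git -C "$1" log --all --date=iso \
        --pretty=format:'%H %an <%ae> %cn <%ce> %ad %cd %s' > "$2.log"
    git -C "$1" fast-export --all > "$2.export"
}

# Usage, assuming a backup copy and an edited repo side by side:
#   snapshot repo~before-edits /tmp/before
#   snapshot repo /tmp/after
#   diff /tmp/before.log /tmp/after.log
#   diff -a /tmp/before.export /tmp/after.export   # -a treats binary data as text
```

Expect every commit hash to differ in the log diff after a rewrite; the interesting part is whether anything else changed that you did not intend.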
If the repo had remote refs/branches, then a fetch is required to reconcile the divergence followed by a force push. Please note the PSA I wrote at the top of this gist. Pushing a repo modified with these approaches causes divergence and should be considered a destructive operation. Please consider notifying contributors and maintainers in advance.
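The fetch and force push can be tried safely against a local bare repo first. In this sketch the bare repo stands in for the real remote, an amended commit stands in for the history rewrite, and all names are hypothetical. I've used --force-with-lease rather than a plain --force, which is my own substitution: it refuses to push if the remote has moved since the fetch.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git           # stands in for the real remote
git clone -q remote.git clone 2>/dev/null
cd clone
git config user.name "Example Author"
git config user.email "author@example.com"
printf 'hello\n' > file.txt
git add file.txt
git commit -q -m "Original message"
branch=$(git symbolic-ref --short HEAD)
git push -q origin HEAD

# Simulate a history rewrite: the amended commit gets a new hash.
git commit -q --amend -m "Rewritten message"

git fetch -q origin
# --force-with-lease refuses the push if the remote moved since the fetch.
git push -q --force-with-lease origin "$branch"
git -C ../remote.git log -1 --format=%s "$branch"
```

Anyone else with a clone will still have the old history after this, which is exactly why the advance notice matters.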
Note that git-filter-repo is officially recommended over / as the successor to git filter-branch.
I'll write a real-world example at some point but for now here are some references to get you started with this powerful tool.
What is a use case for git-filter-repo?
Personally, I've used it to easily rename a file in a repo throughout its history, but it can do a lot more. Check out the docs.
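Until I write up a real-world example, here is a minimal sketch of that rename on a throwaway repo. It assumes git-filter-repo is installed separately and on your PATH; the repo and file names are hypothetical. The --force flag is only there because this demo repo is not a fresh clone, which git-filter-repo otherwise refuses to rewrite.

```shell
set -eu
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Example Author"
git config user.email "author@example.com"
printf 'hello\n' > old-name.txt
git add old-name.txt
git commit -q -m "Add file"

if command -v git-filter-repo >/dev/null 2>&1; then
    # Rename the file in every commit that touches it.
    git filter-repo --force --path-rename old-name.txt:new-name.txt
    git log --name-only --format=%s
else
    echo "git-filter-repo is not installed; skipping the rename"
fi
```

On a real project, run it on a fresh clone instead of passing --force, so there is always an untouched copy to fall back on.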