HOWTO easily edit any part of a git repo's history: file content, commits, authors, timestamps etc - from tail to tip

💡 Public Service Announcement: Editing a git repo's history changes the hashes of the edited commits and all of their descendants, so local branches diverge from their remote refs/branches. A fetch followed by a force push will be required to update remotes.
This can wreak havoc on a project if contributors and maintainers are not "in the know".
Please think carefully about the implications for public repos and repos that you collaborate on with others.
I strongly encourage you to announce, in good time, that you intend to proceed with making such changes.


The anatomy of a git repo has various facets, including:

  • The file content
    • The byte length of file content
  • Commit details, including:
    • Authorship + timestamp
    • Committer + timestamp
    • The commit msg

As a side note, git uses some of these facets to derive hashes, but that is beyond the scope of this gist.
You could ask a chat prompt like https://duck.ai "How does git derive hashes?" for more details.
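
As a small taste of why the byte length matters, here is a minimal sketch that reproduces a blob hash by hand. It assumes a SHA-1 repo format (git's long-standing default) and that sha1sum is installed; the sample content is made up for the demo.

```shell
# Sample content (hypothetical); the trailing newline is part of the blob.
content='hello'

# Byte length of the content, newline included.
len=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# The hash as computed by git.
git_hash=$(printf '%s\n' "$content" | git hash-object --stdin)

# The same hash by hand: SHA-1 over "blob <len>", a NUL byte, then the content.
manual_hash=$( { printf 'blob %s\0' "$len"; printf '%s\n' "$content"; } | sha1sum | cut -d' ' -f1 )

echo "git:    $git_hash"
echo "manual: $manual_hash"
```

Both lines should print the same hash. Since the length is part of the hashed header, any change to the content changes the object hash and, in turn, every commit that references it.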

Back on topic, there are times when it is necessary to edit any of these facets and effectively rewrite the repos history and contents. For example:

  • You have a repo you are publishing but you want to make some edits beforehand
  • You've spotted something in the history/content that is just wrong and justifies an edit
  • You want to easily bulk modify author or committer details
  • You want to easily bulk modify some timestamps

Modern git has some great tools in its belt for this. Introducing:

git fast-export

This program dumps the given revisions in a form suitable to be piped into git fast-import.

You can use it as a human-readable bundle replacement (see git-bundle), or as a format that can be edited before being fed to git fast-import in order to do history rewrites (an ability relied on by tools like git filter-repo).
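
To get a feel for the format, the sketch below creates a throwaway repo with one commit and prints its export. The paths, identity and timestamp are all made up for the demo.

```shell
# Throwaway repo in a temp dir; every name and value here is hypothetical.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"

# Fixed identity and date so the export is reproducible.
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='Example Author' GIT_COMMITTER_EMAIL='author@example.com'
export GIT_AUTHOR_DATE='1700000000 +0000' GIT_COMMITTER_DATE='1700000000 +0000'

printf 'hello\n' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'

# Dump the whole repo as a fast-import stream and show it.
stream=$(git fast-export --all)
printf '%s\n' "$stream"
```

In the output you should be able to spot a blob record whose data line carries the byte count of hello.txt, followed by a commit record with author and committer lines, the commit message and the file modification.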

git fast-import

The man page synopsis of git fast-import is a bit daunting, so I'll skip it.
What we need to know is that if you modify an export file correctly, it can be imported with git fast-import.

Constraints when editing an export

AFAIK, there are no constraints on changing commit details such as author, committer and their respective timestamps, and the commit message.
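
For bulk changes, a stream editor can do the work. A minimal sketch, assuming the old and new identities shown (both made up), run against a hand-written excerpt that resembles the header lines of a commit in an export:

```shell
# Hand-written excerpt resembling a commit header in an export
# (names, emails and timestamps are made up for the demo).
excerpt='author Old Name <old@example.com> 1700000000 +0000
committer Old Name <old@example.com> 1700000500 +0000'

# Rewrite the identity on author/committer lines only; timestamps stay untouched.
edited=$(printf '%s\n' "$excerpt" |
  sed -E 's/^(author|committer) Old Name <old@example\.com> /\1 New Name <new@example.com> /')

printf '%s\n' "$edited"
```

Be aware that the pattern matches any line that starts this way, including lines inside file content or commit messages, where a replacement of a different length would invalidate a byte count. Diffing the export before and after the edit is a cheap safeguard.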

However, if you change the file contents and the number of bytes changes for a given file/object, then you need to update the corresponding byte count in the export. Otherwise the import will fail or produce unexpected results.

Modifying byte counts is tedious and error prone. For more complex changes to file content, you'll want to look at git-filter-repo, which is covered later in this gist.
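
If you do edit content by hand, the count can be recomputed rather than guessed. A minimal sketch, assuming a blob whose content changes from "hello" to the made-up replacement "hello world":

```shell
# The original record in the export might look like this ("hello\n" is 6 bytes):
#   data 6
#   hello

# New content (hypothetical) that should replace it.
new_content='hello world'

# Count the bytes exactly as they will appear in the stream, newline included.
# wc -c counts bytes, not characters, which is what the data line expects.
count=$(printf '%s\n' "$new_content" | wc -c | tr -d ' ')

# The updated data line and payload.
record=$(printf 'data %s\n%s\n' "$count" "$new_content")
printf '%s\n' "$record"
```

The same applies to commit messages, which are also stored as a data line followed by the message bytes.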

It is always a good idea to copy the repo before making changes, and then use a diff tool like Beyond Compare to check that the diffs are as intended after an import.

Gotchas when editing an export

Keep in mind that an export is effectively a byte stream of part of, or all of a repo.
The export could contain file contents with Windows or Unix line endings. So it is possible for an export file to be a mixed line-endings file.

Some text editors handle this poorly. A surprising example is Sublime, which I posted about on their issue tracker here. In Sublime's case, it normalises the line endings, which effectively corrupts the export when the file is saved.

vim, on the other hand, works as one would expect, modifying only the bytes that the user edits.
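
A quick way to check whether an export contains Windows line endings, and whether an editor altered bytes you did not touch, is sketched below. The file contents are a made-up stand-in for a real export, and the editor is simulated with tr.

```shell
work=$(mktemp -d)

# Stand-in for an export with mixed line endings (contents are made up).
printf 'unix line\nwindows line\r\n' > "$work/repo.git.export"

# Count carriage returns; non-zero means the stream contains CRLF content.
cr_count=$(tr -cd '\r' < "$work/repo.git.export" | wc -c | tr -d ' ')
echo "CR bytes: $cr_count"

# Simulate an editor that normalises line endings on save.
tr -d '\r' < "$work/repo.git.export" > "$work/normalised.export"

# A byte-for-byte comparison exposes the silent change.
if cmp -s "$work/repo.git.export" "$work/normalised.export"; then
  changed=no
else
  changed=yes
fi
echo "changed by editor: $changed"
```

With a real export, keep a copy before opening the editor, save without making edits, and run cmp on the two files. Any difference means the editor cannot be trusted with the stream.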

Typical workflow

In this example, the full repo is exported, edited and re-imported:

  1. Take a copy of the repo to act as a backup and diff reference
    Git repos are just a collection of filesystem objects.
    A simple copy is typically enough, for example:
    cp -a repo repo~before-edits
  2. cd into the repo
  3. git fast-export --all > ../repo.git.export
    Best to write the file outside of the repo
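  4. Make the required changes, observing the constraints and gotchas mentioned above
  5. git fast-import < ../repo.git.export
    If the edits changed any commit, the new branch tips no longer descend from the old ones, and git fast-import may refuse to update those branches unless you pass --force.
    fast-import does not update the index or working tree, so run git reset --hard afterwards, or import into a freshly initialised repo instead.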

Voilà, the repo has been re-written based on your edits. Time to run a diff and check everything went as planned.
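
The whole workflow can be sketched end to end on a throwaway repo. This is a minimal sketch under stated assumptions, not a definitive recipe: the repo, file, identities and the .edited file name are made up, and I pass --force and follow up with git reset --hard because fast-import will not move a branch to a tip that does not contain the old one, nor touch the working tree.

```shell
work=$(mktemp -d)
cd "$work"

# Demo repo with a single commit (identity and content are made up).
export GIT_AUTHOR_NAME='Old Name' GIT_AUTHOR_EMAIL='old@example.com'
export GIT_COMMITTER_NAME='Old Name' GIT_COMMITTER_EMAIL='old@example.com'
git init -q repo
printf 'hello\n' > repo/hello.txt
git -C repo add hello.txt
git -C repo commit -q -m 'Initial commit'

# Step 1: backup copy, also used as the diff reference.
cp -a repo repo~before-edits

# Steps 2-3: export everything to a file outside the repo.
cd repo
git fast-export --all > ../repo.git.export

# Step 4: edit the export; here, only the author/committer identity.
sed -E 's/^(author|committer) Old Name <old@example\.com> /\1 New Name <new@example.com> /' \
  ../repo.git.export > ../repo.git.export.edited

# Step 5: import. --force lets fast-import move branches whose new tip no
# longer contains the old one; --quiet suppresses the statistics report.
git fast-import --force --quiet < ../repo.git.export.edited

# fast-import leaves the index and working tree alone, so resync them.
git reset -q --hard

# Compare against the backup.
before=$(git -C ../repo~before-edits log --format='%an <%ae>')
after=$(git log --format='%an <%ae>')
echo "before: $before"
echo "after:  $after"
```

Writing the edited stream to a second file keeps the untouched export around, which makes it easy to diff the two streams later.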

A useful git log command that can also be saved to file and diffed before and after is:

git log --pretty=format:"%H %an <%ae> %cn <%ce> %ad %cd %s" --date=iso | less -S

Another diff that can be insightful: if you make a backup of the export before editing (or export from the repo backup), you can perform another export after the import, and diff the before and after export files.
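
Both checks can be wrapped in a small helper. The sketch below is only an illustration: snapshot is a hypothetical function name, not a git command, and the output file prefix is whatever you pass in.

```shell
# Write a log snapshot and a full export for the repo at $1, using the
# output file prefix $2. (snapshot is a hypothetical helper, not part of git.)
snapshot() {
  git -C "$1" log --all --pretty=format:'%H %an <%ae> %cn <%ce> %ad %cd %s' --date=iso > "$2.log"
  git -C "$1" fast-export --all > "$2.export"
}
```

For example, from inside the edited repo, snapshot ../repo~before-edits ../before followed by snapshot . ../after leaves you with two pairs of files that can be compared with diff ../before.log ../after.log and diff ../before.export ../after.export.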

If the repo has remote refs/branches, the rewritten branches will have diverged from them. A fetch shows the extent of the divergence, and a force push is then required to update the remote. Please note the PSA I wrote at the top of this gist: pushing a repo modified with these approaches should be considered a destructive operation. Please consider notifying contributors and maintainers in advance.

git-filter-repo

Note that the git filter-branch documentation itself recommends using an alternative such as git-filter-repo.

I'll write a real-world example at some point but for now here are some references to get you started with this powerful tool.

What is a use case for git-filter-repo?

Personally, I've used it to easily rename a file in a repo throughout its history, but it can do a lot more. Check out the docs.

References
