@mrchnk
Forked from barrysteyn/svn-to-git.md
Created March 19, 2018 10:20
Migrate From SVN To GIT

Migrating From SVN to Git

This gist details the following:

  1. Converting a Subversion (SVN) repository into a Git repository
  2. Purging the resultant Git repository of large files

Migrating from SVN to Git is roughly split into three steps:

  1. Retrieve a list of SVN commit usernames
  2. Match SVN usernames to email addresses
  3. Migrate to Git using the git svn clone command

Step 1: Retrieve A List Of SVN Commit Usernames

An SVN commit records only the committer's username. A Git commit records more, and at a minimum its author needs both a name and an email address. Because SVN does not store email addresses, each username has to be matched to one by hand.

The first step is therefore to list the usernames recorded by SVN. Run the following command from the root of an SVN working copy; it writes every distinct username to a file called authors.txt:

svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors.txt
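
To see what the pipeline does without touching a real repository, here is a minimal sketch that runs an equivalent awk filter over a made-up sample of `svn log -q` output. The function name `authors_from_log`, the usernames, and the dates are illustrative assumptions, not part of the original command.

```shell
# Extract unique SVN usernames from `svn log -q` output read on stdin and
# print them in the "user = user <user>" placeholder format.
authors_from_log() {
    awk -F '|' '/^r[0-9]+ / {
        gsub(/^ +| +$/, "", $2)      # trim the spaces around the username
        print $2 " = " $2 " <" $2 ">"
    }' | sort -u
}

# Made-up sample of `svn log -q` output (hypothetical users and dates).
authors_from_log <<'EOF'
------------------------------------------------------------------------
r3 | jwilkins | 2018-03-19 10:20:00 +0000 (Mon, 19 Mar 2018)
------------------------------------------------------------------------
r2 | asmith | 2018-03-18 09:00:00 +0000 (Sun, 18 Mar 2018)
------------------------------------------------------------------------
r1 | jwilkins | 2018-03-17 08:00:00 +0000 (Sat, 17 Mar 2018)
------------------------------------------------------------------------
EOF
```

Only lines that start with a revision number are processed, so the dashed separator lines are skipped, and `sort -u` collapses repeated committers into one entry each.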

Step 2: Match SVN usernames to email addresses

Each line of authors.txt has the following format:

jwilkins = jwilkins <jwilkins>

Edit each line by hand so that it gives the author's real name and email address:

jwilkins = John Albin Wilkins <[email protected]>
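
Because the mapping is edited by hand, it is easy to miss an entry. The following is a small sketch of a check that lists usernames whose line is still in the generated placeholder form. The function name `unmapped_authors` is hypothetical, and the check assumes the file uses the exact `user = Name <email>` layout shown above.

```shell
# Print every username whose entry is still the generated placeholder,
# i.e. the right-hand side is exactly "user <user>".
unmapped_authors() {
    awk -F ' = ' '$2 == ($1 " <" $1 ">") { print $1 }' "$1"
}

# Usage: prints nothing once every entry has been filled in.
# unmapped_authors authors.txt
```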

Step 3: Migrate To Git Using git-svn clone Command

Create a folder to hold the Git clone, change into it, and run the following, replacing <svn_repo> with the URL of the SVN repository:

git svn clone --stdlayout --authors-file=path/to/authors.txt <svn_repo>

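The --stdlayout option tells git svn that the repository follows the conventional trunk, branches, and tags layout. This last step may take some time, but it should result in a Git repo.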

Find And Purge Large Files From Git History

Git hosts tend to be stricter than SVN about large files; GitHub, for example, rejects pushes that contain files larger than 100 MB. Before pushing a migrated SVN repository, you may therefore need to purge such files from the Git history.

Step 1: Determine The Files That Are Large

Go to the newly created Git repo and do the following:

git rev-list --objects --all | sort -k 2 > allfileshas.txt; git gc && git verify-pack -v .git/objects/pack/pack-*.idx | grep -E "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt

This will result in two files:

  1. allfileshas.txt - a list of all SHAs in the Git repo, each followed by its file path where it has one
  2. bigobjects.txt - a list of blob SHAs with their sizes, sorted from largest to smallest

To combine these two files into a list of file names sorted by size in descending order:

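for SHA in $(cut -f 1 -d ' ' < bigobjects.txt); do echo $(grep "$SHA" bigobjects.txt) $(grep "$SHA" allfileshas.txt) | awk '{print $1,$3,$7}' >> bigtosmall.txt; done

NOTE: On a large repository the above loop may take a very long time, because it scans both files once for every object. Since bigobjects.txt is already sorted largest first, you can stop the loop with Ctrl-C after a couple of minutes and still have the largest files in bigtosmall.txt.

The resulting file, bigtosmall.txt, will contain one line per object with its SHA, size in bytes, and file name, sorted from largest to smallest.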
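
The per-object grep is what makes the loop slow. As an alternative sketch (not the original author's method), the same join can be done in a single awk pass that first loads the SHA-to-path map and then walks bigobjects.txt in its existing order. The function name `big_to_small` is hypothetical; like the loop, it keeps only the first whitespace-separated part of a path, and it assumes allfileshas.txt is not empty.

```shell
# Join bigobjects.txt ($1) with allfileshas.txt ($2) in one pass.
# Output lines are "SHA size path", in the order of bigobjects.txt.
big_to_small() {
    awk 'NR == FNR {                       # first file read: allfileshas.txt
             if (!($1 in path)) path[$1] = $2
             next
         }
         ($1 in path) { print $1, $3, path[$1] }' "$2" "$1"
}

# Usage:
# big_to_small bigobjects.txt allfileshas.txt > bigtosmall.txt
```

Because each file is read once, the running time grows with the size of the two files rather than with their product.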

Step 2: Purge The Files From The Git History

Select the files (or even a whole directory) from bigtosmall.txt that you want purged. Then run the following for each one, substituting MY-BIG-DIRECTORY-OR-FILE with the directory or file that is to be purged:

git filter-branch -f --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY-BIG-DIRECTORY-OR-FILE' --tag-name-filter cat -- --all
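
Each filter-branch run rewrites the whole history, so purging many paths one at a time can be slow. Since git rm accepts several paths, one possible shortcut is to build a single index filter that names them all. The helper below is a sketch under assumptions: `build_index_filter` is a hypothetical name, the example paths are made up, and paths are assumed not to contain single quotes.

```shell
# Build one --index-filter command string that removes every given path.
# Each path is wrapped in single quotes so spaces survive the eval that
# filter-branch applies to the filter string.
build_index_filter() {
    filter='git rm -rf --cached --ignore-unmatch'
    for path in "$@"; do
        filter="$filter '$path'"
    done
    printf '%s\n' "$filter"
}

# Usage (rewrites history, so try it on a copy of the repo first):
# git filter-branch -f --prune-empty \
#     --index-filter "$(build_index_filter big/dir assets/video.bin)" \
#     --tag-name-filter cat -- --all
```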