@barrysteyn
Last active October 10, 2024 12:27
Migrate From SVN To GIT

Migrating From SVN to Git

This gist details the following:

  1. Converting a Subversion (SVN) repository into a Git repository
  2. Purging the resultant Git repository of large files

Migrating from SVN to Git is roughly split into three steps:

  1. Retrieve a list of SVN commit usernames
  2. Match SVN usernames to email addresses
  3. Migrate to Git using the git svn clone command

Step 1: Retrieve A List Of SVN Commit Usernames

An SVN commit records only the committer's username. Git, on the other hand, records more detail: at the very least, a Git commit author needs both a name and an email address. Since the email address is not available in SVN, it has to be matched to each username manually.

A list of the usernames recorded by SVN therefore needs to be created for the match. Run the following command inside an SVN working copy; it produces a file called authors.txt containing one line per SVN username:

svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors.txt
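
To see what the pipeline does without a live repository, here is a small sketch that feeds the same awk program a few lines of made-up `svn log -q` output. The usernames, dates, the `sample_log` helper, and the file name `authors_demo.txt` are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sample of `svn log -q` output; a real log comes from `svn log -q`.
sample_log() {
cat <<'EOF'
------------------------------------------------------------------------
r3 | jwilkins | 2014-05-02 10:15:00 +0000 (Fri, 02 May 2014)
------------------------------------------------------------------------
r2 | asmith | 2014-05-01 09:00:00 +0000 (Thu, 01 May 2014)
------------------------------------------------------------------------
r1 | jwilkins | 2014-04-30 17:45:00 +0000 (Wed, 30 Apr 2014)
------------------------------------------------------------------------
EOF
}

# Same awk program as above: split each line on "|", keep only lines that
# start with "r", trim the spaces around the username (field 2), and print
# "user = user <user>". sort -u then removes the duplicate usernames.
sample_log \
  | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' \
  | sort -u > authors_demo.txt

cat authors_demo.txt
```

The separator lines made of dashes do not start with "r", so they are skipped, and each committer ends up on exactly one line however many revisions they made.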

Step 2: Match SVN usernames to email addresses

Each line of authors.txt is in the following format:

jwilkins = jwilkins <jwilkins>

Each line needs to be edited by hand so that the right-hand side holds the person's real name and email address, like this:

jwilkins = John Albin Wilkins <[email protected]>
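
Because the edit is manual, it is easy to leave a placeholder behind. Below is a minimal sketch of a check, assuming every finished line should look like `username = Full Name <address@host>`. The function name `check_authors`, the file name `authors_sample.txt`, and the sample entries are hypothetical.

```shell
#!/bin/sh
# check_authors FILE: print every line whose <...> part is not an email
# address (no "@"), for example lines still in the generated
# "user = user <user>" form. Returns non-zero if any such line exists.
check_authors() {
  awk '
    !/^[^=]+ = .+ <[^<>@ ]+@[^<>@ ]+>$/ { print "needs editing: " $0; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}

# Hypothetical example file: one finished entry, one untouched placeholder.
cat > authors_sample.txt <<'EOF'
asmith = Alice Smith <asmith@example.com>
jwilkins = jwilkins <jwilkins>
EOF

check_authors authors_sample.txt || echo "authors file is not ready"
```

Running a check like this before the clone is cheap compared with redoing a long conversion because one address was missed.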

Step 3: Migrate To Git Using git-svn clone Command

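Create a folder where the Git clone is to be stored, and then run the following from inside it. The --stdlayout flag assumes the SVN repository uses the conventional trunk, branches, and tags directories: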

git svn clone --stdlayout --authors-file=path/to/authors.txt <svn_repo>

This last step may take some time, but it should result in a Git repo.

Find And Purge Large Files From Git History

Git hosting services tend to be stricter than SVN about large files: GitHub, for example, rejects pushes that contain files larger than 100 MB. Migrating an SVN repository to Git may therefore require purging such files from the Git history.

Step 1: Determine The Files That Are Large

Go to the newly created Git repo and do the following:

git rev-list --objects --all | sort -k 2 > allfileshas.txt;git gc && git verify-pack -v .git/objects/pack/pack-*.idx | egrep "^\w+ blob\W+[0-9]+ [0-9]+ [0-9]+$" | sort -k 3 -n -r > bigobjects.txt

This will result in two files:

  1. allfileshas.txt - a list of all object SHAs in the Git repo, with their paths
  2. bigobjects.txt - a list of blob SHAs with their sizes, sorted from largest to smallest

To combine these two files into a list of file names sorted by size in descending order:

for SHA in `cut -f 1 -d\  < bigobjects.txt`; do echo $(grep $SHA bigobjects.txt) $(grep $SHA allfileshas.txt) | awk '{print$1,$3,$7}' >> bigtosmall.txt; done

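NOTE: The above loop can take a very long time on a large repository, because it searches both files once per object. Since it appends results from largest to smallest, you can stop it with Ctrl-C after a couple of minutes and still have the largest files listed.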

The resulting file, bigtosmall.txt, will contain one line per file (SHA, size in bytes, and file name), sorted from largest to smallest.
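
If the loop is too slow, the same join can be done in a single pass over each file. The following is a sketch of that alternative, not the method used above. It runs on made-up sample data in the same layout as allfileshas.txt and bigobjects.txt; the `sample_` file names, SHAs, sizes, and paths are all invented.

```shell
#!/bin/sh
# Hypothetical sample data, laid out like allfileshas.txt ("SHA path") ...
cat > sample_allfileshas.txt <<'EOF'
1111111111111111111111111111111111111111 README.md
2222222222222222222222222222222222222222 assets/video.mp4
3333333333333333333333333333333333333333 lib/big.jar
EOF
# ... and like bigobjects.txt ("SHA blob size size-in-pack offset"),
# already sorted by size in descending order.
cat > sample_bigobjects.txt <<'EOF'
2222222222222222222222222222222222222222 blob   52428800 52000000 12
3333333333333333333333333333333333333333 blob   1048576 900000 52000012
1111111111111111111111111111111111111111 blob   120 98 52900012
EOF

# First file (NR == FNR): remember the path for each SHA.
# Second file: keep the existing size order and print "SHA size path"
# for every blob whose path is known.
awk '
  NR == FNR {
    if (NF > 1) { sha = $1; sub(/^[^ ]+ /, ""); path[sha] = $0 }
    next
  }
  ($1 in path) { print $1, $3, path[$1] }
' sample_allfileshas.txt sample_bigobjects.txt > sample_bigtosmall.txt

cat sample_bigtosmall.txt
```

Against a real repository, the two input names would be allfileshas.txt and bigobjects.txt. Unlike the loop, this version keeps the whole path even when it contains spaces.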

Step 2: Purge The Files From The Git History

Select the files (or even a directory) from bigtosmall.txt that you want purged. Then run the following for each one, substituting MY-BIG-DIRECTORY-OR-FILE with the directory or file that is to be purged:

git filter-branch -f --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY-BIG-DIRECTORY-OR-FILE' --tag-name-filter cat -- --all
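
When several paths need purging, the command can be wrapped in a loop. The following is a minimal sketch that tries this on a throwaway repository, assuming the paths are listed one per line in a file. The function `purge_paths`, the variable `PURGE_PATH`, and the demo file names are hypothetical. Since filter-branch rewrites history, it is worth trying on a copy of the repository first.

```shell
#!/bin/sh
set -e
# Skip the deprecation notice (and pause) that newer Git versions print
# before running filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# purge_paths LIST_FILE: run the filter-branch command above once per
# non-empty line of LIST_FILE. The path is passed through the environment
# so that names containing spaces stay intact.
purge_paths() {
  while IFS= read -r target; do
    [ -n "$target" ] || continue
    PURGE_PATH="$target" git filter-branch -f --prune-empty \
      --index-filter 'git rm -rf --cached --ignore-unmatch -- "$PURGE_PATH"' \
      --tag-name-filter cat -- --all < /dev/null
  done < "$1"
}

# Demo on a throwaway repository (all names below are made up).
repo="$(mktemp -d)"
list="$(mktemp)"
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "small file" > keep.txt
git add keep.txt
git commit -q -m "add keep.txt"
echo "pretend this is huge" > big.bin
git add big.bin
git commit -q -m "add big.bin"

echo "big.bin" > "$list"
purge_paths "$list"

# Prints nothing: big.bin no longer appears in the current branch's history.
git log --oneline -- big.bin
```

The purged objects stay on disk until the backup refs that filter-branch leaves under refs/original are removed and the repository is garbage-collected.
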
@tarrynn

tarrynn commented Mar 8, 2018

worked by doing the first 3 steps. nice one!

@Mexicoder

Mexicoder commented Jul 10, 2018

For anyone with issues with cmd not recognizing "awk", go here: http://gnuwin32.sourceforge.net/packages/gawk.htm.
Download the setup you want and install it.
Now you need to update your PATH variable. The directory you need should be "C:\Program Files (x86)\GnuWin32\bin".
Here is the Stack Overflow post I followed to do it: https://stackoverflow.com/a/21930462/5919289

@jackle1990

Works for me. Thanks a lot!

@MortInfinite

When I run the following command on Windows 10:
svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors.txt

I receive the error message:
''' is not recognized as an internal or external command, operable program or batch file.

I have both svn and awk in my PATH variable.

@dbfeatdb

dbfeatdb commented Dec 7, 2019

When I run the following command on Windows 10:
svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors.txt

I receive the error message:
''' is not recognized as an internal or external command, operable program or batch file.

I have both svn and awk in my PATH variable.

An alternative approach for Step 1: Retrieve A List Of SVN Commit Usernames, that uses powershell, can be found here https://docs.microsoft.com/en-us/azure/devops/repos/git/perform-migration-from-svn-to-git?view=azure-devops

@stephanecharette

Cause I know I'll run into this again, here are my notes...

  1. Create the authors file as described above, one user per line.
  2. Create an empty github repo, Foo in this example.
  3. Then on the local system:
sudo apt-get install git-svn
git svn clone --no-metadata --authors-file=/path/to/authors.txt svn://svnaddrorname/path/to/project/Foo/ Foo
cd Foo
git remote add origin git@github.com:username/Foo.git
git push --set-upstream origin master

@j1m1l0k0

j1m1l0k0 commented Aug 13, 2021

Tried to follow the guide but got stuck on the first step. While trying to create the list of committers, I get this:

awk : The term 'awk' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:14
+ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2);  ...
+              ~~~
    + CategoryInfo          : ObjectNotFound: (awk:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

use command:
svn log -q http://ip/svn/repo_name | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors.txt

@j1m1l0k0

j1m1l0k0 commented Aug 13, 2021

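Shell script to generate the authors file from a repository URL:

#!/bin/bash
# Usage: script.sh <svn_repo_url> <authors_file>
url_svn_addr="$1"
authors_filename="$2"
# Append one "user = user <user@localhost>" line per distinct SVN committer.
svn log -q "${url_svn_addr}" | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2"@localhost>"}' | sort -u >> "${authors_filename}"

Usage: script.sh http://127.0.0.1/svn/repotest repotest.txt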
