@rahulbhadani
Last active September 26, 2020 06:17

Migrating from SVN to GitHub

This gist explains how to migrate a repository from SVN to Git. A significant portion of my research work, papers, and source code has lived in Subversion (SVN) since 2015. Recently we decided to migrate everything to a private GitHub repository. I am detailing the steps below in case they are helpful.

Check out your code from SVN

svn co https://svn.engr.arizona.edu/svn/jms/jmsgroup/trunk/wildcat wildcat_SVN

This saves the Subversion working copy in wildcat_SVN. Run the following inside the wildcat_SVN directory to list every author who has committed so far.

cd wildcat_SVN
svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt

This saves the author information in the file authors-transform.txt. In my case, authors-transform.txt contained the following:

wildcat = wildcat <wildcat>
wilma = wilma <wilma>
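
To see what the awk filter extracts without touching a real repository, here is a small sketch that runs the same pipeline on made-up `svn log -q` output. The revision numbers and dates are invented for illustration; only the two usernames come from the example above.

```shell
#!/bin/bash
# Made-up sample of `svn log -q` output (revisions and dates are invented).
sample_log='------------------------------------------------------------------------
r2 | wilma | 2015-03-02 10:00:00 -0700 (Mon, 02 Mar 2015)
------------------------------------------------------------------------
r1 | wildcat | 2015-03-01 09:00:00 -0700 (Sun, 01 Mar 2015)
------------------------------------------------------------------------'

# Same filter as above: take the second |-separated field of each revision
# line, trim one leading and one trailing space, and print "user = user <user>".
authors=$(printf '%s\n' "$sample_log" \
  | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' \
  | sort -u)

printf '%s\n' "$authors"
```

The separator lines do not start with `r`, so they are skipped, and `sort -u` collapses repeated committers into one line each.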

To map commits to existing GitHub users, I edited authors-transform.txt to have the following content:

wildcat = Wilma Wildcat <[email protected]>
wilma = Wilbur Wildcat <[email protected]>
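
Since git svn reads this file line by line, it may be worth checking its shape before starting the long clone. The sketch below is a hypothetical helper (`check_authors_file` is not part of git or git-svn) that flags lines not matching `svn_user = Full Name <email>`. The example.com addresses are placeholders, not the ones from my file.

```shell
#!/bin/bash
# Hypothetical helper: print lines that do not look like
#   svn_user = Full Name <email>
# and return non-zero if any such line exists.
check_authors_file() {
  local file=$1 line bad=0
  local re='^[^=[:space:]]+ = [^<>]+ <[^<>[:space:]]+>$'
  while IFS= read -r line || [[ -n $line ]]; do
    [[ -z $line ]] && continue          # ignore blank lines
    if [[ ! $line =~ $re ]]; then
      echo "malformed: $line"
      bad=1
    fi
  done < "$file"
  return "$bad"
}

# Demo with placeholder addresses; the second line is missing its <email>.
tmp=$(mktemp)
printf '%s\n' 'wildcat = Wilma Wildcat <wilma@example.com>' \
              'wilma = Wilbur Wildcat' > "$tmp"
check_authors_file "$tmp" || echo "fix the lines listed above before cloning"
rm -f "$tmp"
```

The pattern is deliberately strict (no spaces in the SVN username or the address), so loosen it if your usernames look different.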

Now we will use git-svn to clone the SVN repository as a Git repository. First, we need to install git-svn:

sudo apt-get install git-svn

Then we clone the repo using git svn:

cd ..
git svn clone https://svn.engr.arizona.edu/svn/jms/jmsgroup/trunk/wildcat --authors-file=wildcat_SVN/authors-transform.txt wildcat_GIT

This will take a while, so grab a cup of coffee, enjoy, watch an episode of your favorite Korean drama, and then come back later. Now we need to convert the SVN ignore properties into a .gitignore file:

cd wildcat_GIT
git svn show-ignore > .gitignore

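Then we need to convert all of the SVN tags into proper Git tags. We can run the following command to do so. (If you cloned only a trunk path, as I did above, there are probably no refs under refs/remotes/tags and the loop simply does nothing.)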

for t in $(git for-each-ref --format='%(refname:short)' refs/remotes/tags); do git tag ${t/tags\//} $t && git branch -D -r $t; done
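
The `${t/tags\//}` part of the loop is a bash pattern substitution that drops the first `tags/` from the ref name, so the Git tag gets the bare tag name. A small sketch of just that step, using made-up ref names:

```shell
#!/bin/bash
# Made-up short ref names, shaped like what for-each-ref would print
# for refs under refs/remotes/tags.
refs=("tags/v1.0" "tags/release-2015")

tag_names=()
for t in "${refs[@]}"; do
  # ${t/tags\//} replaces the first "tags/" in $t with nothing.
  tag_names+=("${t/tags\//}")
done

printf '%s\n' "${tag_names[@]}"
```

If your refs carry an extra prefix (for example `origin/tags/v1.0`), the substitution leaves that prefix in place, so it is worth looking at the output of `git for-each-ref` before running the loop.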

We can also create a local branch for each of our remote refs with the following command:

for b in $(git for-each-ref --format='%(refname:short)' refs/remotes); do git branch $b refs/remotes/$b && git branch -D -r $b; done

At this point, we need to take care of files larger than 100 MB, because GitHub rejects them in a regular push. For that, I need to install Git LFS. Download and install Git LFS:

cd ..
wget https://github.com/git-lfs/git-lfs/releases/download/v2.12.0/git-lfs-linux-amd64-v2.12.0.tar.gz
mkdir -p git-lfs
tar -xzvf git-lfs-linux-amd64-v2.12.0.tar.gz -C ./git-lfs
cd git-lfs
sudo ./install.sh

Once Git LFS is installed, we go back to our git-svn folder:

cd ../wildcat_GIT

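In my case, .fig files are larger than 100 MB, so I use the following commands to tell Git LFS to manage them. Note that git lfs track only applies to files added in later commits. Large files already in the imported history are not converted by it (git lfs migrate import exists for rewriting history in that case).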

git lfs track "*.fig"
git add .gitattributes

If only a couple of files are larger than 100 MB, you can specify them directly. For example, if figures/gradient.fig is larger than 100 MB, I specify it as follows:

git lfs track "figures/gradient.fig"
git add .gitattributes

You can run the following command to list files at or near the 100 MB limit in your Git directory:

find ./ -size +99M
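
A plain `find` also walks into `.git/`, where pack files can be large without being files you would track. The sketch below is one possible refinement, assuming GNU `truncate` is available. It builds a throwaway directory with made-up file names and lists only regular files over 99 MiB outside `.git/`.

```shell
#!/bin/bash
# Build a throwaway directory with sparse 101 MiB files and one small file.
# File names are made up; truncate creates sparse files, so this uses
# almost no real disk space.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/figures" "$demo_dir/.git"
truncate -s 101M "$demo_dir/figures/gradient.fig"
truncate -s 101M "$demo_dir/.git/packfile"
echo "small" > "$demo_dir/notes.txt"

# Regular files over 99 MiB, skipping everything under .git/.
large_files=$(cd "$demo_dir" && find . -type f -size +99M -not -path './.git/*')

printf '%s\n' "$large_files"
rm -rf "$demo_dir"
```

In a real working copy, the same `find` expression can be run from the top of wildcat_GIT.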

Once this is ready, we are almost able to push. One more thing needs care: in my case the Subversion history had a lot of commits, so I could not push everything at once and had to push in steps. I did the following. First, I got all the commit logs:

git log > all_commits.txt

Then I parsed all_commits.txt to get the list of commit hashes:

awk '/^commit / {print $2}' all_commits.txt > commitsHash.txt
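
As a sketch of this parsing step on made-up log text, the filter below prints the second field only for lines that begin with `commit `. The hashes, names, and example.com addresses are placeholders, not real commits.

```shell
#!/bin/bash
# Made-up `git log` text; the hashes are placeholders, not real commits.
sample_log='commit 2222222222222222222222222222222222222222
Author: Wilma Wildcat <wilma@example.com>
Date:   Mon Mar 2 10:00:00 2015 -0700

    Revert commit that broke the build

commit 1111111111111111111111111111111111111111
Author: Wilbur Wildcat <wilbur@example.com>
Date:   Sun Mar 1 09:00:00 2015 -0700

    Initial import'

# Print the second whitespace-separated field of lines starting with "commit ".
hashes=$(printf '%s\n' "$sample_log" | awk '/^commit / {print $2}')

printf '%s\n' "$hashes"
```

Because only lines that start with `commit ` are printed, the output has no blank lines, and an indented message such as "Revert commit that broke the build" cannot leak into the list.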

Since the file lists the newest commit at the top and the oldest at the bottom, I needed to reverse it. I used the following command to save the commit hashes in reverse order:

sed '1!G;h;$!d' commitsHash.txt > Reverse_commitsHas.txt
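
The sed one-liner is terse, so here is a small sketch of what it does, run on three placeholder lines instead of real hashes:

```shell
#!/bin/bash
# Three placeholder lines standing in for commit hashes, newest first.
newest_first=$(printf '%s\n' ccc bbb aaa)

# sed idiom: append the hold space to each line after the first (1!G),
# copy the result back to the hold space (h), and delete everything
# except the final accumulated buffer ($!d).
oldest_first=$(printf '%s\n' "$newest_first" | sed '1!G;h;$!d')

printf '%s\n' "$oldest_first"
```

On systems with GNU coreutils, `tac commitsHash.txt` should give the same ordering, and `git rev-list --reverse HEAD` prints the hashes oldest first directly. Either could replace the log, awk, and sed steps if you prefer.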

Now, create a repository in your GitHub.com account. I created one at https://github.com/catgroup/wildcat. Then I added it as the remote URL of the wildcat_GIT folder:

git remote add origin https://github.com/catgroup/wildcat

Next, I wrote the following bash script to push sequentially from the oldest commit to the latest one:

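#!/bin/bash
# Push the history one commit at a time, oldest first.
filename="Reverse_commitsHas.txt"
while IFS= read -r line
do
    # Skip blank lines left over from parsing the log
    if [[ -z "$line" ]]; then
        continue
    fi
    # Move the remote master branch forward to this commit
    git push origin --force "$line":refs/heads/master
    echo "Pushed commit $line"
done < "$filename"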
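
Before running a long series of pushes, it can help to preview what the loop will do. The sketch below is a hypothetical dry-run helper (`dry_run_pushes` is my own name, not a git feature) that prints the push command for each non-empty line instead of executing it. The hashes in the demo are placeholders.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print the push command for each non-empty
# line of a hash list instead of running it.
dry_run_pushes() {
  local file=$1 line
  while IFS= read -r line || [[ -n $line ]]; do
    [[ -z $line ]] && continue   # skip blank lines, as the script above does
    echo "git push origin --force $line:refs/heads/master"
  done < "$file"
}

# Demo on placeholder hashes with a blank line in between.
list=$(mktemp)
printf '%s\n' aaa111 '' bbb222 > "$list"
planned=$(dry_run_pushes "$list")
printf '%s\n' "$planned"
rm -f "$list"
```

Running `dry_run_pushes Reverse_commitsHas.txt | head` in wildcat_GIT would show the first few planned pushes without contacting GitHub.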

Save this file as push_to_github.sh and make it executable:

chmod +x push_to_github.sh

Then execute the bash script:

./push_to_github.sh

This will start pushing everything. It will take a while, so go grab a cup of coffee or watch an episode of your favorite Korean drama and come back later. You also want to push the tags, which you can do as follows:

git push origin --tags

You are now officially done. Check your repo on GitHub.
