@maxandersen
Forked from nickboldt/SVN To GitHub Migration
Created September 13, 2012 07:50
SVN To GitHub Migration
#!/usr/bin/env python
import sys
from git_fast_filter import Blob, Reset, FileChanges, Commit, FastExportFilter
from git_fast_filter import get_commit_count, get_total_objects
if len(sys.argv) != 3:
    raise SystemExit("Syntax:\n  %s SOURCE_REPO TARGET_REPO" % sys.argv[0])
source_repo = sys.argv[1]
target_repo = sys.argv[2]
total_objects = get_total_objects(source_repo) # blobs + trees
total_commits = get_commit_count(source_repo)
object_count = 0
commit_count = 0
def print_progress():
    global object_count, commit_count, total_objects, total_commits
    print "\rRewriting commits... %d/%d (%d objects)" \
        % (commit_count, total_commits, object_count),

def my_blob_callback(blob):
    global object_count
    object_count += 1
    print_progress()

def my_commit_callback(commit):
    global commit_count
    commit_count += 1
    print_progress()
    # keep only the file changes under paths starting with 'hibernate'
    new_file_changes = []
    if not commit.branch.endswith("/dead"):
        for change in commit.file_changes:
            print commit.branch + "->" + change.filename
            if change.filename.startswith('hibernate'):
                new_file_changes.append(change)
    else:
        # commits on a ".../dead" branch lose all their file changes
        print "Skipped " + commit.branch
    commit.file_changes = new_file_changes

filter = FastExportFilter(blob_callback = my_blob_callback,
                          commit_callback = my_commit_callback)
filter.run(source_repo, target_repo)
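The rule applied in my_commit_callback can be tried out without a repository or git_fast_filter. The following is a minimal sketch, assuming each file change is reduced to its path string; `keep_changes` and its `prefix` and `dead_suffix` parameters are hypothetical names used for illustration only and are not part of git_fast_filter.

```python
def keep_changes(branch, filenames, prefix="hibernate", dead_suffix="/dead"):
    """Return the paths a commit would keep under the script's rule.

    Hypothetical stand-in for my_commit_callback: a commit on a branch ending
    in dead_suffix keeps nothing; otherwise only paths that start with prefix
    are kept.
    """
    if branch.endswith(dead_suffix):
        return []
    return [name for name in filenames if name.startswith(prefix)]


if __name__ == "__main__":
    # Made-up paths, only to show which ones pass the prefix test.
    print(keep_changes("refs/heads/trunk",
                       ["hibernatetools/pom.xml", "smooks/pom.xml"]))
    print(keep_changes("refs/heads/dead", ["hibernatetools/pom.xml"]))
```

Trying the rule on a handful of sample paths first is cheaper than discovering a wrong prefix after a full rewrite of a 2.6 GB repository.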
== Migration from SVN to Git ==
This operation requires a fair amount of disk space. Each copy of the overall jbosstools-svn-mirror git repo is about 2.6 GB.
After the migration, Freemarker was only 4.4 MB.
1. Check out the entire git repo (using the read-only URL so you cannot push back to origin by accident):
git clone git://github.com/jbosstools/jbosstools-svn-mirror.git
2. Back up the local repo for reuse later
tar -czvf jbosstools-svn-mirror.tar.gz jbosstools-svn-mirror
To get a clean copy (~2.5 mins vs. much longer for rsync which I just killed :):
tar -xvf jbosstools-svn-mirror.tar.gz
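The same backup-and-restore round trip can be scripted with Python's standard tarfile module. This is a minimal sketch, not a replacement for the tar commands above; `backup` and `restore` are hypothetical helper names, and the demo uses a throwaway temporary directory instead of the real mirror.

```python
import os
import tarfile
import tempfile


def backup(repo_dir, archive_path):
    """Write repo_dir into a gzip-compressed tarball (roughly tar -czf)."""
    name = os.path.basename(os.path.normpath(repo_dir))
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(repo_dir, arcname=name)


def restore(archive_path, dest_dir):
    """Unpack the tarball into dest_dir (roughly tar -xf)."""
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    repo = os.path.join(work, "demo-repo")  # stand-in for the cloned mirror
    os.mkdir(repo)
    with open(os.path.join(repo, "README"), "w") as handle:
        handle.write("demo\n")
    archive_file = os.path.join(work, "demo-repo.tar.gz")
    backup(repo, archive_file)
    clean = tempfile.mkdtemp()
    restore(archive_file, clean)
    print(os.listdir(os.path.join(clean, "demo-repo")))
```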
3. Enter the new copy folder, and clean out all but the content we care about
cd jbosstools-svn-mirror
# delete the branches we are not interested in
# delete hibernate experiments
git branch -a | grep hibernatetoo | xargs git branch -r -d
# delete "dead" branch
git branch -a | grep dead | xargs git branch -r -d
# delete "smooks" branch
git branch -a | grep smooks | xargs git branch -r -d
# filter out all the dirs we don't care about
# If you see "Cannot rewrite branch(es) with a dirty working directory." try cleaning out any uncommitted local changes
# Install http://gitorious.org/git_fast_filter
git clone git://gitorious.org/git_fast_filter/mainline.git
# copy jbosstools_filter.py from this gist (it currently just filters out anything not related to hibernatetools)
MYPATH=/PATH/TO/DIR/CONTAINING/git_fast_filter.py
export PYTHONPATH=$MYPATH:$PYTHONPATH
# create new repo
mkdir hibernatetools
cd hibernatetools
git init
python jbosstools_filter.py ../jbosstools-svn-mirror ../hibernatetools
# hibernatetools now has all the history for the dirs with hibernatetools in them
# takes up 3.5 GB!!!
# garbage collect
git reset --hard
# HEAD is now at [some commitid] [some commit message]
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
# this might take >3 mins
git gc --aggressive --prune=now
# Counting objects: 252576, done.
# Delta compression using up to 8 threads.
# Compressing objects: 100% (194466/194466), done.
# Writing objects: 100% (252576/252576), done.
# Total 252576 (delta 140467), reused 107773 (delta 0)
# Removing duplicate objects: 100% (256/256), done.
# now the repo is 155M (MUCH more manageable, and appropriate given that it contains tons of jars)
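The before and after sizes quoted above (3.5 GB, then 155M) can be checked with a small script run before and after the garbage collection. This is a minimal sketch that sums file sizes on disk, so its numbers may differ slightly from what `du` reports; `dir_size_bytes` and `human` are hypothetical helper names.

```python
import os
import tempfile


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def human(num_bytes):
    """Format a byte count with a B, K, M or G suffix (1024-based)."""
    for unit in ("B", "K", "M", "G"):
        if num_bytes < 1024 or unit == "G":
            return "%.1f%s" % (num_bytes, unit)
        num_bytes /= 1024.0


if __name__ == "__main__":
    # Demo on a throwaway directory; point it at the repo folder in practice.
    demo = tempfile.mkdtemp()
    with open(os.path.join(demo, "blob"), "wb") as handle:
        handle.write(b"x" * 2048)
    print(human(dir_size_bytes(demo)))
```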
### NEXT STEPS TBD ###
# prevent .project, bin, and *.class files in root from being committed
echo bin >> .gitignore
echo "*.class" >> .gitignore
echo .project >> .gitignore; git add .gitignore ; git commit -m ".gitignore file" .gitignore
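Because `echo ... >> .gitignore` appends unconditionally, repeating these commands (for example on another branch) adds duplicate lines. A minimal sketch of an idempotent alternative follows; `add_ignore_entries` is a hypothetical helper name, and the demo writes to a temporary file, not to a real repository.

```python
import os
import tempfile


def add_ignore_entries(gitignore_path, entries):
    """Append entries that are not yet in the file; return those added."""
    content = ""
    if os.path.exists(gitignore_path):
        with open(gitignore_path) as handle:
            content = handle.read()
    seen = set(line.strip() for line in content.splitlines())
    added = []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            added.append(entry)
    if added:
        with open(gitignore_path, "a") as handle:
            # Keep new entries on their own lines if the file lacks a final newline.
            if content and not content.endswith("\n"):
                handle.write("\n")
            for entry in added:
                handle.write(entry + "\n")
    return added


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), ".gitignore")
    print(add_ignore_entries(path, ["bin", "*.class", ".project"]))
    print(add_ignore_entries(path, [".project"]))  # nothing new to add
```

The `git add` and `git commit` steps stay as they are; only the file update changes.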
4. Create a new remote repo on GitHub, e.g., called "forge"
https://github.com/new
5. Connect the local repo to the new remote repo on GitHub
git remote add origin ssh://git@github.com/nickboldt/forge.git
git push origin trunk # this could take a while to complete - might want to do this with EGit visually instead?
git pull origin trunk
6. Switch to master, check out README.md, update README.md, commit & push
git pull origin master
git checkout master; ls -l
echo .project >> .gitignore; git add .gitignore ; git commit -m ".gitignore file" .gitignore
vi README.md
git commit -m "update README.md" README.md; git push origin master
7. Switch back to trunk
git checkout trunk; ls -l
git pull origin trunk