@mcmire
Created February 1, 2010 04:10
Utilities for converting an SVN repository with an inconsistent layout to Git.
#!/usr/bin/env perl
#
# A reimplementation of James Coglan's svn2git Ruby script as a Perl
# module and accompanying executable, adapted from
# http://github.com/schwern/svn2git by Elliot Winkler.
#
# This script delegates all the hard work to SvnToGit.pm (included
# in this bundle) so see that for more.
#
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;
use File::Basename;
use lib dirname(__FILE__);
use SvnToGit;
my %opts;
GetOptions(
\%opts,
"trunk:s", "branches:s", "tags:s", "root-is-trunk",
"authors:s",
"clone!",
"strip-tag-prefix:s",
"revision|r:s",
"verbose|v",
"help|h"
) or pod2usage(2);
pod2usage(1) if $opts{help};
# convert --foo-bar to $ARGV{foo_bar}
for my $k (keys %opts) {
my $v = $opts{$k};
$k =~ s/-/_/g;
$ARGV{$k} = $v;
}
pod2usage(2) if @ARGV == 0;
my($svn_repo, $git_repo) = @ARGV;
$ARGV{svn_repo} = $svn_repo;
$ARGV{git_repo} = $git_repo;
SvnToGit->convert(%ARGV);
=head1 NAME
svn2git - Convert a Subversion repository to Git
=head1 SYNOPSIS
svn2git [OPTIONS] SVN_URL [GIT_REPO_DIR]
OPTIONS:
--trunk TRUNK_PATH
--branches BRANCHES_PATH
--tags TAGS_PATH
--root-is-trunk
--no-clone
--revision REV_OR_REVS
--authors AUTHORS_FILE
--strip-tag-prefix PREFIX
--verbose, -v
=head1 DESCRIPTION
svn2git converts a Subversion project into a git repository. It uses
git-svn to do the bulk of the conversion, but does a little extra work
to convert the Subversion way of doing things into the git way:
* Subversion tag branches become git tags.
* Local branches are made for each remote Subversion branch.
* master is assured to be trunk.
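As a rough illustration of the first point, the sketch below shows how remote branch names under the tags path could be mapped to tag names. C<tag_names_for> is a hypothetical helper written for this example, not part of SvnToGit.pm, and the branch names are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SvnToGit.pm): given the tags path and
# the remote branch names reported by `git branch -r`, return a hash that
# maps each tag branch to the git tag name it would become.
sub tag_names_for {
    my ($tags_path, @remote_branches) = @_;
    $tags_path .= '/' unless $tags_path =~ m{/$};
    my %tags;
    for my $branch (@remote_branches) {
        # Only branches under the tags path are treated as tags.
        next unless $branch =~ m{^\Q$tags_path\E(.+)};
        $tags{$branch} = $1;
    }
    return %tags;
}

# Made-up branch names, for illustration only.
my %tags = tag_names_for('tags', 'trunk', 'feature-x', 'tags/1.0', 'tags/1.1');
print "$_ -> $tags{$_}\n" for sort keys %tags;
```

Branches outside the tags path (here trunk and feature-x) are left alone; they are handled by the other two steps.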
Once done, your new repository is ready to be used. You can push it
to a new remote origin like so...
git remote add origin <git_remote_url>
git push origin --all
git push origin --tags
=head2 Switches
=head3 --trunk TRUNK_PATH
=head3 --branches BRANCHES_PATH
=head3 --tags TAGS_PATH
These tell svn2git about the layout of your repository; what subdirs
contain the trunk, branches and tags respectively. If none are given
a standard Subversion layout is assumed.
=head3 --root-is-trunk
This tells svn2git that the root of the repository is the trunk (there
is no separate 'trunk' directory), and not to worry about branches or
tags (except for converting trunk to the master branch).
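The sketch below restates how these layout switches end up as C<git svn init> arguments, following the logic of the C<clone> method in SvnToGit.pm. C<clone_opts_for> is a name invented for this example, and the URL is a placeholder.

```perl
use strict;
use warnings;

# Illustrative restatement of the option handling in SvnToGit::clone.
# clone_opts_for is a hypothetical name; the module builds the list inline.
sub clone_opts_for {
    my (%args) = @_;
    my @opts;
    if ($args{root_is_trunk}) {
        # The repository root itself is treated as trunk.
        push @opts, "--trunk=$args{svn_repo}";
    } else {
        for my $opt (qw(trunk branches tags)) {
            push @opts, "--$opt=$args{$opt}" if $args{$opt};
        }
        # No explicit layout given: assume the standard one.
        push @opts, "-s" unless @opts;
    }
    return @opts;
}

my $url = 'http://svn.example.com/some-project';   # placeholder URL
print join(' ', clone_opts_for(svn_repo => $url)), "\n";
print join(' ', clone_opts_for(svn_repo => $url, trunk => 'main', tags => 'releases')), "\n";
print join(' ', clone_opts_for(svn_repo => $url, root_is_trunk => 1)), "\n";
```

Note that as soon as one of --trunk, --branches or --tags is given, the standard layout is no longer assumed for the others.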
=head3 --no-clone
Skip the step of cloning the SVN repository. This is useful when you
just want to convert the tags on a git repository you'd previously
cloned using git-svn. This assumes you're already in the git repo.
=head3 --revision REV
=head3 --revision REV1:REV2
=head3 -r REV
=head3 -r REV1:REV2
Specifies which revision(s) to fetch, when running C<git svn fetch>.
=head3 --authors AUTHORS_FILE
The location of the authors file to use for the git-svn clone. See
L<git-svn>'s -A option for details.
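For orientation, each line of a git-svn authors file has the form C<svnuser = Full Name E<lt>emailE<gt>>. The sketch below parses one such line; C<parse_author_line> is a hypothetical helper for illustration only, and the sample entry is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split one authors-file line of the form
#   svnuser = Full Name <email>
# into its parts. Returns nothing for lines that do not match.
sub parse_author_line {
    my ($line) = @_;
    return unless $line =~ /^\s*(\S.*?)\s*=\s*(.+?)\s*<(.*)>\s*$/;
    return { svn_user => $1, name => $2, email => $3 };
}

# Made-up entry, for illustration only.
my $author = parse_author_line('jdoe = Jane Doe <jane@example.com>');
print "$author->{svn_user} is $author->{name} ($author->{email})\n" if $author;
```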
=head3 --strip-tag-prefix PREFIX
A prefix to strip off all tags. For example,
C<< --strip-tag-prefix=release- >> would turn "release-1.2.3" into
"1.2.3".
=head3 --verbose
=head3 -v
If either -v or --verbose is given, svn2git will output each command
before it runs it.
=head1 EXAMPLES
Convert an SVN project with a standard structure, autocreating the
'some-project' directory:
svn2git http://svn.example.com/some-project
Convert an SVN project that doesn't have a trunk/branches/tags setup:
svn2git http://svn.example.com/some-project --root-is-trunk
Convert an SVN project with a custom path:
svn2git http://svn.example.com/some-project some-dir
Convert the tags on an existing git-svn project:
cd some-git-svn-project
svn2git --no-clone
=head1 AUTHOR
Michael G Schwern <[email protected]>
Modifications by Elliot Winkler <[email protected]>
=head1 SEE ALSO
L<git>, L<git-svn>
The original Perl script:
L<http://github.com/schwern/svn2git>
The original Ruby svn2git:
L<http://github.com/jcoglan/svn2git/>
=cut
#!/usr/bin/perl -w
#
# Author: Elliot Winkler
# Last updated: 31 Jan 2010
#
# Adapted from http://justatheory.com/computers/vcs/git/bricolage-migration/stitch
# See http://justatheory.com/computers/vcs/git/
#
=head1 NAME
svn2git_stitch.pl - Convert an SVN repo with a late-introduced
conventional directory structure to Git
=head1 SYNOPSIS
svn2git_stitch.pl [OPTIONS] SVN_URL [FINAL_GIT_REPO_DIR]
OPTIONS:
--grafts-file FILE
--pre-end REV
--post-start REV
--final-git-url GIT_URL
--authors AUTHORS_FILE, --authors-file AUTHORS_FILE
--clear-cache
--verbose, -V
=cut
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;
use File::Spec::Functions qw(rel2abs file_name_is_absolute);
use File::Basename;
use Term::ANSIColor;
use lib dirname(__FILE__);
use SvnToGit;
my %ARGV;
my($svn_url, $final_repo);
my($cached_pre_repo, $cached_post_repo) = ("/tmp/git.pre.cached", "/tmp/git.post.cached");
my($pre_repo, $post_repo) = ("/tmp/git.pre", "/tmp/git.post");
my(@pre_local_branches, @pre_remote_branches);
my(@post_local_branches, @post_remote_branches);
my(@final_local_branches, @final_remote_branches);
#---
sub run {
print colored("@_\n", "yellow") if $ARGV{verbose};
system @_;
my $exit = $? >> 8;
die "@_ exited with $exit" if $exit;
return 1;
}
sub cd {
my $dir = shift;
chdir $dir or die "Can't chdir to $dir: $!\n";
my $cwd = get_cwd();
info("Current directory: $cwd");
}
sub get_cwd {
my $cwd = `pwd`;
chomp $cwd;
$cwd;
}
sub header {
my $msg = shift;
print colored("\n##### $msg #####\n\n", "cyan");
}
sub info {
my $msg = shift;
print colored("$msg\n", "green");
}
sub get_local_branches {
my @branches = map { s/^\s*\*\s*//; s/^\s+//; s/\s+$//; $_ } `git branch`;
# bypass pointers
@branches = map { /^([^ ]+) ->/ ? $1 : $_ } @branches;
info("Local branches: " . join(", ", @branches));
@branches;
}
sub get_remote_branches {
my @branches = grep { !/HEAD/ } map { s/^\s+//; s/\s+$//; $_ } `git branch -r`;
# bypass pointers
@branches = map { /^([^ ]+) ->/ ? $1 : $_ } @branches;
info("Remote branches: " . join(", ", @branches));
@branches;
}
#---
sub process_command_line {
my %opts;
GetOptions(
\%opts,
"grafts-file=s",
"pre-end=i",
"post-start=i",
"authors-file|authors=s",
"final-git-url=s",
"clear-cache",
"verbose|V",
"help|h"
) or pod2usage(2);
pod2usage(1) if $opts{help};
# convert --foo-bar to $ARGV{foo_bar}
for my $k (keys %opts) {
my $v = $opts{$k};
$k =~ s/-/_/g;
$ARGV{$k} = $v;
}
# @ARGV in numeric context is the argument count; length(@ARGV) was always >= 1
pod2usage(2) unless @ARGV >= 1;
$ARGV{final_git_url} ||= 'ssh://[email protected]/path/to/git/repo';
if ($ARGV{grafts_file}) {
$ARGV{grafts_file} = rel2abs($ARGV{grafts_file}) unless file_name_is_absolute($ARGV{grafts_file});
} else {
$ARGV{stop_at_grafting} = 1;
}
($svn_url, $final_repo) = @ARGV;
$final_repo ||= basename($svn_url) . ".git";
$final_repo = rel2abs($final_repo) unless file_name_is_absolute($final_repo);
}
sub main {
svn2git();
work_on_pre();
work_on_final();
my $final_repo_name = basename($final_repo, '.git') . '.git';
print <<EOT
----
Conversion complete! Now ssh into your server and run something like this:
su git
cd /path/to/git/repos
mkdir $final_repo_name
cd $final_repo_name
git init --bare
# Thanks <https://kerneltrap.org/mailarchive/git/2008/10/9/3569854/thread>
git config core.sharedrepository 1
chmod -R o-rwx .
chmod -R g=u .
find . -type d | xargs chmod g+s
The newly converted Git repo is available at $final_repo. To upload
it to your server, simply run:
cd $final_repo
git push origin --all
EOT
}
#---
sub svn2git {
header "Running SVN repo through svn2git";
if ($ARGV{clear_cache} || !(-e $cached_pre_repo && -e $cached_post_repo)) {
run qw(rm -rf), $cached_pre_repo, $cached_post_repo;
SvnToGit->convert(
svn_repo => $svn_url,
git_repo => $cached_pre_repo,
root_is_trunk => 1,
revisions => ($ARGV{pre_end} ? "1:$ARGV{pre_end}" : undef),
authors_file => $ARGV{authors_file},
verbose => $ARGV{verbose}
);
SvnToGit->convert(
svn_repo => $svn_url,
git_repo => $cached_post_repo,
revisions => ($ARGV{post_start} ? "$ARGV{post_start}:HEAD" : undef),
authors_file => $ARGV{authors_file},
verbose => $ARGV{verbose}
);
}
run qw(rm -rf), $pre_repo, $post_repo;
run "cp", "-r", $cached_pre_repo, $pre_repo;
run "cp", "-r", $cached_post_repo, $post_repo;
}
sub work_on_pre {
copy_commits_from_post_to_pre();
transfer_remote_branches_to_local_branches();
if ($ARGV{stop_at_grafting}) {
print <<EOT;
----
Okay! The first and second half of the SVN repo have been converted,
but you still need to graft them together.
Next steps:
1. Take a look at /tmp/git.pre and /tmp/git.post to find the commit ids
where the "pre-repo" ends and the "post-repo" starts
2. Create a graft file that will connect post-repo to pre-repo. Each
line is a commit id followed by its parent id, so it will look like this:
<first post-repo commit id> <last pre-repo commit id>
You can read more about grafting here:
http://git.wiki.kernel.org/index.php/GraftPoint
3. Tell this script about the graft file with the --grafts-file option
EOT
exit;
}
graft_pre_and_post();
clone_pre();
}
sub work_on_final {
transfer_remote_branches_to_local_branches(move => 1);
relocate_master();
cleanup();
}
#---
sub copy_commits_from_post_to_pre {
header "Copy commits from post repo to pre";
cd $post_repo;
# Note that this doesn't contain master for some reason
@post_remote_branches = get_remote_branches();
cd $pre_repo;
# Transfer objects from the post repo to the pre repo
# This doesn't work when we say git fetch --tags for some reason?
run qw(git fetch), $post_repo;
# Save post's master (which is, conveniently, saved to FETCH_HEAD)
# because we're going to need it later when we re-point master
run qw(git branch post-master FETCH_HEAD);
# Transfer remote branches from post too
for my $branch (@post_remote_branches) {
run qw(git fetch), $post_repo, "refs/remotes/$branch:refs/remotes/$branch";
}
}
sub transfer_remote_branches_to_local_branches {
my %opts = @_;
# We've got the remote branches copied over, but we still have to get them locally
header(($opts{move} ? "Moving" : "Copying") . " remote branches to local ones");
# Note that this doesn't contain master, because pre's remote branches didn't contain master
@pre_remote_branches = get_remote_branches();
for my $remote (@pre_remote_branches) {
(my $local = $remote) =~ s{origin/}{};
unless ($local eq "master") {
run qw(git branch --no-track), $local, "refs/remotes/$remote";
run qw(git branch -r -D), $remote if $opts{move};
}
}
}
sub graft_pre_and_post {
header "Grafting pre and post repos";
# Use run() so that a failed copy aborts the script instead of being ignored
run "cp", $ARGV{grafts_file}, ".git/info/grafts";
run qw(git checkout master);
run qw(git filter-branch --tag-name-filter cat -- --all);
unlink ".git/info/grafts";
}
sub clone_pre {
# Clone pre to remove duplicate commits
# Note that we're cloning to the final location now
header "Cloning pre to remove duplicate commits";
cd "..";
run "rm", "-rf", $final_repo;
run "git", "clone", "file://$pre_repo", $final_repo;
cd $final_repo;
}
sub relocate_master {
# Remember when we saved post's master branch?
# Here's where we save it as the new master.
header "Relocating master";
run qw(git checkout post-master);
run qw(git branch -D master);
run qw(git checkout -b master);
run qw(git branch -D post-master);
}
sub cleanup {
header "Cleaning up";
run qw(git remote rm origin);
run qw(git remote add origin), $ARGV{final_git_url};
run qw(git gc);
run qw(git repack -a -d -f --depth 50 --window 50);
}
#---
process_command_line();
main();
#---
=head1 DESCRIPTION
svn2git_stitch.pl can be used to convert an SVN repository where the
conventional directory structure (trunk, branches, tags) was
introduced not from the beginning but from some point perhaps in the
middle, retaining history both before the split and afterward. It does
this by running the half of the SVN repo before the split and the half
after the split through svn2git separately and then grafting the two
halves together.
=head1 USAGE
First, you'll need my fork of svn2git, SvnToGit.pm, which you should
be able to find alongside this script on GitHub. Just place it in the
same folder where you downloaded this script (I may put it on CPAN
at some point).
The first time you run this script, it will automatically stop after
the two halves of the SVN repo are converted but right before grafting
them together. This is because you will need the Git commit
ids representing the end of the first half and the start of the second
half in order to connect the two. The two repos are saved to
C</tmp/git.pre> and C</tmp/git.post>, and you can inspect them from
the command line or using a tool such as GitX (Mac) or TortoiseGit
(Windows). Once you have the commit ids, put them in a graft file
(read more about them
L<here|http://git.wiki.kernel.org/index.php/GraftPoint>).
Then, tell the script to use it by re-running it with the --grafts-file
option.
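A graft file line consists of a commit id followed by the id of the commit that should become its parent, both as full 40-character SHA-1s. The sketch below builds such a line for the stitch point; C<graft_line> is a hypothetical helper, and the two ids are placeholders rather than real commits.

```perl
use strict;
use warnings;

# Hypothetical helper: build the single line of a graft file that makes
# the last commit of the pre-repo the parent of the first commit of the
# post-repo. Both ids must be full 40-character SHA-1s.
sub graft_line {
    my ($first_post_commit, $last_pre_commit) = @_;
    for my $id ($first_post_commit, $last_pre_commit) {
        die "Not a full SHA-1: $id\n" unless $id =~ /^[0-9a-f]{40}$/;
    }
    # Format: <commit> <parent>
    return "$first_post_commit $last_pre_commit\n";
}

# Placeholder ids; substitute the ones found in /tmp/git.post and /tmp/git.pre.
my $first_post = 'a' x 40;
my $last_pre   = 'b' x 40;
print graft_line($first_post, $last_pre);
```

The output can be saved to a file and passed to the script with --grafts-file.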
=head2 Options
=head3 --grafts-file FILE
The file path that points to the graft file used to stitch together
the two repos. It is copied to C<.git/info/grafts> in the pre-repo while
the history is rewritten, and removed afterward.
=head3 --pre-end REV
=head3 --post-start REV
By default, this script will convert everything before the split in
the SVN repo as well as everything after. You may find that you need
to adjust, however, where you want the left half to stop or where you
want the right half to start. These revision numbers will be passed
straight to C<git svn fetch>.
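The sketch below shows how the two options turn into revision ranges, mirroring the C<svn2git> sub in this script. C<revision_ranges> is a name made up for the example, and the revision numbers are arbitrary.

```perl
use strict;
use warnings;

# Illustrative restatement of how the script derives the ranges it hands
# to SvnToGit. revision_ranges is a hypothetical name; the script builds
# these strings inline.
sub revision_ranges {
    my (%opts) = @_;
    # An unset option yields undef, which means "no range restriction".
    my $pre  = $opts{pre_end}    ? "1:$opts{pre_end}"       : undef;
    my $post = $opts{post_start} ? "$opts{post_start}:HEAD" : undef;
    return ($pre, $post);
}

# Arbitrary revision numbers, for illustration only.
my ($pre, $post) = revision_ranges(pre_end => 1200, post_start => 1201);
print "pre-repo range:  $pre\n";
print "post-repo range: $post\n";
```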
=head3 --authors AUTHORS_FILE
=head3 --authors-file AUTHORS_FILE
The file that maps SVN committers to Git committers. This will be
passed straight to C<svn2git>.
=head3 --final-git-url GIT_URL
The URL to the Git repository on your server to which you will push
your newly converted repo. This will be added as a remote in the final
step of the conversion so that C<git push> works out of the gate.
=head3 --clear-cache
The first time this script is run, the Git versions of the two halves
of the SVN repo are cached so that you do not have to go through the
process of converting them if you run the script again (since it may
take a long time depending on the size of your original repo). This
option will allow you to regenerate these Git repos should you ever
need to do so.
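The caching decision can be summarized as: convert again if --clear-cache was given or if either cached repo is missing. The sketch below expresses that check; C<needs_reconversion> is a hypothetical helper, and a temporary directory stands in for the cached repos.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper mirroring the cache check in this script: convert
# again when asked to, or when any cached repo does not exist.
sub needs_reconversion {
    my ($clear_cache, @cached_repos) = @_;
    return 1 if $clear_cache;
    for my $repo (@cached_repos) {
        return 1 unless -e $repo;
    }
    return 0;
}

# A temporary directory stands in for /tmp/git.pre.cached here.
my $dir = tempdir(CLEANUP => 1);
my $present = needs_reconversion(0, $dir)           ? "convert again" : "reuse cache";
my $missing = needs_reconversion(0, "$dir/missing") ? "convert again" : "reuse cache";
print "existing cache: $present\n";
print "missing cache:  $missing\n";
```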
=head3 --verbose
=head3 -V
Prints commands as they are executed, as well as directories which are
changed.
=head3 --help
=head3 -h
You can probably guess what this does.
=head1 SEE ALSO
=over 4
=item *
L<My writeup about how I wrote this script|http://blog.lostincode.net/archives/2010/01/28/git-svn-stitching>
=item *
L<David Wheeler's stitch script upon which this was based|http://justatheory.com/computers/vcs/git/>
=item *
L<Article about grafts on the community Git wiki|http://git.wiki.kernel.org/index.php/GraftPoint>
=back
=head1 AUTHOR/LICENSE
(c) 2010 Elliot Winkler. Released under the MIT license.
=cut
#!/usr/bin/env perl
#
# A reimplementation of James Coglan's svn2git Ruby script as a Perl
# module and accompanying executable, forked from
# http://github.com/schwern/svn2git by Elliot Winkler.
#
# Changes are as follows:
#
# * Update fix_tags and fix_trunk so they're closer to Ruby script
# * Change default behavior so that the directory for the git repo will
# be auto-created for you. --no-clone assumes you're already in the
# git repo, as usual.
# * Allow user to customize new git repo location
# * Make it object-oriented so we can use it in another script
# * Add --root-is-trunk option
# * Rename --noclone to --no-clone
# * Split off command-line stuff to the command-line script
# * Add --revision option that will be passed to git svn fetch
# * Add default authors file
# * Add {"authors_file" => "..."} as an alias for {"authors" => "..."}
#
package SvnToGit;
# strict and warnings are lexically scoped, so they cover the rest of this
# file whether they appear above or below the package statement
use strict;
use warnings;
use File::Basename;
# Perl's file tests do not expand "~", so build the path from $HOME
our $DEFAULT_AUTHORS_FILE = ($ENV{HOME} || "") . "/.svn2git/authors";
#---
sub convert {
my($class, %args) = @_;
my $c = $class->new(%args);
$c->run;
return $c;
}
sub new {
my($class, %args) = @_;
$args{git_repo} ||= basename($args{svn_repo});
$args{revision} = $args{revisions} if $args{revisions};
$args{authors} = $args{authors_file} if $args{authors_file};
$args{clone} = 1 unless exists $args{clone};
if (-f $DEFAULT_AUTHORS_FILE && (!$args{authors} || ! -f $args{authors})) {
$args{authors} = $DEFAULT_AUTHORS_FILE;
}
my $self = \%args;
bless($self, $class);
return $self;
}
sub run {
my $self = shift;
$self->ensure_git_present();
if ($self->{clone}) {
$self->clone($self->{svn_repo}, $self->{git_repo});
} else {
print "Since you requested not to clone, I'm assuming that you're already in the git repo.\n";
}
$self->cache_branches();
$self->fix_tags();
$self->fix_branches();
$self->fix_trunk();
$self->optimize_repo();
chdir ".." if $self->{clone};
#print "\n----------\n";
#print "Conversion done!";
#print " Check out $self->{git_repo}." if $self->{clone};
#print "\n";
}
sub clone {
my $self = shift;
$self->ensure_git_svn_present();
if (-e $self->{git_repo}) {
die "Can't clone to '$self->{git_repo}', that directory is already present!\n";
}
mkdir $self->{git_repo} or die "Can't create '$self->{git_repo}': $!\n";
chdir $self->{git_repo} or die "Can't chdir to '$self->{git_repo}': $!\n";
print "Cloning SVN repo at $self->{svn_repo} into $self->{git_repo}...\n";
my @clone_opts;
if ($self->{root_is_trunk}) {
push @clone_opts, "--trunk=".$self->{svn_repo};
} else {
for my $opt (qw(trunk branches tags)) {
push @clone_opts, "--$opt=$self->{$opt}" if $self->{$opt};
}
push @clone_opts, "-s" unless @clone_opts;
}
$self->run_command(qw(git svn init --no-metadata), @clone_opts, $self->{svn_repo});
$self->run_command(qw(git config svn.authorsfile), $self->{authors}) if $self->{authors};
my @fetch_opts;
push @fetch_opts, "-r", $self->{revision} if $self->{revision};
$self->run_command(qw(git svn fetch), @fetch_opts);
}
sub cache_branches {
my $self = shift;
$self->{remote_branches} = [map { strip($_) } `git branch -r`];
}
sub fix_tags {
my $self = shift;
print "Turning svn tags cloned as branches into real git tags...\n";
my $tags_path = $self->{tags} || 'tags/';
$tags_path .= '/' unless $tags_path =~ m{/$};
my @tag_branches = grep m{^\Q$tags_path\E}, @{$self->{remote_branches}};
for my $tag_branch (@tag_branches) {
qx/git show-ref $tag_branch/;
if ($?) {
warn "'$tag_branch' is not a valid branch reference, so skipping..";
next;
}
my($tag) = $tag_branch =~ m{^\Q$tags_path\E(.*)};
warn "Couldn't find tag name from $tag_branch" unless length $tag;
if (my $strip = $self->{strip_tag_prefix}) {
# \Q...\E makes the prefix match literally rather than as a regex
$tag =~ s{^\Q$strip\E}{};
}
# -1 limits the log to the most recent commit on the tag branch
my $subject = strip(`git log -1 --pretty=format:'\%s' "$tag_branch"`);
my $date = strip(`git log -1 --pretty=format:'\%ci' "$tag_branch"`);
$self->run_command("git", "checkout", $tag_branch);
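# system() in list form cannot take a VAR=value prefix, so set the
# committer date through the environment for this iteration instead
local $ENV{GIT_COMMITTER_DATE} = $date;
$self->run_command(qw(git tag -a -m), $subject, $tag, $tag_branch);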
$self->run_command(qw(git branch -d -r), $tag_branch);
}
}
sub fix_branches {
my $self = shift;
print "Checking out remote branches as local branches...\n";
my $tags_path = $self->{tags} || 'tags/';
$tags_path .= '/' unless $tags_path =~ m{/$};
my @remote_branches = grep !m{^\Q$tags_path}, @{$self->{remote_branches}};
my $trunk = $self->{trunk} || "trunk";
for my $branch (@remote_branches) {
next if $branch eq $trunk;
$self->run_command("git", "checkout", $branch);
$self->run_command("git", "checkout", "-b", $branch);
}
}
sub fix_trunk {
my $self = shift;
my $trunk = $self->{trunk} || "trunk";
# anchor the end so that e.g. "trunk2" is not mistaken for trunk
return unless grep /^\s*\Q$trunk\E\s*$/, @{$self->{remote_branches}};
print "Making sure master is trunk...\n";
$self->run_command("git", "checkout", $trunk);
$self->run_command(qw(git branch -D master));
$self->run_command(qw(git checkout -f -b master));
$self->run_command(qw(git branch -d -r), $trunk);
}
sub optimize_repo {
my $self = shift;
$self->run_command(qw(git gc));
}
#---
sub ensure_git_present {
my $self = shift;
`git --version`;
die "git --version didn't work. Is git installed?\n" if $?;
}
sub ensure_git_svn_present {
my $self = shift;
`git help svn`;
die "git help svn didn't work. Is git-svn installed?\n" if $?;
}
sub run_command {
my $self = shift;
print "COMMAND: @_\n" if $self->{verbose};
system @_;
my $exit = $? >> 8;
die "@_ exited with $exit" if $exit;
return 1;
}
# don't need to get self here, since this is kind of a private method
sub strip {
local $_ = shift;
s/^\s+//; s/\s+$//;
return $_;
}
1;
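Both run_command above and run in svn2git_stitch.pl only look at the exit code ($? >> 8), so a command killed by a signal would be treated as a success. The following is a minimal sketch of a stricter check, assuming that behavior is wanted; run_checked is a hypothetical name and is not part of either script.

```perl
use strict;
use warnings;

# Hypothetical stricter variant of run_command (not the module's code):
# distinguishes "could not start", "killed by signal" and "non-zero exit".
sub run_checked {
    my @cmd = @_;
    system(@cmd);
    if ($? == -1) {
        die "Failed to start '@cmd': $!\n";
    } elsif ($? & 127) {
        die sprintf("'%s' died with signal %d\n", "@cmd", $? & 127);
    } elsif ($? >> 8) {
        die sprintf("'%s' exited with %d\n", "@cmd", $? >> 8);
    }
    return 1;
}

# $^X is the path of the running perl, used here as a harmless command.
run_checked($^X, '-e', 'exit 0');
my $ok = eval { run_checked($^X, '-e', 'exit 3'); 1 };
print "second command failed: $@" unless $ok;
```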
@guychisholm

Fantastically useful scripts, but I think line 213 of svn2git_stitch.pl needs some clarifying. For the graft file you're instructed to use:
<start of pre-repo id> <end of post-repo id>
I think should be:
<start of post-repo id> <end of pre-repo id>
i.e. <commit sha1> <parent sha1>
start and end might also be a little ambiguous, but I can't think of a better way to put it myself.

edit: perhaps <initial post-repo commit id> <last pre-repo commit id> ?

@mcmire
Author

mcmire commented Feb 6, 2012

Makes sense. I've fixed this in https://github.com/mcmire/SvnToGit, where I moved these scripts to.
