14/09/2011

Git-svn and the very very large svn repo

At my current workplace, a single svn repository is shared by all of the company's projects. That means a lot of projects and a lot of commits, not to mention branches and tags. My own project has also been going on for a few years, and the developers have tried a wild variety of things on the branches and tags side.

As we have a very complex merge coming up, I decided to give git-svn a quick try to see if it could help us with that. The initial import on Windows never finished: it failed while reading a particularly large commit, losing the data connection in the middle of it. After we were allowed to set up a Linux workstation (faster builds that don't fail by hitting the maximum path length on some files), I gave it a second try.

I got myself a working git repo, made mostly unusable by the dozens of branches and the hundreds of tags in it. That's when I started looking into selectively fetching only some of the branches and tags. The thing is, there are dozens of articles on the basic use of git-svn but far fewer on more advanced configurations.

It's quite simple, and it is actually documented on the main git-svn man page. The first step is to initialize your repo with git svn init rather than git svn clone, so that nothing is fetched yet:
git svn init --username my_user -s http://my-svn-url/
Our corporate svn requires authentication, hence the username. I used the -s flag to create the default mappings for trunk, branches and tags, which we will now edit by opening the newly created config file in the .git subdirectory. The file should look like this:
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[svn-remote "svn"]
    url = http://my-svn-url
    fetch = projets/myproject/trunk:refs/remotes/trunk
    branches = projets/myproject/branches/*:refs/remotes/*
    tags = projets/myproject/tags/*:refs/remotes/tags/*
The first thing I did was remove the tags line: I didn't want the tags to be fetched, as they are in such disarray at the moment as to be useless. The second thing was to turn the "fetch all branches" directive into a "fetch only these branches" one.

This is accomplished by using range expressions instead of the glob * on the left side of the branches directive. A range is a comma-separated list of names within braces, written {branch1,branch2,...}, and as far as I can tell you have to spell out every branch name. On the right side of the branches directive you must keep the * at the end; it is mandatory. A correct branches directive thus looks like:
branches = projets/myproject/branches/{branch1,branch3}:refs/remotes/*
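
If you would rather not hand-edit .git/config, the same two changes can probably be made with git config. What follows is only a minimal sketch under a few assumptions: a throwaway repository (created with mktemp and a plain git init) stands in for the one produced by git svn init, and the default mappings shown above are recreated by hand so the sketch runs without any svn access. branch1 and branch3 are the same placeholder names as above.

```shell
# Throwaway repo standing in for the one created by `git svn init` (assumption).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Recreate the default -s mappings from the config file shown above.
git config svn-remote.svn.url 'http://my-svn-url'
git config svn-remote.svn.fetch 'projets/myproject/trunk:refs/remotes/trunk'
git config svn-remote.svn.branches 'projets/myproject/branches/*:refs/remotes/*'
git config svn-remote.svn.tags 'projets/myproject/tags/*:refs/remotes/tags/*'

# 1. Remove the tags mapping so that no tags are fetched.
git config --unset svn-remote.svn.tags

# 2. Replace the "all branches" glob with an explicit list of branch names.
#    Single quotes keep the shell from expanding the braces and the *.
git config svn-remote.svn.branches 'projets/myproject/branches/{branch1,branch3}:refs/remotes/*'

# Print the resulting svn-remote section to check it before fetching.
git config --get-regexp '^svn-remote\.svn\.'
```

In a real repository only steps 1 and 2 would be needed, run from inside the directory where git svn init was executed.
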
Once you have selected the branches you want to fetch and configured your git repo, all that is left is to run
git svn fetch
and wait :)

1 comment:

Igosuki said…

Very nice :) It's true that you have to read all the way to the bottom of the man page to find this info ;)