Subversion/CVS experiment summary

Mon Feb 9 09:30:10 PST 2004

This is a bit of a long email, so please skip unless you're into source code 
revision management :)

This is an informal report on the viability of using Subversion to manage the 
FreeBSD source code repository.  Some of this is generic and will be familiar 
to anyone who has looked at SVN before, some is more FreeBSD-specific.

NOTE: I'm not trying to push one SCM over the other or suggest that CVS is 
wholly inadequate.  This is merely the result of an evaluation for my personal 
use, and I thought I'd post it in case anyone was interested.  CVS has been 
used by the FreeBSD project for a LONG time for good reasons.  Despite its 
shortcomings, I suspect that it will be in use for quite a while longer.

-----------------------------------------------------------------------------
Section the 1st - Motive
-----------------------------------------------------------------------------
My main motivation for these tests was to bring my local modifications to 
FreeBSD into some semblance of order.  It seems I've amassed a bit of a 
collection of local patches, 3rd party patches, and side projects -- some of 
which are mutually exclusive or apply to different branches.  Simply keeping 
a working copy with my changes in it works fine for one project but becomes 
painful when there are several.  I'd also like to be able to keep version 
history for my modifications.

I've heard good things about Perforce, and its effortless merge functionality 
looks really slick.  If I'm ever involved with a major commercial coding 
project, I'll definitely give it some consideration.  For my "free-time" 
projects however it's not really an option.  A couple of my local mods are in 
a bit of a grey area as far as the 'non-commercial' license goes, so I'd 
rather avoid that whole issue.

-----------------------------------------------------------------------------
Section the 2nd - Setup and conversion
-----------------------------------------------------------------------------
Most of my tests were performed on the src/sys portion of the repository.  It 
seemed to be large enough that I could get a general idea of how well 
Subversion scales, but small enough that I wouldn't spend all week waiting 
for the import to complete.  All tests were done on a Pentium 4 2.8 GHz 
system with 512MB RAM.  I used a local repository on one disk and the working 
directories on another (for both CVS and SVN).  These tests have been done 
over the course of the last week and a half, using subversion-0.35.1_1.

I've heard of failed attempts to convert the repo for testing with the 
cvs2svn.py script (for more details, see the thread at 
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=640133+0+archive/2004/freebsd-hackers/20040111.freebsd-hackers).
These problems seem to be fixed in the most recent version of the script -- I 
have been able to successfully import sys, bin, sbin, and lib so far.  The 
next target for testing is contrib as it seems to be the most likely 
candidate for problems with all those vendor branches.

Comments on importing: It's SLOOOOOOOOOOW.  It took 43.9 hours just for 
src/sys, and this is a relatively speedy system!  It starts out at a pretty 
good pace, but the more commits it processes, the longer each one seems to 
take.

For my purposes I would also need some method of incrementally updating the 
repository with any new commits made to CVS.  This doesn't exist yet, but I'm 
thinking about trying to hack cvs2svn to do this.  Kind of an inverse vendor 
branch I guess.

-----------------------------------------------------------------------------
Section the 3rd - Head to Head
-----------------------------------------------------------------------------
Yeah, I know comparing Subversion and CVS isn't a fair test -- SVN is
designed to be much more than CVS.  But it's a comparison that will 
inevitably be made, so I might as well get it out of the way.

Bad points (for SVN):
  * Repo size: The src/sys part of the tree alone is 1.2GB.  The same portion
    of the repo in CVS is only 313MB.  I had to keep a script running to
    routinely purge unused database logs to avoid running out of disk space
    during the import.
  * Working set size: SVN keeps a complete copy of every file that is checked
    out in a hidden directory analogous to "CVS" directories.  This does have
    some advantages outlined below, but effectively doubles the size of your
    working directory.
  * Speed: SVN 0.35 is considerably slower than CVS for some operations.  svn
    checkout is on average about 6 times slower than cvs checkout.
    Interestingly, CVS seems to benefit from the buffer cache much more than
    SVN does -- nearly a 50% decrease in execution time for CVS once the cache
    was populated.  Please note however that checking the same thing out over
    and over isn't a very useful thing to do, and SVN fares better with the
    more common operations.
  * Not as thoroughly tested with large repositories.  One advantage CVS has
    is that it is old, widely used, and has been used successfully (more or 
    less) by large installations.  SVN simply hasn't had anywhere close to the
    number of lines of code pushed through it that CVS has.  This means it's
    more likely that SVN has undiscovered bugs, edge cases, etc.
  * "Requires" Apache for the network server.  There is a simpler CVS-like
    network protocol, but it suffers from the same problems with access 
    control and locking and the like that CVS does.  In order to overcome
    those limitations, you pretty much have to use Apache/WebDAV.  Some may
    argue that this isn't really a negative, but it certainly doesn't go with
    the K.I.S.S. philosophy.
  * No cvsup equivalent yet.  You can fairly easily use WebDAV to pull a copy
    of the trunk or a particular branch, but it's not nearly as efficient as
    the rsync algorithm.  There's also no way to use WebDAV to grab a certain
    date or revision like you can with cvsup -- you have to have the svn
    client installed.  In order to be even a contender to replace CVS, it
    still needs a *FAST* and *SIMPLE* way to synchronize source with an
    arbitrary tag or revision.
  * Still no solution for the repeated merge problem.  This is supposed to be
    addressed post-1.0; no official timeframe on it AFAIK.
  * I don't think they have added arbitrary keyword support yet.  We would
    probably need a local hack to support $FreeBSD$.
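
The log-purging script mentioned in the repo size item could look something
like the following.  This is only a minimal sketch in Python, not the script
that was actually used for the import.  It assumes the installed svnadmin
provides the list-unused-dblogs subcommand and prints one log file path per
line; the function names are made up for illustration.

```python
import os
import subprocess


def unused_logs(repo_path, run=subprocess.run):
    """Return the database log files that svnadmin reports as unused.

    Assumption: `svnadmin list-unused-dblogs REPO` prints one path per line.
    """
    result = run(
        ["svnadmin", "list-unused-dblogs", repo_path],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def purge_unused_logs(repo_path, run=subprocess.run, remove=os.remove):
    """Delete every unused log file and return the number of bytes freed."""
    freed = 0
    for path in unused_logs(repo_path, run=run):
        try:
            size = os.path.getsize(path)
            remove(path)
        except OSError:
            # The file disappeared or could not be removed; skip it.
            continue
        freed += size
    return freed
```

Calling purge_unused_logs() on the repository path every few minutes, from a
loop or a cron job, should keep the logs from piling up during a long import.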

Good points:
  * Atomic commits across multiple files
  * Near-O(1) branching/tagging, and no branch-point-tag mess
  * The cvs2svn script is fairly smart and tries to group commits together
    that should be part of a single commit.  I believe it looks at timestamps
    and commit messages to figure this out.
  * Move and copy commands that DTRT -- no need for repo copies.
  * As a result of not needing repo copies, it preserves the history of the 
    trunk.  Currently we have no easy way to see what, for example,
    2.2-CURRENT looked like on a particular day.  Somehow I doubt that
    sys/amd64/amd64/tsc.c really existed in 1996.  SVN wouldn't magically fix
    existing problems without outside help, but it would be able to keep it
    from getting any worse.
  * Subversion is supposed to have a more efficient network layer than CVS.  I
    haven't had a chance to do any real empirical testing on this yet.
  * svn update is much faster than cvs update.  With no changes to the
    repository, it completes in 1-2 seconds flat.  With only a few changes, it
    takes a few seconds longer but is still quite a bit quicker than CVS.  CVS
    seems to have a much flatter graph with relation to the number of changes
    being updated -- it takes a while even if nothing changed.
  * Subversion is better at disconnected operation.  Because it keeps a copy
    of the last checked out revision, you can see what files have changed
    locally, revert changes on a particular file, create directories,
    move/rename files, and even generate diffs without having a connection to
    a remote repository.  All of these commands are also much quicker than
    their CVS equivalents because they are working on a local copy.
  * Native binary support.  SVN treats all files as binary unless you specify
    otherwise and can efficiently store differences between binary files (CVS
    has to store the complete file in every revision).  This might make things
    like the compat libs a little easier to manage.
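
To make the commit-grouping item above a bit more concrete, here is a rough
sketch in Python of how per-file CVS revisions could be folded into
changesets by timestamp and commit message.  This is NOT the actual cvs2svn
algorithm, just an illustration of the general idea.  The FileRevision
record, the extra check on author, and the 300-second window are all
assumptions of mine.

```python
from collections import namedtuple

# Hypothetical record for one per-file CVS revision.
FileRevision = namedtuple("FileRevision", "path author log timestamp")


def group_commits(revisions, window=300):
    """Group per-file revisions into changesets.

    A revision joins the most recent changeset with the same author and log
    message if it follows that changeset's last revision within `window`
    seconds and the changeset does not already touch the same file.
    """
    changesets = []
    latest = {}  # (author, log) -> most recent changeset for that key
    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.log)
        current = latest.get(key)
        if (current is not None
                and rev.timestamp - current[-1].timestamp <= window
                and all(r.path != rev.path for r in current)):
            current.append(rev)
        else:
            # Start a new changeset for this author/message pair.
            current = [rev]
            latest[key] = current
            changesets.append(current)
    return changesets
```

The same-file check is there because one atomic commit can't contain two
revisions of the same file, so a second revision of a file has to start a
new changeset even if the message and time match.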

-----------------------------------------------------------------------------
Section the 4th - Conclusions
-----------------------------------------------------------------------------
Honestly, I don't think Subversion is quite ready yet.  However, it is getting 
_very_ close to being a viable alternative to CVS, for the needs of the 
FreeBSD project as far as I know them.  I'll definitely be trying it out for 
some of my local projects that are currently stored in CVS.

FWIW, my intention is not to start a bikeshed discussion (but if we're doing 
that, my vote is on plaid!).  For the most part, CVS does a reasonably good job 
of keeping the FreeBSD source code in line.  However, it does have some 
weaknesses that make it unsuitable for heavy development -- witness the 
multitude of projects happening in local Perforce trees.  Subversion was 
brought up before, recently even, but there were still several major 
showstoppers.  A couple of those have been resolved in the last month.

Random notes:
I know there are other SCMs out there, and will probably take a look at them 
when I get a chance.  I picked Subversion for this test because it's supposed 
to be the successor of CVS, so it's a logical place to start.

It also looks as if Subversion 0.37 (aka 1.0-RC) has just been released.  I'll 
have to take a look at it and see if any of the problems noted above have 
been resolved.

Any comments / corrections / arguments are welcome :)

Craig

--
"A 'No Parking' sign at a certain location means..."
- multiple choice question on NY State learner's permit test