Subversion/CVS experiment summary
Craig Boston
craig at tobuj.gank.org
Mon Feb 9 09:30:10 PST 2004
This is a bit of a long email, so please skip unless you're into source code
revision management :)
This is an informal report on the viability of using Subversion to manage the
FreeBSD source code repository. Some of this is generic and will be familiar
to anyone who has looked at SVN before; some is more FreeBSD-specific.
NOTE: I'm not trying to push one SCM over the other or suggest that CVS is
wholly inadequate. This is merely the result of an evaluation for my personal
use, and I thought I'd post it in case anyone was interested. CVS has been
used by the FreeBSD project for a LONG time for good reasons. Despite its
shortcomings, I suspect that it will be in use for quite a while longer.
-----------------------------------------------------------------------------
Section the 1st - Motive
-----------------------------------------------------------------------------
My main motivation for these tests was to bring my local modifications to
FreeBSD into some semblance of order. It seems I've amassed a bit of a
collection of local patches, 3rd party patches, and side projects -- some of
which are mutually exclusive or apply to different branches. Simply keeping
a working copy with my changes in it works fine for one project but becomes
painful when there are several. I'd also like to be able to keep version
history for my modifications.
I've heard good things about Perforce, and its effortless merge functionality
looks really slick. If I'm ever involved with a major commercial coding
project, I'll definitely give it some consideration. For my "free-time"
projects however it's not really an option. A couple of my local mods are in
a bit of a grey area as far as the 'non-commercial' license goes, so I'd
rather avoid that whole issue.
-----------------------------------------------------------------------------
Section the 2nd - Setup and conversion
-----------------------------------------------------------------------------
Most of my tests were performed on the src/sys portion of the repository. It
seemed to be large enough that I could get a general idea of how well
Subversion scales, but small enough that I wouldn't spend all week waiting
for the import to complete. All tests were done on a Pentium 4 2.8 GHz
system with 512MB RAM. I used a local repository on one disk and the working
directories on another (for both CVS and SVN). These tests have been done
over the course of the last week and a half, using subversion-0.35.1_1.
I've heard of attempts to convert the repo for testing with the cvs2svn.py
script failing (for more details, see the thread at
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=640133+0+archive/2004/freebsd-hackers/20040111.freebsd-hackers).
These problems seem to be fixed in the most recent version of the script -- I
have been able to successfully import sys, bin, sbin, and lib so far. The
next target for testing is contrib as it seems to be the most likely
candidate for problems with all those vendor branches.
Comments on importing: It's SLOOOOOOOOOOW. It took 43.9 hours just for
src/sys, and this is a relatively speedy system! It starts out at a pretty
good pace, but the more commits it processes, the longer each one seems to
take.
For my purposes I would also need some method of incrementally updating the
repository with any new commits made to CVS. This doesn't exist yet, but I'm
thinking about trying to hack cvs2svn to do this. Kind of an inverse vendor
branch I guess.
-----------------------------------------------------------------------------
Section the 3rd - Head to Head
-----------------------------------------------------------------------------
Yeah, I know comparing Subversion and CVS isn't a fair test -- SVN is
designed to be much more than CVS. But it's a comparison that will
inevitably be made, so I might as well get it out of the way.
Bad points (for SVN):
* Repo size: The src/sys part of the tree alone is 1.2GB. The same portion
of the repo in CVS is only 313MB. I had to keep a script running to
routinely purge unused database logs to avoid running out of disk space
during the import.
* Working set size: SVN keeps a complete copy of every file that is checked
out in a hidden directory analogous to "CVS" directories. This does have
some advantages outlined below, but effectively doubles the size of your
working directory.
* Speed: 0.35 is considerably slower than CVS for some operations. svn
checkout is on average about 6 times slower than cvs checkout.
Interestingly, CVS seems to benefit from the buffer cache much more than
SVN does -- nearly a 50% decrease in execution time for CVS once the cache
was populated. Please note however that checking the same thing out over
and over isn't a very useful thing to do, and SVN fares better with the
more common operations.
* Not as thoroughly tested with large repositories. One advantage CVS has
is that it is old, widely used, and has been used successfully (more or
less) by large installations. SVN simply hasn't had anywhere close to the
number of lines of code pushed through it that CVS has. This means it's
more likely that SVN has undiscovered bugs, edge cases, etc.
* "Requires" Apache for the network server. There is a simpler CVS-like
network protocol, but it suffers from the same problems with access
control and locking and the like that CVS does. In order to overcome
those limitations, you pretty much have to use Apache/WebDAV. Some may
argue that this isn't really a negative, but it certainly doesn't go with
the K.I.S.S. philosophy.
* No cvsup equivalent yet. You can fairly easily use WebDAV to pull a copy
of the trunk or a particular branch, but it's not nearly as efficient as
the rsync algorithm. There's also no way to use WebDAV to grab a certain
date or revision like you can with cvsup -- you have to have the svn
client installed. In order to be even a contender to replace CVS, it
still needs a *FAST* and *SIMPLE* way to synchronize source with an
arbitrary tag or revision.
* Still no solution for the repeated merge problem. This is supposed to be
addressed post-1.0; no official timeframe on it AFAIK.
* I don't think they have added arbitrary keyword support yet. We would
probably need a local hack to support $FreeBSD$.
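For anyone wanting to repeat the import, the log-purging script mentioned under "Repo size" can be sketched roughly as follows. This is a minimal illustration, not the script actually used: it assumes the installed svnadmin provides the list-unused-dblogs subcommand and that the repository uses the Berkeley DB backend. The function names are made up for the example.

```python
import subprocess
from pathlib import Path


def list_unused_dblogs(repo_path):
    """Ask svnadmin which Berkeley DB log files are no longer in use.

    Assumes `svnadmin list-unused-dblogs` exists in the installed version.
    """
    result = subprocess.run(
        ["svnadmin", "list-unused-dblogs", str(repo_path)],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def purge_logs(log_paths):
    """Delete the given log files and return the number of bytes freed."""
    freed = 0
    for name in log_paths:
        path = Path(name)
        if path.is_file():  # skip anything that has already disappeared
            freed += path.stat().st_size
            path.unlink()
    return freed
```

Calling purge_logs(list_unused_dblogs(repo)) every few minutes while the import runs (from a loop or cron) should keep the log directory from growing without bound; repo here stands for whatever path the repository lives at.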
Good points:
* Atomic commits across multiple files
* Near-O(1) branching/tagging, and no branch-point-tag mess
* The cvs2svn script is fairly smart and tries to group commits together
that should be part of a single commit. I believe it looks at timestamps
and commit messages to figure this out.
* Move and copy commands that DTRT -- no need for repo copies.
* As a result of not needing repo copies, it preserves the history of the
trunk. Currently we have no easy way to see what, for example,
2.2-CURRENT looked like on a particular day. Somehow I doubt that
sys/amd64/amd64/tsc.c really existed in 1996. SVN wouldn't magically fix
existing problems without outside help, but it would be able to keep it
from getting any worse.
* Subversion is supposed to have a more efficient network layer than CVS. I
haven't had a chance to do any real empirical testing on this yet.
* svn update is much faster than cvs update. With no changes to the
repository, it completes in 1-2 seconds flat. With only a few changes, it
takes a few seconds longer but it's still quite a bit quicker than CVS. CVS
seems to have a much flatter graph with relation to the number of changes
being updated -- it takes a while even if nothing changed.
* Subversion is better at disconnected operation. Because it keeps a copy
of the last checked out revision, you can see what files have changed
locally, revert changes on a particular file, create directories,
move/rename files, and even generate diffs without having a connection to
a remote repository. All of these commands are also much quicker than
their CVS equivalents because they are working on a local copy.
* Native binary support. SVN treats all files as binary unless you specify
otherwise and can efficiently store differences between binary files (CVS
has to store the complete file in every revision). This might make things
like the compat libs a little easier to manage.
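To illustrate the commit-grouping point above: CVS records each file revision separately, so a converter has to guess which revisions belonged together. The sketch below is only my reading of the general idea (same author, same log message, close timestamps), not cvs2svn's actual code. The FileRev type, the group_commits function, and the 300-second window are all assumptions made for the example.

```python
from collections import namedtuple

# One per-file CVS revision. Illustrative only, not a cvs2svn data type.
FileRev = namedtuple("FileRev", "path author message timestamp")

WINDOW = 300  # seconds between revisions of one commit; an assumed threshold


def group_commits(revs, window=WINDOW):
    """Group per-file revisions into probable multi-file commits."""
    groups = []
    latest = {}  # (author, message) -> most recent group for that pair
    for rev in sorted(revs, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        group = latest.get(key)
        if (group is not None
                and rev.timestamp - group[-1].timestamp <= window
                # a file can only appear once in a single commit
                and all(r.path != rev.path for r in group)):
            group.append(rev)
        else:
            group = [rev]
            groups.append(group)
            latest[key] = group
    return groups
```

Each returned group would then become one Subversion revision touching all of the files in it, which is what makes the converted history show atomic commits even though CVS never stored them that way.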
-----------------------------------------------------------------------------
Section the 4th - Conclusions
-----------------------------------------------------------------------------
Honestly, I don't think Subversion is quite ready yet. However, it is getting
_very_ close to being a viable alternative to CVS, for the needs of the
FreeBSD project as far as I know them. I'll definitely be trying it out for
some of my local projects that are currently stored in CVS.
FWIW, my intention is not to start a bikeshed discussion (but if we're doing
that, my vote is on plaid!). For the most part, CVS does a reasonably good job
of keeping the FreeBSD source code in line. However, it does have some
weaknesses that make it unsuitable for heavy development -- witness the
multitude of projects happening in local Perforce trees. Subversion was
brought up before, recently even, but there were still several major
showstoppers. A couple of those have been resolved in the last month.
Random notes:
I know there are other SCMs out there, and will probably take a look at them
when I get a chance. I picked Subversion for this test because it's supposed
to be the successor of CVS, so it's a logical place to start.
It also looks as if Subversion 0.37 (aka 1.0-RC) has just been released. I'll
have to take a look at it and see if any of the problems noted above have
been resolved.
Any comments / corrections / arguments are welcome :)
Craig
--
"A 'No Parking' sign at a certain location means..."
- multiple choice question on NY State learner's permit test