cvsup server operation
Barry Bouwsma
freebsd-misuser at remove-NOSPAM-to-reply.NOSPAM.dyndns.dk
Sun Nov 9 02:35:13 PST 2003
[Oooh, this is an old one, but I had to chime in. Don't reply, or at least
drop the hostname part of the valid IPv6-only address above to obtain
some theoretically IPv4-capable e-mail which will probably bounce anyway]
Boy howdy! Youse'all said:
> But I check the log files periodically. Any time I notice somebody
> abusing a mirror (e.g., with cronjob updates more frequently than once
> an hour) I simply blacklist them in the cvsupd.access file. I feel no
> remorse at all about denying access to greedy jerks. Likewise, when
Ooooh, yeah! My sort of sysadminning! I have a new hero.
> I catch people doing simultaneous updates from multiple machines at
> their site, I add a rule to cvsupd.access that limits them to 1 update
> at a time from their subnet. I always have a great big smile on my
> face when I do that. No guilt whatsoever. :-)
Ah, mmmm. A sadist after my own heart. Wonderful.
Now, not to rain on your parade or anything, but let me turn the
tables and allow me to play the victim. Just help me into those
handcuffs (a little tighter please, oh yesss), let's negotiate a
safeword (or not), and put that ball gag into my mouth and I'll now
attempfmft tmm mmf gmfmmm rrmmmmfmffgg...
As you (of all people) are well aware, there are several different
behavioural patterns that a CVSup session may follow, with different
load patterns (CPU, bandwidth). Not only do I play a masochist on TV
but I also suffer a dial-in connection a hundred times slower than
your average broadband abuser, a CPU 50 times slower than your typical
gamer, and disk bandwidth that, well, matches my online bandwidth, so
I really and truly am a masochist (oh portblock me harder, oooh).
At one extreme is the checkout mode of CVSup, where one is initially
populating a repository, or appending to the mail archives, and such.
This saturates the download bandwidth, but generally leaves the CPU
idle along with the upstream path.
At the other extreme is updating an up-to-date repository, with few
changes, where the upstream bandwidth generally is pegged with the
list file contents, with a comparable amount of data being returned,
but little for the CPU to do. The clever user will keep the disk
idle with the `-s' option.
A third extreme is tagging of a repository, where practically every
file is touched. The upstream bandwidth from the list file roughly
matches the downstream bandwidth with the tags needed, but here these
tags need to be merged into pretty much every file, checksums generated,
and compared for errors. This is heavily CPU-intensive (at least on my
double-digit MHz machine where I mirror the repositories), and keeps the
disk doing something, but if run alone, one sees vast quantities of
idle bandwidth in both directions. At least on my machine. Probably
not for a Normal human, though.
But your normal CVSup session will be a mixture of dumping the list
file upstream, getting some new files in checkout, adding deltas and
calculating checksums, and generally keeping busy (more busy the longer
one goes without updating), yet still with a scattering of all the
above described idle scenarios.
So in order to maximize the amount of data in both directions, and thus
minimize online time, it helps greatly to combine a download-bandwidth-
heavy cvsup session (say, updating the mail archives, or checking out
DragonFly like I've been doing for a few weeks off and on), with an
upload/CPU-intensive update (pretty much anything else that I've done
somewhat recently, or when a new tag appears).
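A back-of-the-envelope sketch of that reasoning, in Python. It is only an idealized model, not anything CVSup itself does: each session is assumed to be limited by its busiest resource when run alone, and concurrent sessions are assumed to share each resource perfectly. The Session class, its field names, and all the hour figures are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-resource demand of one cvsup run, in hours."""
    name: str
    down_hours: float  # time the download path would be busy
    up_hours: float    # time the upload path would be busy
    cpu_hours: float   # time the CPU would be busy

    def alone(self) -> float:
        # Run by itself, the session takes as long as its busiest resource.
        return max(self.down_hours, self.up_hours, self.cpu_hours)


def sequential_hours(sessions):
    """Total online time when the sessions are run one after another."""
    return sum(s.alone() for s in sessions)


def concurrent_hours(sessions):
    """Idealized lower bound when the sessions run side by side:
    the most heavily loaded resource decides how long it all takes."""
    return max(
        sum(s.down_hours for s in sessions),
        sum(s.up_hours for s in sessions),
        sum(s.cpu_hours for s in sessions),
    )


# Invented numbers: a CPU-bound tagged update plus a download-bound checkout.
sessions = [
    Session("tagged update", down_hours=1.0, up_hours=1.0, cpu_hours=5.0),
    Session("fresh checkout", down_hours=3.0, up_hours=0.1, cpu_hours=0.5),
]

print("one after another:", sequential_hours(sessions), "hours")
print("side by side:     ", concurrent_hours(sessions), "hours")
```

With these invented figures the pair overlaps well because each session leans on a different resource. Two download-bound checkouts would gain nothing from running together in this model.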
That means I almost always have two if not three cvsup processes
running at the same time -- a virginal checkout, plus something upstream-
bandwidth intensive -- usually a freshly-tagged update, or www rsync,
or gnats update, in order to keep the pipes in both directions as full
as possible and my CPU in a sweat. I can spend five hours getting tags
added to a repository, or six hours to get the same tags and complete
updates of a handful of other repositories that alone would require a
few hours, so I do that. For reasons of topology, I'd prefer to do
all the updates from the nearest site, but due to access restrictions,
I invariably need to split the updates among three sites. (I see your
sneer. Hrmph) Also, since I try to mirror all the repositories I
can get my manacled hands on, I must naturally use different sites,
so the per-site one-connection limit seldom affects me except when I
forget I can't update mail-archive and www for FreeBSD from the same
site at the same time.
Watching `netstat -w 1' shows me how much more effective this is too,
unless all of the sessions end up checking out heaps of new files.
But with my infrequent online activity, I'm usually downloading fresh
tags somewhere (4.9-RELEASE heading my way Real Soon Now[tm], honest)
as well as a brand new repository.
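For anyone who wants a number rather than eyeballing the columns, here is a rough Python sketch of the same check. It assumes you have already copied the per-second input and output byte counts out of the netstat display by hand. The function name, the sample figures, and the link capacities are all hypothetical.

```python
def utilization(samples, down_capacity, up_capacity):
    """Average fraction of each direction in use.

    samples       -- list of (bytes_in, bytes_out) pairs, one per second
    down_capacity -- assumed download capacity in bytes per second
    up_capacity   -- assumed upload capacity in bytes per second
    """
    if not samples:
        return (0.0, 0.0)
    count = len(samples)
    avg_in = sum(bytes_in for bytes_in, _ in samples) / count
    avg_out = sum(bytes_out for _, bytes_out in samples) / count
    return (avg_in / down_capacity, avg_out / up_capacity)


# Invented one-second samples on an invented dial-in line.
samples = [(3000, 300), (1000, 100)]
down_used, up_used = utilization(samples, down_capacity=4000, up_capacity=400)
print(f"download {down_used:.0%} busy, upload {up_used:.0%} busy")
```

If one direction sits near zero while the other is pegged, that is the idle pipe a second, complementary session could fill.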
This isn't quite the case where I'm using one of them thar newfangled
download-accelerators to open 50 FTP connections for one file to get
so much download bandwidth it spills all over your screen; instead I'm
trying to make more effective use of both directions, pipelining, and so on.
Your logs will also give a clue as to the relative activity: the ratio
of data in to data out hints at the type of update. The time required
is less help, since it really doesn't distinguish between tagging
over a 2400 baud dial-in and tagging a dog-slow machine over broadband
with idle bandwidth up the wazoo.
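A minimal Python sketch of that log heuristic, seen from the server's side. It assumes you have already extracted the byte counts for a session from your logs; no particular log format is implied. The function name, the labels, and the threshold of 4 are invented for illustration and would need tuning against real sessions.

```python
def guess_session_type(bytes_in, bytes_out, lopsided=4.0):
    """Guess the kind of session from the server's traffic totals.

    bytes_in  -- bytes received from the client (mostly list file contents)
    bytes_out -- bytes sent to the client (files, deltas, tags)
    lopsided  -- hypothetical out/in ratio above which we call it a checkout
    """
    if bytes_in <= 0 or bytes_out <= 0:
        return "unknown"
    ratio = bytes_out / bytes_in
    if ratio >= lopsided:
        # Little came up, lots went down: populating a repository.
        return "checkout-like"
    # Traffic roughly comparable in both directions: an update of an
    # already-populated repository, or a fresh tag. The byte ratio alone
    # cannot tell these apart, and elapsed time will not say whether the
    # client's line or the client's CPU was the slow part.
    return "update-or-tagging"


print(guess_session_type(1_000, 1_000_000))
print(guess_session_type(500_000, 600_000))
```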
Of course, no MODERN machine is going to see idle bandwidth when hit
with new tags on every file, so I'm not going to claim that anyone
else will see any benefit from multiple sessions, but now you know
quite well who it is you're tightening the thumbscrews upon with each
tweak to your access files. Happy yet?
Thank you sir, may I have another.
Barry Bouwsma
whip me, beat me, make me buildworld without -DNOCLEAN
(of course, I'm not leeching off any servers you admin, but I just
wanted to point out there could be reasons other than greed to be
running several cvsup sessions at once, at least that I attempt to
justify...)
More information about the freebsd-hubs
mailing list