package distribution crisis - CDN needed
João Carlos Mendes Luís
jonny at jonny.eng.br
Tue Apr 8 00:14:11 UTC 2008
Pav Lucistnik wrote:
> Okay, the situation recently was that the mirrors had no chance of keeping
> up with all the package sets I've been uploading to ftp-master.
>
> We clearly need to move beyond rsync/cvsup synced ftp mirrors. This does
> not scale.
>
> I propose the creation of a CDN (Content Delivery Network) with
> these features:
>
> - no mirroring of a complete package set! (Also no directory listings.)
> When a client requests a file and the file is not in the local cache,
> the file is downloaded from the upstream server and, while it is being
> obtained, it is already being sent to the client. This is basically
> squid.
>
> - if the file is present in the local cache, it's returned from local
> cache.
>
> - the local cache is invalidated when a new package set is available on
> an upstream server. Invalidation mechanism:
> option a) a cronjob that polls the upstream server every 5 minutes for a
> file that gives the current package set IDs (pull)
> option b) the master server sends a notification to all mirrors to
> invalidate a package set (push)
> optimization: when a package set is invalidated, don't delete the old
> files; instead, on the next hit, verify their timestamps against the
> upstream server
>
> - atomic package set uploads to master from pointyhat (probably having
> two directories that are switched over on master)
>
> - everything runs over http
>
> - the default source of files for the "pkg_add -r" command
>
> The goal is to refresh a package set on a daily basis.
>
>
> I don't know if we can use some existing software for this (Squid?
> Apache mod_proxy?) or if we will need to put something new together.
> Ideas?
>
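A minimal sketch of the cache behaviour described in the proposal: a cache miss fetches from upstream, a hit is served locally, and option a) polling marks entries stale instead of deleting them, so the next hit only checks the upstream timestamp. The `fetch` and `stat` callables, the package set ID strings, and the file paths are hypothetical stand-ins for whatever HTTP requests a real mirror would make. The sketch also buffers whole files rather than streaming them to the client during download, which squid or a similar proxy would handle.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical upstream interface (assumption, not an existing API):
#   fetch(path) -> (file bytes, upstream modification time)
#   stat(path)  -> upstream modification time only
Fetch = Callable[[str], Tuple[bytes, float]]
Stat = Callable[[str], float]


@dataclass
class Entry:
    data: bytes
    mtime: float
    stale: bool = False  # set when a new package set ID is seen


@dataclass
class PullThroughCache:
    fetch: Fetch
    stat: Stat
    set_id: str = ""
    entries: Dict[str, Entry] = field(default_factory=dict)

    def get(self, path: str) -> bytes:
        entry = self.entries.get(path)
        if entry is not None and not entry.stale:
            return entry.data  # fresh local copy: serve from cache
        if entry is not None and self.stat(path) == entry.mtime:
            # Package set changed, but this file did not: keep the old bytes.
            entry.stale = False
            return entry.data
        # Cache miss, or the upstream file really changed: download it.
        data, mtime = self.fetch(path)
        self.entries[path] = Entry(data, mtime)
        return data

    def poll(self, current_set_id: str) -> None:
        """Run periodically (e.g. from cron) with the upstream package set ID."""
        if current_set_id != self.set_id:
            self.set_id = current_set_id
            for entry in self.entries.values():
                entry.stale = True  # revalidate on next hit instead of deleting
```

With this design, an unchanged package costs one small timestamp query after a package set switch instead of a full re-download, which matters when most packages do not change between daily builds.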
I am not sure this would solve anything, but if we go further in this
direction, I'd like to see an architecture with prefetch capability.
Note also that a real CDN would hide the real data location from the
end user; the location would be selected by some sort of proximity
and/or load information. Some CDNs do use a proxy cache in front of a
central server as a means of populating their own data, but proxy
caching is only a small part of the solution.
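A minimal sketch of selecting a mirror by proximity and load, as a CDN front end might do before redirecting the client. The mirror hostnames, the `rtt_ms` and `load` fields, the 0.9 load cutoff, and the `rtt * (1 + load)` weighting are all assumptions chosen for illustration. A real deployment might measure proximity differently (e.g. by client network or region) and would typically hand the chosen mirror to the client via DNS or an HTTP redirect, so the client never needs to know which mirror actually served it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mirror:
    host: str      # hypothetical mirror hostname
    rtt_ms: float  # measured round-trip time to the client (proximity)
    load: float    # reported utilisation: 0.0 (idle) .. 1.0 (saturated)


def pick_mirror(mirrors: List[Mirror], max_load: float = 0.9) -> Optional[Mirror]:
    """Return the mirror with the lowest load-weighted latency, or None."""
    # Skip mirrors that report themselves as nearly saturated.
    candidates = [m for m in mirrors if m.load < max_load]
    if not candidates:
        return None
    # Assumed weighting: a busier mirror "looks" farther away.
    return min(candidates, key=lambda m: m.rtt_ms * (1.0 + m.load))
```

For example, a mirror at 40 ms with 10% load (score 44) is preferred over one at 30 ms with 50% load (score 45), and a mirror at 95% load is not considered at all.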
I did not follow whatever happened recently, but in the past I had some
trouble with late announcements to mirror administrators. Sometimes I
received the announcement just like any other FreeBSD user, and even in
those cases the packages had been distributed well before the final
release.
Jonny
--
João Carlos Mendes Luís - Networking Engineer - jonny at jonny.eng.br