Multi-machine mirroring choices
Matthew Dillon
dillon at apollo.backplane.com
Tue Jul 15 17:52:44 UTC 2008
:Oliver Fromme wrote:
:
:> Yet another way would be to use DragonFly's "Hammer" file
:> system which is part of DragonFly BSD 2.0 which will be
:> released in a few days. It supports remote mirroring,
:> i.e. mirror source and mirror target can run on different
:> machines. Of course it is still very new and experimental
:> (however, ZFS is marked experimental, too), so you probably
:> don't want to use it on critical production machines.
:
:Let's not get carried away here :)
:
:Kris
Heh. I think it's safe to say that a *NATIVE* uninterrupted and fully
cache coherent fail-over feature is not something any of us in BSDland
have yet. It's a damn difficult problem that is frankly best solved
above the filesystem layer, but with filesystem support for bulk mirroring
operations.
HAMMER's native mirroring was the last major feature to go into
it before the upcoming release, so it will definitely be more
experimental than the rest of HAMMER. This is mainly because it
implements a full-blown queue-less incremental snapshot and mirroring
algorithm, single-master-to-multi-slave. It does it at a very low level,
by optimally scanning HAMMER's B-Tree. In other words, the kitchen
sink.
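Very roughly, the record-level idea looks like this (a minimal sketch,
not HAMMER's actual code; every name here is made up):

    #include <stdint.h>

    /*
     * Illustrative sketch only.  Every B-Tree record carries the
     * transaction id (TID) that created it and, if it has been deleted,
     * the TID that deleted it.  A mirroring pass from begin_tid to
     * end_tid emits every record touched inside that window, so the
     * master keeps no per-slave queue; each slave only has to remember
     * the end_tid of its last successful pass, which is what makes
     * single-master-to-multi-slave cheap.
     */
    struct mirror_rec {
        uint64_t create_tid;    /* TID that created the record */
        uint64_t delete_tid;    /* TID that deleted it, 0 if still live */
        /* ... key and data would follow ... */
    };

    static int
    rec_in_window(const struct mirror_rec *rec, uint64_t begin_tid,
                  uint64_t end_tid)
    {
        if (rec->create_tid > begin_tid && rec->create_tid <= end_tid)
            return (1);
        if (rec->delete_tid > begin_tid && rec->delete_tid <= end_tid)
            return (1);
        return (0);
    }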
The B-Tree propagates the highest transaction id up to the root to
support incremental mirroring and that's the bit that is highly
experimental and not well tested yet. It's fairly complex because
even destroyed B-Tree records and collapses must propagate a
transaction id up the tree (so the mirroring code knows what it needs
to send to the other end to do comparative deletions on the target).
(Transaction ids are bundled together in larger flushes, so the actual
B-Tree overhead is minimal.)
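In sketch form (again purely illustrative, not the real code;
"mirror_tid" is just my shorthand for the propagated field), the
propagation and the pruned scan look roughly like:

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative sketch only, not HAMMER source. */
    struct node {
        struct node *parent;
        struct node *child[16];     /* internal nodes only */
        int          nchildren;     /* 0 for a leaf */
        uint64_t     mirror_tid;    /* highest TID anywhere below */
    };

    /*
     * Any modification -- record creation, record destruction, or a
     * node collapse -- pushes its TID up to the root, so an untouched
     * subtree is provably older than any given synchronization point.
     */
    static void
    propagate_tid(struct node *node, uint64_t tid)
    {
        while (node != NULL && node->mirror_tid < tid) {
            node->mirror_tid = tid;
            node = node->parent;
        }
    }

    /*
     * The incremental scan can then skip any subtree whose mirror_tid
     * is at or below the slave's last synchronized TID.
     */
    static void
    mirror_scan(struct node *node, uint64_t begin_tid, uint64_t end_tid)
    {
        int i;

        if (node->mirror_tid <= begin_tid)
            return;             /* nothing new anywhere below this node */
        if (node->nchildren == 0) {
            /* leaf: emit records created/deleted in (begin_tid, end_tid] */
            return;
        }
        for (i = 0; i < node->nchildren; ++i)
            mirror_scan(node->child[i], begin_tid, end_tid);
    }

The comparative deletions fall out of the same scan: a record whose
delete_tid lands in the window tells the target to drop its copy.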
The rest of HAMMER is shaping up very well for the release. It's
phenomenal when it comes to storing backups. Post-release I'll be
moving more of our production systems to HAMMER. The only sticky
issue we have is filesystem-full handling, but it is more a matter
of fine-tuning than anything else.
--
Someone mentioned atime and mtime. For something like ZFS or HAMMER,
these fields represent a real problem (atime more than mtime). I'm
kinda interested in knowing, does ZFS do block replacement for
atime updates?
For HAMMER I don't roll new B-Tree records for atime or mtime updates.
I update the fields in-place in the current version of the inode and
all snapshot accesses will lock them (in getattr) to ctime in order to
guarantee a consistent result. That way (tar | md5) can be used to
validate snapshot integrity.
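As a sketch of what that clamp amounts to (not the actual vnops code;
the names are invented):

    #include <stdint.h>

    /* Illustrative sketch only. */
    struct inode_times {
        uint64_t atime;
        uint64_t mtime;
        uint64_t ctime;
    };

    /*
     * Live accesses see the in-place atime/mtime.  Accesses through a
     * snapshot (an as-of view) report ctime for both, so the in-place
     * updates on the live inode can never leak into a historical view
     * and (tar | md5) over the same snapshot is always stable.
     */
    static void
    snapshot_getattr(const struct inode_times *ip, int is_snapshot,
                     struct inode_times *out)
    {
        *out = *ip;
        if (is_snapshot) {
            out->atime = ip->ctime;
            out->mtime = ip->ctime;
        }
    }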
At the moment, in this first release, the mirroring code does not
propagate atime or mtime. I plan to do it, though. Even though
I don't roll new B-Tree records for atime/mtime updates I can still
propagate a new transaction id up the B-Tree to make the changes
visible to the mirroring code. I'll definitely be doing that for mtime
and will have the option to do it for atime as well. But atime still
represents a big expense in actual mirroring bandwidth. If someone
reads a million files on the master then a million inode records (sans
file contents) would end up in the mirroring stream just for the atime
update. Ick.
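Just to put a made-up number on it: if a mirrored inode record costs on
the order of a couple hundred bytes on the wire, a million atime-only
updates would mean a couple hundred megabytes of mirroring traffic
without a single byte of file data having changed.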
-Matt