Multi-machine mirroring choices
Matthew Dillon
dillon at apollo.backplane.com
Tue Jul 15 17:52:44 UTC 2008
:Oliver Fromme wrote:
:
:> Yet another way would be to use DragonFly's "Hammer" file
:> system which is part of DragonFly BSD 2.0 which will be
:> released in a few days. It supports remote mirroring,
:> i.e. mirror source and mirror target can run on different
:> machines. Of course it is still very new and experimental
:> (however, ZFS is marked experimental, too), so you probably
:> don't want to use it on critical production machines.
:
:Let's not get carried away here :)
:
:Kris
Heh. I think it's safe to say that a *NATIVE* uninterrupted and fully
cache coherent fail-over feature is not something any of us in BSDland
have yet. It's a damn difficult problem that is frankly best solved
above the filesystem layer, but with filesystem support for bulk mirroring
operations.
HAMMER's native mirroring was the last major feature to go into
it before the upcoming release, so it will definitely be more
experimental than the rest of HAMMER. This is mainly because it
implements a full-blown queue-less incremental snapshot and mirroring
algorithm, single-master-to-multi-slave. It does it at a very low level,
by optimally scanning HAMMER's B-Tree. In other words, the kitchen
sink.
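Very roughly, the record-level idea looks like this (a minimal sketch,
not HAMMER's actual code; every name here is made up):

    #include <stdint.h>

    /*
     * Illustrative sketch only.  Every B-Tree record carries the
     * transaction id (TID) that created it and, if it has been deleted,
     * the TID that deleted it.  A mirroring pass from begin_tid to
     * end_tid emits every record touched inside that window, so the
     * master keeps no per-slave queue; each slave only has to remember
     * the end_tid of its last successful pass, which is what makes
     * single-master-to-multi-slave cheap.
     */
    struct mirror_rec {
        uint64_t create_tid;    /* TID that created the record */
        uint64_t delete_tid;    /* TID that deleted it, 0 if still live */
        /* ... key and data would follow ... */
    };

    static int
    rec_in_window(const struct mirror_rec *rec, uint64_t begin_tid,
                  uint64_t end_tid)
    {
        if (rec->create_tid > begin_tid && rec->create_tid <= end_tid)
            return (1);
        if (rec->delete_tid > begin_tid && rec->delete_tid <= end_tid)
            return (1);
        return (0);
    }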
The B-Tree propagates the highest transaction id up to the root to
support incremental mirroring and that's the bit that is highly
experimental and not well tested yet. It's fairly complex because
even destroyed B-Tree records and collapses must propagate a
transaction id up the tree (so the mirroring code knows what it needs
to send to the other end to do comparative deletions on the target).
(Transaction ids are bundled together in larger flushes, so the actual
B-Tree overhead is minimal.)
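In sketch form (again purely illustrative, not the real code;
"mirror_tid" is just my shorthand for the propagated field), the
propagation and the pruned scan look roughly like:

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative sketch only, not HAMMER source. */
    struct node {
        struct node *parent;
        struct node *child[16];     /* internal nodes only */
        int          nchildren;     /* 0 for a leaf */
        uint64_t     mirror_tid;    /* highest TID anywhere below */
    };

    /*
     * Any modification -- record creation, record destruction, or a
     * node collapse -- pushes its TID up to the root, so an untouched
     * subtree is provably older than any given synchronization point.
     */
    static void
    propagate_tid(struct node *node, uint64_t tid)
    {
        while (node != NULL && node->mirror_tid < tid) {
            node->mirror_tid = tid;
            node = node->parent;
        }
    }

    /*
     * The incremental scan can then skip any subtree whose mirror_tid
     * is at or below the slave's last synchronized TID.
     */
    static void
    mirror_scan(struct node *node, uint64_t begin_tid, uint64_t end_tid)
    {
        int i;

        if (node->mirror_tid <= begin_tid)
            return;             /* nothing new anywhere below this node */
        if (node->nchildren == 0) {
            /* leaf: emit records created/deleted in (begin_tid, end_tid] */
            return;
        }
        for (i = 0; i < node->nchildren; ++i)
            mirror_scan(node->child[i], begin_tid, end_tid);
    }

The comparative deletions fall out of the same scan: a record whose
delete_tid lands in the window tells the target to drop its copy.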
The rest of HAMMER is shaping up very well for the release. It's
phenomenal when it comes to storing backups. Post-release I'll be
moving more of our production systems to HAMMER. The only sticky
issue we have is filesystem-full handling, but it is more a matter
of fine-tuning than anything else.
--
Someone mentioned atime and mtime. For something like ZFS or HAMMER,
these fields represent a real problem (atime more than mtime). I'm
kinda interested in knowing, does ZFS do block replacement for
atime updates?
For HAMMER I don't roll new B-Tree records for atime or mtime updates.
I update the fields in-place in the current version of the inode and
all snapshot accesses will lock them (in getattr) to ctime in order to
guarantee a consistent result. That way (tar | md5) can be used to
validate snapshot integrity.
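As a sketch of what that clamp amounts to (not the actual vnops code;
the names are invented):

    #include <stdint.h>

    /* Illustrative sketch only. */
    struct inode_times {
        uint64_t atime;
        uint64_t mtime;
        uint64_t ctime;
    };

    /*
     * Live accesses see the in-place atime/mtime.  Accesses through a
     * snapshot (an as-of view) report ctime for both, so the in-place
     * updates on the live inode can never leak into a historical view
     * and (tar | md5) over the same snapshot is always stable.
     */
    static void
    snapshot_getattr(const struct inode_times *ip, int is_snapshot,
                     struct inode_times *out)
    {
        *out = *ip;
        if (is_snapshot) {
            out->atime = ip->ctime;
            out->mtime = ip->ctime;
        }
    }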
At the moment, in this first release, the mirroring code does not
propagate atime or mtime. I plan to do it, though. Even though
I don't roll new B-Tree records for atime/mtime updates I can still
propagate a new transaction id up the B-Tree to make the changes
visible to the mirroring code. I'll definitely be doing that for mtime
and will have the option to do it for atime as well. But atime still
represents a big expense in actual mirroring bandwidth. If someone
reads a million files on the master then a million inode records (sans
file contents) would end up in the mirroring stream just for the atime
update. Ick.
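Just to put a made-up number on it: if a mirrored inode record costs on
the order of a couple hundred bytes on the wire, a million atime-only
updates would mean a couple hundred megabytes of mirroring traffic
without a single byte of file data having changed.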
-Matt