bhyve: zvols for guest disk - yes or no?
Jan Bramkamp
crest at rlwinm.de
Fri Nov 18 10:49:08 UTC 2016
On 18/11/2016 09:47, Miroslav Lachman wrote:
> Jan Bramkamp wrote on 2016/11/17 11:16:
>> On 16/11/2016 19:10, Patrick M. Hausen wrote:
>>>> Without ZFS you would require a reliable hardware RAID controller (if
>>>> such a magical creature exists) instead (or build a software RAID1+0
>>>> from gmirror and gstripe). IMO money is better invested in more RAM,
>>>> keeping ZFS and the admin happy.
>>>
>>> And we always use geom_mirror with UFS ...
>>
>> That would work, but I wouldn't recommend it for new setups. ZFS offers
>> a lot of operational flexibility which in my opinion alone is worth the
>> overhead. Without ZFS you would have to use either large raw image files
>> on UFS or fight with an old-fashioned volume manager.
>
> One thing to note - ZFS isn't a holy grail and has its own problems too.
Of course ZFS isn't perfect. Nothing as complex as ZFS could be.
> For example there is no fsck_zfs and there are some cases where you can
> end up with a broken pool, and because of its complexity the only thing
> you can do is restore from backup.
That's because ZFS takes a different approach to data and metadata
integrity. By design ZFS should be able to recover automatically, without
data loss, from all the cases a fsck_zfs could handle without user
interaction. This is possible because ZFS is a Merkle-DAG (edges are
stored inside nodes and contain the checksums of the referenced nodes)
and stores multiple copies of important metadata (in addition to
mirroring and RAID-Z). Fsck on UFS includes a good amount of guesswork
which usually works because the UFS on-disk data structures are a lot
simpler. That way you end up with some state the kernel can mount
without panic()ing, but it doesn't imply that it's always exactly the
state the users and applications expected the system to be in.
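
To make the edge-checksum idea concrete, here is a toy sketch in Python
(nothing like the real ZFS code; BlockPointer, checksum() and the
two-copy setup are made up for illustration): the checksum lives in the
referencing pointer, so a silently corrupted copy is detected and the
read is healed from a surviving ditto copy.

    import hashlib

    def checksum(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    class BlockPointer:
        # Toy stand-in for a ZFS block pointer: it stores the checksum
        # of the data it references plus the locations of all copies.
        def __init__(self, copies):
            self.copies = list(copies)            # ditto copies of one block
            self.cksum = checksum(self.copies[0])

        def read(self) -> bytes:
            # The parent-stored checksum tells us which copies are
            # intact, so a single corrupted copy is survivable.
            for data in self.copies:
                if checksum(data) == self.cksum:
                    return data
            raise IOError("all copies failed the checksum")

    ptr = BlockPointer([b"metadata", b"metadata"])  # two ditto copies
    ptr.copies[0] = b"garbage?"                     # silent corruption
    assert ptr.read() == b"metadata"                # healed from copy two
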
> This can occur on ZFS with higher
> probability than on simple UFS2.
Only if you pick your metrics with a strong bias in favor of UFS. The
ZFS data structures are more complicated and you can't repair a ZFS pool
with a hex editor and a pocket calculator. At the same time ZFS protects
its data (including metadata) a lot better from corruption.
* ZFS uses a copy-on-write tree with path copying instead of modifying
live data in place (see the sketch after this list).
* Because the ZFS graph is directed and cycle-free, its edges can (and
do) contain the checksum of the pointed-to node.
* By default ZFS stores three copies of vital pool-level metadata and
two copies of dataset-level metadata in addition to VDEV-level redundancy.
* Both the UFS and the ZFS code have been battle-tested in production
for many years.
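
To illustrate the first point, a toy path-copying sketch in Python
(again made up for illustration, not ZFS internals): writing a leaf
allocates new nodes along the path up to a new root, the old tree stays
valid, and untouched subtrees are shared instead of copied.

    class Node:
        def __init__(self, left=None, right=None, data=None):
            self.left, self.right, self.data = left, right, data

    def cow_write(root, path, data):
        """Return a NEW root; path is a string of 'l'/'r' steps."""
        if not path:
            return Node(data=data)       # fresh leaf, old leaf untouched
        if path[0] == 'l':
            return Node(left=cow_write(root.left, path[1:], data),
                        right=root.right)    # right subtree is shared
        return Node(left=root.left,          # left subtree is shared
                    right=cow_write(root.right, path[1:], data))

    old = Node(left=Node(data="a"), right=Node(data="b"))
    new = cow_write(old, "l", "a2")
    assert old.left.data == "a"      # old tree is still fully consistent
    assert new.right is old.right    # unchanged subtree shared, not copied

This is why a crash in the middle of a write leaves you with the old
consistent tree instead of a half-updated one.
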
UFS is not suitable for today's large file systems. It trusts its
backing storage too much. UFS can't protect your data from undetected
read errors because it doesn't store any checksums along with the data.
It can't help you detect phantom writes because there are no checksums
in the edges. You could swap two blocks of file content with each other
and UFS wouldn't notice.
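
A contrived Python sketch of that last point (hypothetical, just to show
why the placement of the checksum matters): a checksum stored with the
block itself, roughly the way a disk's internal ECC works, survives a
block swap, while a checksum stored in the referencing edge catches it.

    import hashlib

    def cksum(b: bytes) -> bytes:
        return hashlib.sha256(b).digest()

    blocks = [b"block A", b"block B"]
    self_sums = [cksum(b) for b in blocks]  # stored WITH each block
    edge_sums = [cksum(b) for b in blocks]  # stored in the parent pointers

    # A phantom write lands block B where block A belongs (and vice
    # versa); the per-block checksums travel with the data.
    blocks[0], blocks[1] = blocks[1], blocks[0]
    self_sums[0], self_sums[1] = self_sums[1], self_sums[0]

    # Every block still matches its own checksum: the swap is invisible.
    assert all(cksum(b) == s for b, s in zip(blocks, self_sums))
    # The edge checksums live in the untouched parent and catch it.
    assert cksum(blocks[0]) != edge_sums[0]
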
The ratio of disk capacity to throughput has reached a point where it is
no longer acceptable to run fsck at boot. UFS2 on FreeBSD offers
soft updates and snapshots which allow fsck to run in the background,
but this requires a lot of RAM and steals a lot of IOPS from the other
applications running on the system. Running with journaled soft updates
instead requires even more trust in notoriously lying disks, disk
controllers and their caches. Additionally UFS snapshots and journaled
soft updates are incompatible, and without snapshots you can't create
consistent backups of your file systems.
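
To put rough numbers on that (hypothetical but plausible figures): just
reading a 10 TB disk sequentially at 200 MB/s takes

    10 TB / 200 MB/s = 10,000,000 MB / 200 MB/s = 50,000 s ≈ 14 hours

and fsck's access pattern is far from sequential, so a foreground check
at boot could keep a machine down for the better part of a day.
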
UFS is a great file system for the hardware it was designed for, but
hardware evolved and now we have to deal with orders of magnitude more
storage on disks which haven't gotten a lot more reliable. There are
still use-cases for UFS and it is a good fit for small systems, even if
most of these systems could use a NAND-flash-optimized file system as
well.
-- Jan Bramkamp