Re: general zfs/zvol question

From: Frank Leonhardt <freebsd-doc_at_fjl.co.uk>
Date: Tue, 15 Feb 2022 11:23:38 UTC
On 09/02/2022 02:10, tech-lists wrote:
> Hello experts,
>
> What would you think would run faster, all else being equal:
>
> 1. a freebsd vm, the image being file-backed (i.e. the empty vm image 
> initially created with, say, "truncate -s 64g freebsdvm.img") on a 
> freebsd zfs filesystem on the host, and the installed filesystem on 
> the guest being UFS, or
>
> 2. a freebsd vm, the image being zfs-backed (i.e. a zvol) and the 
> installed filesystem on the guest being UFS, or
>
> 3. the same as example 2, the guest filesystem being zfs?
>
> for context this is recent stable/13, so OpenZFS, and the guest vm 
> would have 4*vCPU and 16GB vRAM
>
> I *think* example #2 would be fastest, if the underlying zfs vol was 
> also using zstd compression. But I'm only guessing, and it's not a 
> qualified
> guess. I'm wondering if anyone has done this and what their 
> impressions were.
>
> thanks,

An interesting question, and not one I'd like to give a definite answer 
to. The best option would be to run FreeBSD in a jail on FreeBSD, of 
course ;-)

Don't fall into the trap of thinking that a zvol is any better than a 
file. You'd assume it was, but as far as I can tell it's just another 
object allocating space on the zpool the same copy-on-write way a 
dataset does. That realisation left me wondering what the point of a 
zvol was, and a decade later I'm still wondering. (Disclaimer: I haven't 
checked recently.) That makes options 1 and 2 much the same.
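For what it's worth, here's roughly how the two back-ends get set up 
(pool and dataset names are placeholders, and zstd compression assumes 
OpenZFS 2.0 or later, which stable/13 has):

    # Option 1: sparse image file on a ZFS dataset
    zfs create -o compression=zstd tank/vmimages
    truncate -s 64G /tank/vmimages/freebsdvm.img

    # Option 2: sparse zvol (allocates on write, much like the file)
    zfs create -s -V 64G -o compression=zstd -o volblocksize=64K tank/freebsdvm0

Either way the blocks end up as copy-on-write allocations in the same 
pool, which is why I say 1 and 2 come out much the same.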

The other important thing to be clear about is that ZFS is NOT fast. The 
extent to which it's not fast depends on the hardware behind it. The 
main problem is the extreme fragmentation caused by copy-on-write. If 
your data is largely static (user files) it's not a big deal. If it's a 
database, with millions of random writes into the same file, that file 
is going to end up spread all over the zpool and there is nothing you 
can do to stop it. If you're using SSDs or 128GB of cache you may not 
notice the difference in access times, but by that stage you're not 
comparing like for like.
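You can get a rough idea of the state of a pool from zpool itself; the 
FRAG column is free-space fragmentation rather than file fragmentation, 
but it's the only number you get. For a database dataset, matching 
recordsize to the database page size at least limits the read-modify-write 
amplification (names below are just examples):

    zpool list -o name,size,allocated,fragmentation,capacity tank
    zfs set recordsize=16K tank/db    # e.g. for a 16K InnoDB page size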

Operating systems like Windoze are always making small writes to their 
disks, and in my experience the disk images end up fragmented in the 
same way a database would.

So, unless you've got a large drive array and loads of RAM for cache, 
I'd really try to put the VMs on top of UFS: keeping data on adjacent 
cylinders means less to cache, less head movement and more chance of the 
data you want already being in core. In my experiments, running ZFS in 
the guest on top of a UFS-backed image file performs close to running it 
natively on a single-disk vdev.
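If you do stay on ZFS for the host, it's worth at least knowing how much 
ARC it's actually getting. A quick check (sysctl names as on FreeBSD 13, 
where the old arc_max name is still an alias):

    sysctl vfs.zfs.arc_max                  # configured ARC ceiling (0 = auto)
    sysctl kstat.zfs.misc.arcstats.size     # current ARC size in bytes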

Incidentally, not directly relevant to your question, but here's 
something I discovered using FreeBSD/ZFS to back ESXi VMs.

https://blog.frankleonhardt.com/2017/esxi-nfs-zfs-and-vfs-nfsd-async/

Regards, Frank.