Re: 13-stable NFS server hang

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Sun, 03 Mar 2024 21:14:44 UTC

On Sat, Mar 2, 2024 at 9:25 PM Garrett Wollman <wollman@bimajority.org> wrote:
>
> <<On Sat, 2 Mar 2024 23:28:20 -0500, I wrote:
>
> > I believe this explains why vn_copy_file_range sometimes takes much
> > longer than a second: our servers often have lots of data waiting to
> > be written to disk, and if the file being copied was recently modified
> > (and so is dirty), this might take several seconds.  I've set
> > vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> > and am watching to see if we have more freezes.
>
> In case anyone is wondering why this is an issue, it's the combination
> of two factors:
>
> 1) vn_generic_copy_file_range() attempts to preserve holes in the
> source file.
Just FYI, when I was first implementing the copy_file_range(2) syscall, the
consensus in the discussion seemed to be that this was a reasonable thing to do.
It is now not so obvious for file systems doing compression, such as ZFS.

It happens that ZFS will no longer use vn_generic_copy_file_range() when
block cloning is enabled and I have no idea what block cloning does w.r.t.
preserving holes.

For non-compressing file systems, comparing va_size with va_bytes should
serve as a reasonable hint w.r.t. the file being sparse. If the file is
not sparse, vn_generic_copy_file_range() should not bother doing
SEEK_DATA/SEEK_HOLE.
(I had intended to do such a patch, but I cannot now remember whether I
did so. I'll take a look.)
Note that this patch would not affect ZFS, but could improve UFS performance
where vn_generic_copy_file_range() is used to do the copying.
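
To illustrate the size-versus-allocation hint, here is a rough userland
sketch, not the kernel patch itself. It assumes stat(2)'s st_size and
st_blocks as stand-ins for va_size and va_bytes, and that st_blocks is
counted in 512-byte units (true on FreeBSD and Linux). looks_sparse() is
a made-up name for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Heuristic only: a file whose allocated bytes are fewer than its
 * logical size probably contains holes.  Assumes st_blocks counts
 * 512-byte units.  On a compressing file system such as ZFS this is
 * unreliable, since compression alone shrinks the allocation.
 */
static bool
looks_sparse(const struct stat *sb)
{
	assert(sb != NULL);
	return ((off_t)sb->st_blocks * 512 < sb->st_size);
}
```

A caller would fill the struct with stat(2) or fstat(2) and only bother
with SEEK_DATA/SEEK_HOLE when looks_sparse() returns true.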

rick

>
> 2) ZFS does automatic hole-punching on write for filesystems where
> compression is enabled.  It happens in the same code path as
> compression, checksum generation, and redundant-write suppression, and
> thus does not happen until the dirty blocks are about to be committed
> to disk.  So if the file is dirty, ZFS doesn't "know" where the
> then-extant holes are until a sync has completed.
>
> While vn_generic_copy_file_range() has a flag to stop and return
> partial success after a second of copying, this flag does not affect
> sleeps internal to the filesystem, so zfs_holey() can sleep
> indefinitely and vn_generic_copy_file_range() can't do anything about
> it until the sync has already happened.
>
> -GAWollman
>