Re: 13-stable NFS server hang

From: Garrett Wollman <wollman_at_bimajority.org>
Date: Sun, 03 Mar 2024 05:25:18 UTC

<<On Sat, 2 Mar 2024 23:28:20 -0500, I wrote:

> I believe this explains why vn_copy_file_range sometimes takes much
> longer than a second: our servers often have lots of data waiting to
> be written to disk, and if the file being copied was recently modified
> (and so is dirty), this might take several seconds.  I've set
> vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> and am watching to see if we have more freezes.

In case anyone is wondering why this is an issue, it's the combination
of two factors:

1) vn_generic_copy_file_range() attempts to preserve holes in the
source file.

2) ZFS does automatic hole-punching on write for filesystems where
compression is enabled.  It happens in the same code path as
compression, checksum generation, and redundant-write suppression, and
thus does not happen until the dirty blocks are about to be committed
to disk.  So if the file is dirty, ZFS doesn't "know" where the
then-extant holes are until a sync has completed.
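
For anyone who wants to see what "preserving holes" involves, here is
a rough userland sketch in Python.  It is not the kernel code, and
data_extents() is a made-up helper name; it only assumes that the
platform exposes lseek(2) with SEEK_DATA and SEEK_HOLE (Python's
os.SEEK_DATA and os.SEEK_HOLE).  A hole-preserving copy has to ask the
filesystem this kind of question before it can decide which ranges to
copy and which to skip, and on a dirty ZFS file that question is the
part that may have to wait for a sync.

```python
import errno
import os


def data_extents(fd):
    """Return [(start, end), ...] byte ranges the filesystem reports as data.

    Hypothetical helper for illustration only.  Everything between two
    consecutive extents is a hole that a hole-preserving copy would skip.
    """
    size = os.fstat(fd).st_size
    extents = []
    offset = 0
    while offset < size:
        try:
            # Find the next byte at or after `offset` that is data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                break  # nothing but a hole between `offset` and EOF
            raise
        # Find where that run of data ends (EOF counts as a hole).
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end))
        offset = end
    return extents
```

A filesystem that cannot (or will not) report holes is allowed to
answer with a single extent covering the whole file, in which case a
copy built on this simply copies everything.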

While vn_generic_copy_file_range() has a flag to stop and return
partial success after a second of copying, this flag does not affect
sleeps internal to the filesystem, so zfs_holey() can sleep
indefinitely and vn_generic_copy_file_range() can't do anything about
it until the sync has already happened.
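
The same limitation can be shown with a toy model.  The sketch below
is only an analogy in Python, with hypothetical names
(copy_with_budget, steps, budget) that do not correspond to anything
in the kernel: the loop checks the clock only between steps, so a
single step that blocks longer than the budget cannot be cut short.

```python
import time


def copy_with_budget(steps, budget):
    """Run callables in order; stop early once `budget` seconds have passed.

    Toy model, not kernel code.  The clock is consulted only *between*
    steps, so a step that sleeps internally overruns the budget.
    Returns (number of steps completed, elapsed seconds).
    """
    start = time.monotonic()
    done = 0
    for step in steps:
        step()  # may block for an arbitrarily long time
        done += 1
        if time.monotonic() - start >= budget:
            break  # report partial success to the caller
    return done, time.monotonic() - start
```

For example, copy_with_budget([lambda: time.sleep(0.3), lambda: None],
0.1) completes one step and returns only after the 0.3-second sleep
has finished, well past the 0.1-second budget.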

-GAWollman