Re: 13-stable NFS server hang
- Reply: Rick Macklem : "Re: 13-stable NFS server hang"
- In reply to: Garrett Wollman : "Re: 13-stable NFS server hang"
Date: Sun, 03 Mar 2024 05:25:18 UTC
<<On Sat, 2 Mar 2024 23:28:20 -0500, I wrote:

> I believe this explains why vn_copy_file_range sometimes takes much
> longer than a second: our servers often have lots of data waiting to
> be written to disk, and if the file being copied was recently modified
> (and so is dirty), this might take several seconds. I've set
> vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> and am watching to see if we have more freezes.

In case anyone is wondering why this is an issue, it's the combination
of two factors:

1) vn_generic_copy_file_range() attempts to preserve holes in the
source file.

2) ZFS does automatic hole-punching on write for filesystems where
compression is enabled. It happens in the same code path as
compression, checksum generation, and redundant-write suppression, and
thus does not happen until the dirty blocks are about to be committed
to disk.

So if the file is dirty, ZFS doesn't "know" where the then-extant
holes are until a sync has completed. While
vn_generic_copy_file_range() has a flag to stop and return partial
success after a second of copying, this flag does not affect sleeps
internal to the filesystem, so zfs_holey() can sleep indefinitely and
vn_generic_copy_file_range() can't do anything about it until the sync
has already happened.

-GAWollman
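
For anyone who wants to see the hole lookup from userland: the sketch below walks a file with lseek(2) using SEEK_DATA and SEEK_HOLE, which is the userland counterpart of the hole query described above. It is only an illustration, not the kernel code; count_data_bytes() is a hypothetical helper name, and the assumption that each lseek() call on a dirty ZFS file may block until the pending sync completes (with vfs.zfs.dmu_offset_next_sync left at its default) is drawn from the behaviour described in this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc wants this for SEEK_DATA/SEEK_HOLE; harmless elsewhere */
#endif
#include <sys/types.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: walk the file behind fd one data region at a
 * time and return the number of bytes that lie in data regions, or -1
 * on error.  Each lseek() below asks the filesystem where the next
 * data region or hole starts.
 */
static off_t
count_data_bytes(int fd)
{
	off_t pos = 0, total = 0;

	for (;;) {
		/* Find the start of the next data region at or after pos. */
		off_t data = lseek(fd, pos, SEEK_DATA);
		if (data == -1) {
			if (errno == ENXIO)	/* no data at or after pos */
				return (total);
			return (-1);
		}

		/* Find where that data region ends (EOF counts as a hole). */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole == -1 || hole <= data)
			return (-1);

		total += hole - data;
		pos = hole;
	}
}
```

Calling count_data_bytes() on a descriptor for a recently modified file, and timing the call, is one way to observe how long the hole lookup itself takes, separately from any copying.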