ZFS deadlocks triggered by HDD timeouts

From: Alan Somers <asomers_at_freebsd.org>
Date: Wed, 01 Dec 2021 18:15:14 UTC

On a stable/13 build from 16-Sep-2021 I see frequent ZFS deadlocks
triggered by HDD timeouts.  The timeouts are probably caused by
genuine hardware faults, but they didn't lead to deadlocks in
12.2-RELEASE or 13.0-RELEASE.  Unfortunately I don't have much
additional information.  ZFS's stack traces aren't very informative,
and dmesg doesn't show anything besides the usual information about
the disk timeout.  I don't see anything obviously related in the
commit history for that time range, either.

Has anybody else observed this phenomenon?  Or does anybody have a
good way to deliberately inject timeouts?  CAM makes it easy enough to
inject an error, but not a timeout.  If it did, then I could bisect
the problem.  As it is, I can only reproduce it on production servers.

-Alan