Re: ZFS deadlocks triggered by HDD timeouts

From: Warner Losh <imp_at_bsdimp.com>
Date: Wed, 01 Dec 2021 18:24:57 UTC

On Wed, Dec 1, 2021, 11:16 AM Alan Somers <asomers@freebsd.org> wrote:

> On a stable/13 build from 16-Sep-2021 I see frequent ZFS deadlocks
> triggered by HDD timeouts.  The timeouts are probably caused by
> genuine hardware faults, but they didn't lead to deadlocks in
> 12.2-RELEASE or 13.0-RELEASE.  Unfortunately I don't have much
> additional information.  ZFS's stack traces aren't very informative,
> and dmesg doesn't show anything besides the usual information about
> the disk timeout.  I don't see anything obviously related in the
> commit history for that time range, either.
>
> Has anybody else observed this phenomenon?  Or does anybody have a
> good way to deliberately inject timeouts?  CAM makes it easy enough to
> inject an error, but not a timeout.  If it did, then I could bisect
> the problem.  As it is I can only reproduce it on production servers.
>

Which SIM (the CAM SCSI Interface Module, i.e. the HBA driver) are you using?
Timeouts are tricky because they have many sources, some of which are
nonlocal...

Warner
