Re: ZFS deadlocks triggered by HDD timeouts

From: Alan Somers <asomers_at_freebsd.org>
Date: Wed, 01 Dec 2021 20:28:14 UTC

On Wed, Dec 1, 2021 at 11:25 AM Warner Losh <imp@bsdimp.com> wrote:
>
> On Wed, Dec 1, 2021, 11:16 AM Alan Somers <asomers@freebsd.org> wrote:
>>
>> On a stable/13 build from 16-Sep-2021 I see frequent ZFS deadlocks
>> triggered by HDD timeouts.  The timeouts are probably caused by
>> genuine hardware faults, but they didn't lead to deadlocks in
>> 12.2-RELEASE or 13.0-RELEASE.  Unfortunately I don't have much
>> additional information.  ZFS's stack traces aren't very informative,
>> and dmesg doesn't show anything besides the usual information about
>> the disk timeout.  I don't see anything obviously related in the
>> commit history for that time range, either.
>>
>> Has anybody else observed this phenomenon?  Or does anybody have a
>> good way to deliberately inject timeouts?  CAM makes it easy enough to
>> inject an error, but not a timeout.  If it did, then I could bisect
>> the problem.  As it is, I can only reproduce it on production servers.
>
> What SIM? Timeouts are tricky because they have many sources, some of which are nonlocal...
>
> Warner

mpr(4)