ZFS deadlocks triggered by HDD timeouts
Date: Wed, 01 Dec 2021 18:15:14 UTC
On a stable/13 build from 16-Sep-2021 I see frequent ZFS deadlocks triggered by HDD timeouts. The timeouts are probably caused by genuine hardware faults, but they didn't lead to deadlocks in 12.2-RELEASE or 13.0-RELEASE.

Unfortunately I don't have much additional information. ZFS's stack traces aren't very informative, and dmesg doesn't show anything besides the usual information about the disk timeout. I don't see anything obviously related in the commit history for that time range, either.

Has anybody else observed this phenomenon? Or does anybody have a good way to deliberately inject timeouts? CAM makes it easy enough to inject an error, but not a timeout. If it did, then I could bisect the problem. As it is, I can only reproduce it on production servers.

-Alan