[Bug 229745] ahcich: CAM status: Command timeout
Date: Thu, 08 Feb 2024 17:43:27 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229745

--- Comment #76 from Warner Losh <imp@FreeBSD.org> ---

(In reply to Kevin Zheng from comment #75)

> The issue that I'm writing about is the system behavior. It seemed that
> all I/O (or maybe just writes?) to the ZFS pool was stalled waiting for
> the disk to time out and reattach, despite the fact that I have other
> working mirror devices. It seems to me that a hardware issue with one
> disk shouldn't stall the whole pool.
>
> I'm not actually sure if this problem is happening at the ZFS level or
> in CAM or the SATA subsystem; if this happens again, what debugging
> steps would determine the cause?

Yes. There are a few things going on here.

First, ZFS has ordering requirements that it enforces by scheduling some
I/O (especially writes) only after the I/O it depends on has completed.
This is how ZFS ensures that its log is always in a consistent state. It
means that if some I/O hangs for a long period of time (more than a
second or five), then the I/Os that depend on it completing are delayed
as well, which can cause processes to hang waiting for that I/O. So while
I'd agree that one misbehaving disk shouldn't hang the pool, I can see
how it might: how can ZFS know what to schedule, consistent with its need
to keep the log consistent, if any disk could suddenly stop writing? Now,
I'm not enough of a ZFS expert to know whether coping with this situation
is one of its goals. I'd check with the ZFS developers to see if they'd
expect ZFS not to stall when one disk stalls for a long time. ZFS also
tries to pipeline its stream of I/Os as much as possible, and one
stalling disk interferes with that pipeline.

One way to mitigate this, however, is to reduce the timeout from 30s to
something smaller like 3-5s (SSD) or 8-12s (HDD), and the number of
retries down to 2 (it has to be greater than 1 for most controllers, due
to deficiencies in their recovery protocols that are kinda hard to fix).
That could cut the hangs from 90s down to more like 5-10s (SSD) or
15-20s (HDD), which would be less noticeable in a wide range of workloads
(though certainly not all). A sketch of those knobs is at the end of this
message.

There may be ZFS-specific tunings you could try if this happens often.
Maybe smaller (or, paradoxically, larger) I/Os, by creating the pool with
a different logical block size (ashift). This might help align the I/O to
the physical NAND blocks better (hence maybe bigger is needed); see the
second sketch below.

Also, partition the drive such that it starts on a good LBA boundary. I
often keep 1MB at the start of disks unused, because that's still smaller
than typical physical block sizes but also a trivial amount of space (I
expect to bump this to 8MB or 16MB in the future); the last sketch below
shows one way to do it. That might keep whatever bug or pathology in the
drive that leads to the hangs from triggering (though there's no
guarantee: maybe it's a firmware bug that's impossible to avoid).
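To make the timeout/retry tuning concrete, here's roughly what it looks
like. This is a sketch from memory of the ada(4) tunables; double-check
the names against ada(4) on your release before relying on them:

    # /boot/loader.conf (also settable at runtime via sysctl(8)).
    # Assumed knobs: kern.cam.ada.default_timeout (in seconds) and
    # kern.cam.ada.retry_count -- verify against ada(4).
    kern.cam.ada.default_timeout="5"   # 3-5s for SSDs; use 8-12 for HDDs
    kern.cam.ada.retry_count="2"       # must stay > 1 for most controllers

The worst-case stall scales roughly as the timeout times the number of
attempts, which is where the 5-10s (SSD) / 15-20s (HDD) figures above
come from.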
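For the ashift experiment, the value is fixed at pool creation, so it
means recreating the pool. A sketch, assuming an OpenZFS-era zpool(8)
that accepts ashift as a creation-time property (the pool and partition
names here are made up):

    # Force 4KB logical blocks (ashift=12):
    zpool create -o ashift=12 tank mirror ada0p3 ada1p3
    # ...or 8KB (ashift=13) if the drive's NAND pages are bigger:
    zpool create -o ashift=13 tank mirror ada0p3 ada1p3

On FreeBSD, the vfs.zfs.min_auto_ashift sysctl also puts a floor under
the auto-detected value for newly added vdevs.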
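And for the partition alignment, a gpart(8) sketch (ada0 stands in for
the real device, which must not already have a partition scheme):

    # GPT scheme; start the first partition at the 1MB mark and keep it
    # aligned to 1MB boundaries:
    gpart create -s gpt ada0
    gpart add -t freebsd-zfs -b 1m -a 1m ada0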