System freeze: Adaptec (aac) timeouts (releng 8)
Dennis Koegel
dk at neveragain.de
Wed Sep 14 08:08:32 UTC 2011
Cheers,
we have a reproducible system freeze due to Adaptec driver (aac) timeouts:
Sep 3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ae4c0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
Sep 3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005ac0e0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
Sep 3 05:26:44 foo kernel: aac0: COMMAND 0xffffff80005b0fa0 (TYPE 502) TIMEOUT AFTER 129 SECONDS
<dozens more of these...>
Once this happens, the userland seems to be alive, but the controller is
completely dead. As soon as the disk subsystem is involved, any process
hangs forever (e.g. SSH crypto-exchange still happens, but a shell won't
even start anymore).
We observe the same issue on two systems of (mostly) identical spec, so
it's not a hardware issue.
Apparently this only happens under heavy disk i/o and high cpu load.
Notably high write throughput plus a 'zpool scrub' on a large
GELI-backed zpool usually triggers the problem after a few hours.
Without high activity, they run smooth for weeks.
Both systems are amd64 with an Adaptec 5805 controller and 16 disks (of
which two form a RAID-1 system volume (UFS), and the remaining 14 serve
as JBOD for a large zpool -- a total of 15 "aacd" devices).
Both were running 8.2R originally. I've taken them to 8-STABLE now and
also applied svn r222951 (where the MFC was forgotten, it seems), but
the problem remains.
Any help is greatly appreciated.
Thanks,
- D.
More information about the freebsd-stable
mailing list