ata 'Flush Cache' errors, on non-failing disk?

Sun Jun 28 15:43:55 UTC 2009

Hi,

I've recently updated my amd64 system from 6.4 to 7.2-STABLE. This works 
fine, but I've started seeing errors on the console:

  ad36: TIMEOUT - FLUSHCACHE retrying (1 retry left)

The drive (a WD5000AAKS) appears healthy: SMART reports no errors or 
problems, and the timeouts only appear when that drive is being hammered 
by write requests (e.g. during a ZFS resilver to it).

The Western Digital diagnostic CD/ISO runs a full test and reports no 
problems (in that machine, with that drive).

I did find a number of posts, such as:

 <http://lists.freebsd.org/pipermail/freebsd-current/2009-April/005939.html>

These suggest that the default timeout for the ATA flush cache command is 
5 seconds, when perhaps it should be 30.

However, the code in 7.2-STABLE bears no resemblance to the code that the 
patch is for, so I'm guessing things have moved on since then.

Is there anywhere I might apply a similar patch to up the timeout to see if 
that cures the problem?

The only mentions of ATA_FLUSHCACHE appear to be calls of the form 
"ata_controlcmd(xxxx, ATA_FLUSHCACHE, 0, 0, 0);". ata_controlcmd() in turn 
seems to set a request timeout of 1, but I can't tell whether that means 
1 second, 1 tick, or something else, nor whether it is a timeout for adding 
the command to the queue or for actually executing that command.

Is upping that request timeout conditionally for cache flushes likely to 
have the effect I'm looking for?
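
To make the idea concrete, here is a minimal, standalone sketch of the selection logic I have in mind. It is not a patch against the ata(4) driver: control_timeout_for() and the two *_CONTROL_TIMEOUT constants are hypothetical names I made up, the value 30 is simply the figure from the thread above, and it assumes the request timeout is counted in seconds, which is the very thing I haven't confirmed. The opcode values are the standard ATA ones for FLUSH CACHE and FLUSH CACHE EXT.

```c
#include <stdint.h>

/* Standard ATA opcodes: FLUSH CACHE (E7h) and FLUSH CACHE EXT (EAh).
 * The macro names are assumed to match the driver headers. */
#define ATA_FLUSHCACHE    0xe7
#define ATA_FLUSHCACHE48  0xea

/* Hypothetical constants, for illustration only. The "1" mirrors the
 * value ata_controlcmd() appears to use; "30" is the value suggested in
 * the freebsd-current thread. Units are ASSUMED to be seconds. */
#define DEFAULT_CONTROL_TIMEOUT  1
#define FLUSH_CONTROL_TIMEOUT    30

/* Hypothetical helper (not part of the driver): choose a longer timeout
 * for cache flushes and keep the default for every other control command. */
static int
control_timeout_for(uint8_t command)
{
    switch (command) {
    case ATA_FLUSHCACHE:
    case ATA_FLUSHCACHE48:
        return FLUSH_CONTROL_TIMEOUT;
    default:
        return DEFAULT_CONTROL_TIMEOUT;
    }
}
```

The thought is that ata_controlcmd() would assign the result of something like this to the request timeout in place of the fixed 1, so that only flushes get the longer limit and other control commands behave as before.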

-Kp