Apparently spurious ZFS CRC errors (was Re: ZFS data error without reasons)
Mark Powell
M.S.Powell at salford.ac.uk
Wed Mar 25 06:49:13 PDT 2009
On Wed, 25 Mar 2009, Alexander Leidinger wrote:
>> Can prefetch really cause these problems? And if so why?
>
> I don't think so. I missed the part where you explained this before. In
> this case it's really the write cache. The interesting question is whether
> this is because of the hard disks you use, or because of a bug in the
> software.
>
> Are you running a very recent current? 1-2 weeks ago there was a bug (not in
> ZFS) which caused CRC errors, but it was fixed shortly after it was
> noticed. If you haven't updated your system, it may be best to update it
> and try again. Please report back.
I'm running recent current. I too saw that there were bugs causing CRC
errors, and hoped that the relevant fixes would help me out. Unfortunately
not.
I most recently remade the whole array with current from last Thursday,
19th March.
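For reference, this is roughly what the remake looks like each time (gstripe
flags and stripe size omitted here; the two 500GB drives back stripe/str0, as
in the status output further down):
-----
# gstripe label str0 /dev/ad8 /dev/ad10
# zpool create pool raidz2 stripe/str0 ad14 ad16 ad18 ad20 ad22 ad24
-----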
I tried it with WC disabled, but performance is awful. I expected it to be a
little worse, obviously, but not so much worse that it would be noticeable
without benchmarks. Restoring my first 200GB LTO2 tape (which should take
1h45m-2h), it was only about halfway through the tape after 3h30m, so I gave
up, hoping, possibly in vain, that it was a ZFS option causing the issue.
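For the record, by WC I mean the drive write cache, which (if I have the knob
right) I toggle via the ata(4) tunable, something like:
-----
# echo 'hw.ata.wc="0"' >> /boot/loader.conf
(reboot)
# sysctl hw.ata.wc
hw.ata.wc: 0
-----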
The drives in question are:
ad24 Device Model: WDC WD10EADS-00L5B1
ad22 Device Model: WDC WD10EADS-00L5B1
ad20 Device Model: WDC WD10EADS-00L5B1
ad18 Device Model: WDC WD10EADS-00L5B1
ad16 Device Model: WDC WD10EADS-00L5B1
ad14 Device Model: WDC WD10EADS-00L5B1
ad10 Device Model: WDC WD5000AAKS-22TMA0
ad8 Device Model: WDC WD5000AAKS-65TMA0
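For reference, those model strings can be collected with something like the
following (assuming sysutils/smartmontools is installed; device names as
listed above):
-----
# for d in ad8 ad10 ad14 ad16 ad18 ad20 ad22 ad24; do echo -n "$d "; smartctl -i /dev/$d | grep 'Device Model'; done
-----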
The WD5000AAKS were used for around 18 months in the previous 9x500GB
RAIDZ2 on 7, so I would expect them to be ok.
I've had the WD10EADS drives for about 2 months. However, I did replace
drives in the old 9x500GB RAIDZ2 with each of the new drives to check they
were ok, resilvering them into the array one at a time (sketched below),
i.e. eventually I was running 3x500GB+6x1TB in the still logically 9x500GB
RAIDZ2. Yes, this would only exercise the lower 500GB of each 1TB drive, but
surely that's enough of a test?
AFAICT, I had WC off in 7 though.
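That burn-in was just the usual replace/resilver cycle, repeated per drive,
along the lines of (old pool and old device names here are illustrative):
-----
# zpool replace oldpool ad4 ad14
# zpool status oldpool
-----
waiting for each resilver to complete before swapping in the next drive.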
On my most recent failure I do see:
-----
# zpool status -v
pool: pool
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: scrub in progress, 40.02% done, 3h54m to go
config:
        NAME             STATE     READ WRITE CKSUM
        pool             ONLINE       0     0    42
          raidz2         ONLINE       0     0    42
            stripe/str0  ONLINE       0     0     0
            ad14         ONLINE       0     0     4
            ad16         ONLINE       0     0     2
            ad18         ONLINE       0     0     3
            ad20         ONLINE       0     0     7
            ad22         ONLINE       0     0     4
            ad24         ONLINE       0     0     5
-----
i.e. no errors on the 2x500GB stripe. That would seem to suggest firmware
write caching bugs on the 1TB drives. However, my other error report had:
-----
pool: pool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: scrub completed after 0h51m with 0 errors on Fri Mar 20 10:57:18 2009
config:
        NAME             STATE     READ WRITE CKSUM
        pool             ONLINE       0     0     0
          raidz2         ONLINE       0     0    23
            stripe/str0  ONLINE       0     0   489  12.3M repaired
            ad14         ONLINE       0     0   786  19.7M repaired
            ad16         ONLINE       0     0   804  20.1M repaired
            ad18         ONLINE       0     0   754  18.8M repaired
            ad20         ONLINE       0     0   771  19.3M repaired
            ad22         ONLINE       0     0   808  20.2M repaired
            ad24         ONLINE       0     0   848  21.2M repaired
errors: No known data errors
-----
i.e. errors on the stripe too, but the stripe error count seems to be just
over half of that of a 1TB drive (489 versus roughly 750-850). If the errors
were spread evenly across the physical drives, one would expect the stripe
(being two drives) to show about 2x the CRC errors of a single 1TB drive?
>>> If you want to get more out of zfs, maybe vfs.zfs.vdev.max_pending could
>>> help if you are using SATA (as I read the zfs tuning guide, it makes sense
>>> to have a high value when you have command queueing, which we have with
>>> SCSI drives, but not yet with SATA drives and probably not at all with
>>> PATA drives).
>>
>> I'm running completely SATA with NCQ-supporting drives. However, and
>> possibly as you say, NCQ is not really/properly supported in FreeBSD?
>
> NCQ is not supported yet in FreeBSD. Alexander Motin said he is interested in
> implementing it, but I don't know about the status of this.
Ok. So vfs.zfs.vdev.max_pending is irrelevant for SATA currently?
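For reference, the knob being discussed can be set as a loader tunable or
(I believe) at runtime; the value here is just an example:
-----
# echo 'vfs.zfs.vdev.max_pending="10"' >> /boot/loader.conf
or
# sysctl vfs.zfs.vdev.max_pending=10
-----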
Cheers.
--
Mark Powell - UNIX System Administrator - The University of Salford
Information & Learning Services, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 6843 Fax: +44 161 295 5888 www.pgp.com for PGP key