Apparently spurious ZFS CRC errors (was Re: ZFS data error without reasons)
Mark Powell
M.S.Powell at salford.ac.uk
Wed Mar 25 06:49:13 PDT 2009
On Wed, 25 Mar 2009, Alexander Leidinger wrote:
>> Can prefetch really cause these problems? And if so why?
>
> I don't think so. I missed the part where you explained this before. In
> this case it's really the write cache. The interesting question is whether
> this is because of the hard disks you use, or because of a bug in the
> software.
>
> Are you running a very recent current? 1-2 weeks ago there was a bug (not in
> ZFS) which caused CRC errors, but it was fixed shortly after it was
> noticed. If you haven't updated your system, it may be best to update it
> and try again. Please report back.
I'm running recent current. I too saw that there were bugs causing CRC
errors, and hoped that the relevant fixes would help me out. Unfortunately
not.
I most recently remade the whole array with current from last Thursday,
19th March.
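For reference, this is roughly what the remake looks like each time (gstripe
flags and stripe size omitted here; the two 500GB drives back stripe/str0, as
in the status output further down):
-----
# gstripe label str0 /dev/ad8 /dev/ad10
# zpool create pool raidz2 stripe/str0 ad14 ad16 ad18 ad20 ad22 ad24
-----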
I tried it with WC disabled, but performance is awful. I expected it to be a
little worse, obviously, but not so much worse that it would be noticeable
without benchmarks. Restoring my first 200GB LTO2 tape (which should take
1h45m-2h), it was only about halfway through the tape after 3h30m, so I gave
up, hoping, possibly in vain, that it was a ZFS option causing the issue.
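For the record, by WC I mean the drive write cache, which (if I have the knob
right) I toggle via the ata(4) tunable, something like:
-----
# echo 'hw.ata.wc="0"' >> /boot/loader.conf
(reboot)
# sysctl hw.ata.wc
hw.ata.wc: 0
-----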
The drives in question are:
ad24 Device Model: WDC WD10EADS-00L5B1
ad22 Device Model: WDC WD10EADS-00L5B1
ad20 Device Model: WDC WD10EADS-00L5B1
ad18 Device Model: WDC WD10EADS-00L5B1
ad16 Device Model: WDC WD10EADS-00L5B1
ad14 Device Model: WDC WD10EADS-00L5B1
ad10 Device Model: WDC WD5000AAKS-22TMA0
ad8 Device Model: WDC WD5000AAKS-65TMA0
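For reference, those model strings can be collected with something like the
following (assuming sysutils/smartmontools is installed; device names as
listed above):
-----
# for d in ad8 ad10 ad14 ad16 ad18 ad20 ad22 ad24; do echo -n "$d "; smartctl -i /dev/$d | grep 'Device Model'; done
-----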
The WD5000AAKS were used for around 18 months in the previous 9x500GB
RAIDZ2 on 7, so I would expect them to be ok.
I've had the WD10EADS drives for about 2 months. However, I did replace
drives in the old 9x500GB RAIDZ2 with each of the new drives to check they
were ok, resilvering them into the array one at a time (sketched below),
i.e. eventually I was running 3x500GB+6x1TB in the still logically 9x500GB
RAIDZ2. Yes, this would only exercise the lower 500GB of each 1TB drive, but
surely that's enough of a test?
AFAICT, I had WC off in 7 though.
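That burn-in was just the usual replace/resilver cycle, repeated per drive,
along the lines of (old pool and old device names here are illustrative):
-----
# zpool replace oldpool ad4 ad14
# zpool status oldpool
-----
waiting for each resilver to complete before swapping in the next drive.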
On my most recent failure I do see:
-----
# zpool status -v
pool: pool
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: scrub in progress, 40.02% done, 3h54m to go
config:
        NAME             STATE     READ WRITE CKSUM
        pool             ONLINE       0     0    42
          raidz2         ONLINE       0     0    42
            stripe/str0  ONLINE       0     0     0
            ad14         ONLINE       0     0     4
            ad16         ONLINE       0     0     2
            ad18         ONLINE       0     0     3
            ad20         ONLINE       0     0     7
            ad22         ONLINE       0     0     4
            ad24         ONLINE       0     0     5
-----
i.e. no errors on the 2x500GB stripe. That would seem to suggest firmware
write caching bugs on the 1TB drives. However, my other error report had:
-----
pool: pool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: scrub completed after 0h51m with 0 errors on Fri Mar 20 10:57:18 2009
config:
        NAME             STATE     READ WRITE CKSUM
        pool             ONLINE       0     0     0
          raidz2         ONLINE       0     0    23
            stripe/str0  ONLINE       0     0   489  12.3M repaired
            ad14         ONLINE       0     0   786  19.7M repaired
            ad16         ONLINE       0     0   804  20.1M repaired
            ad18         ONLINE       0     0   754  18.8M repaired
            ad20         ONLINE       0     0   771  19.3M repaired
            ad22         ONLINE       0     0   808  20.2M repaired
            ad24         ONLINE       0     0   848  21.2M repaired
errors: No known data errors
-----
i.e. errors on the stripe too, but the stripe error count seems to be just
over half of that of a 1TB drive (489 versus roughly 750-850). If the errors
were spread evenly across the physical drives, one would expect the stripe
(being two drives) to show about 2x the CRC errors of a single 1TB drive?
>>> If you want to get more out of zfs, maybe vfs.zfs.vdev.max_pending could
>>> help if you are using SATA (as I read the zfs tuning guide, it makes sense
>>> to have a high value when you have command queueing, which we have with
>>> SCSI drives, but not yet with SATA drives and probably not at all with
>>> PATA drives).
>>
>> I'm running completely SATA with NCQ-supporting drives. However, and
>> possibly as you say, NCQ is not really/properly supported in FreeBSD?
>
> NCQ is not supported yet in FreeBSD. Alexander Motin said he is interested in
> implementing it, but I don't know about the status of this.
Ok. So vfs.zfs.vdev.max_pending is irrelevant for SATA currently?
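For reference, the knob being discussed can be set as a loader tunable or
(I believe) at runtime; the value here is just an example:
-----
# echo 'vfs.zfs.vdev.max_pending="10"' >> /boot/loader.conf
or
# sysctl vfs.zfs.vdev.max_pending=10
-----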
Cheers.
--
Mark Powell - UNIX System Administrator - The University of Salford
Information & Learning Services, Clifford Whitworth Building,
Salford University, Manchester, M5 4WT, UK.
Tel: +44 161 295 6843 Fax: +44 161 295 5888 www.pgp.com for PGP key