Finding which GEOM provider is generating errors in a graid3
Jeremy Chadwick
koitsu at FreeBSD.org
Wed Aug 27 09:06:17 UTC 2008
On Wed, Aug 27, 2008 at 06:27:47PM +1000, Antony Mawer wrote:
> I have a FreeBSD 6.2-based server running a 1.2 TB graid3 volume, which
> consists of five 320 GB SATA hard drives. I've been getting errors in
> /var/log/messages from the graid3 volume, which I suspect means an
> underlying fault with one of the disks. Is there any way to determine
> which one of these drives is throwing errors?
>
> I've checked smartctl -a /dev/adXX, but nothing shows up there.
When you say "nothing shows up there", what exactly do you mean? A lot
of people don't know how to read SMART statistics. I hope that by
"nothing shows up there" you mean "nothing stands out".
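As a rough illustration of what "stands out" can mean, the Python sketch below scans the attribute table printed by `smartctl -A` (the same table appears inside `smartctl -a` output) and reports a few attributes whose nonzero raw values are commonly treated as warning signs. The watch list is a rule-of-thumb assumption, not something stated in this thread or an official failure criterion, and the sample table values are made up for illustration.

```python
# Sketch: flag SMART attributes whose nonzero RAW_VALUE often hints at trouble.
# Assumes the `smartctl -A` row layout: ID#, ATTRIBUTE_NAME, FLAG, VALUE,
# WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.

# Hypothetical watch list (a common heuristic, not an official criterion).
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def suspicious_attributes(smart_text: str) -> dict:
    """Return {attribute_name: raw_value} for watched attributes that are nonzero."""
    flagged = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this skips the header line and any other text around the table.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


# Usage with an illustrative table (values invented, not from this thread).
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""
print(suspicious_attributes(sample))  # {'Current_Pending_Sector': 3}
```

Running it once per component (ad12 through ad20) gives a quick side-by-side view; an empty result for every disk is consistent with "nothing stands out".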
> I'm wondering if this is the infamous ATA driver bug(s) rearing
> its ugly head.
The bugs in question only apply when there are kernel messages coming
from the *disks themselves*, not from a GEOM provider. The dmesg output
below doesn't show any ATA errors, only GEOM errors. If the disks were
failing, you *would* be getting errors from the ATA subsystem, but
you're not.
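The distinction between disk-level and GEOM-level messages can be checked mechanically. This Python sketch tallies kernel log lines per GEOM provider (using the `g_vfs_done()` format quoted in this thread) and per ATA disk. The ATA pattern is an assumption: it counts any kernel line prefixed with an `adN:` device name, so boot-time probe messages for the disks will match too, and the `ad14` line in the usage example is hypothetical.

```python
import re
from collections import Counter

# GEOM-level errors, in the g_vfs_done() format quoted in this thread.
GEOM_RE = re.compile(r"g_vfs_done\(\):(?P<provider>[^\[]+)\[")
# Assumption: ATA driver messages are prefixed with the disk name,
# e.g. "kernel: ad14: ...". Probe messages at boot match this as well.
ATA_RE = re.compile(r"kernel: (?P<disk>ad\d+): ")


def error_sources(lines):
    """Return (per-provider GEOM counts, per-disk ATA counts) for kernel lines."""
    geom, ata = Counter(), Counter()
    for line in lines:
        if m := GEOM_RE.search(line):
            geom[m["provider"]] += 1
        elif m := ATA_RE.search(line):
            ata[m["disk"]] += 1
    return geom, ata


# Usage: the first line is from this thread; the ad14 line is hypothetical.
sample = [
    "Aug 27 17:17:27 backup kernel: g_vfs_done():raid3/data1"
    "[READ(offset=160320159744, length=16384)]error = 5",
    "Aug 27 17:20:00 backup kernel: ad14: FAILURE - READ_DMA",
]
geom, ata = error_sources(sample)
print(dict(geom), dict(ata))  # {'raid3/data1': 1} {'ad14': 1}
```

Applied to the excerpt quoted below, `ata` stays empty while `geom` counts only `raid3/data1`, which is the pattern described above.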
I'm not familiar with GEOM "stuff", so I can't really comment on what
all is going on here.
> Also, does anyone know what "ZoneXXFailed" items in the graid3 list
> output mean?
>
> Relevant output:
>
> $ graid3 status
>        Name    Status  Components
> raid3/data1  COMPLETE  ad12
>                        ad14
>                        ad16
>                        ad18
>                        ad20
>
> $ graid3 list
> Geom name: data1
> State: COMPLETE
> Components: 5
> Flags: VERIFY
> GenID: 0
> SyncID: 1
> ID: 3700500186
> Zone64kFailed: 791239
> Zone64kRequested: 49197268
> Zone16kFailed: 40204
> Zone16kRequested: 1283738
> Zone4kFailed: 12005939
> Zone4kRequested: 2445799003
> Providers:
> 1. Name: raid3/data1
>    Mediasize: 1280291731456 (1.2T)
>    Sectorsize: 2048
>    Mode: r1w1e1
> ...
>
> $ atacontrol list
> ...
> ATA channel 6:
>     Master: ad12 <ST3320620AS/3.AAK> Serial ATA v1.0
> ATA channel 7:
>     Master: ad14 <ST3320620AS/3.AAK> Serial ATA v1.0
> ATA channel 8:
>     Master: ad16 <ST3320620AS/3.AAK> Serial ATA v1.0
> ATA channel 9:
>     Master: ad18 <ST3320620AS/3.AAK> Serial ATA v1.0
> ATA channel 10:
>     Master: ad20 <ST3320620AS/3.AAK> Serial ATA v1.0
>
>
> Output in /var/log/messages:
>
>> Aug 27 17:17:27 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 7 times
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 22 times
>> Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
>> Aug 27 17:25:45 backup last message repeated 21 times
>> Aug 27 17:38:24 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:38:26 backup last message repeated 4 times
>> Aug 27 17:46:02 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320159744, length=16384)]error = 5
>> Aug 27 17:53:48 backup last message repeated 7 times
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320176128, length=16384)]error = 5
>> Aug 27 17:53:48 backup last message repeated 22 times
>> Aug 27 17:53:48 backup kernel: g_vfs_done():raid3/data1[READ(offset=160320192512, length=16384)]error = 5
>> Aug 27 17:53:49 backup last message repeated 21 times
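To see how much of the volume is affected, a log like the one above can be summarized per read request. This Python sketch parses `g_vfs_done()` lines in the format quoted here, folds in syslogd's "last message repeated N times" lines, and counts how often each (offset, length) read failed. Error 5 is `EIO`. The function names are illustrative and not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches the g_vfs_done() READ errors in the format quoted in this thread.
ERROR_RE = re.compile(
    r"g_vfs_done\(\):[^\[]+\[READ\(offset=(?P<offset>\d+), "
    r"length=(?P<length>\d+)\)\]error = \d+"
)
# syslogd folds identical consecutive messages into this form.
REPEAT_RE = re.compile(r"last message repeated (?P<n>\d+) times")


def failed_reads(lines):
    """Count failed READs per (offset, length), expanding syslog repeats."""
    counts = Counter()
    last = None  # (offset, length) of the most recent g_vfs_done() line
    for line in lines:
        if m := ERROR_RE.search(line):
            last = (int(m["offset"]), int(m["length"]))
            counts[last] += 1
        elif (r := REPEAT_RE.search(line)) and last is not None:
            counts[last] += int(r["n"])  # N additional copies of `last`
        else:
            last = None  # some other message: a later repeat is not ours
    return counts


def error_span(counts):
    """Return (start, end) byte offsets covering every failed read."""
    start = min(offset for offset, _ in counts)
    end = max(offset + length for offset, length in counts)
    return start, end


# Usage with two lines from the log above.
sample = [
    "Aug 27 17:25:45 backup kernel: g_vfs_done():raid3/data1"
    "[READ(offset=160320159744, length=16384)]error = 5",
    "Aug 27 17:25:45 backup last message repeated 7 times",
]
print(failed_reads(sample))  # Counter({(160320159744, 16384): 8})
```

Over the full excerpt above, this counts 18, 51 and 44 failed reads at offsets 160320159744, 160320176128 and 160320192512. Those are adjacent 16 KiB reads, so the errors fall in one contiguous 48 KiB region (bytes 160320159744 to 160320208896) of raid3/data1. The sketch only summarizes provider-level errors; by itself it does not identify which component disk returned them.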
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
More information about the freebsd-stable mailing list