ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
O. Hartmann
ohartman at mail.uni-mainz.de
Tue Aug 9 08:23:36 GMT 2005
Mike Tancsa wrote:
> At 08:25 PM 08/08/2005, O. Hartmann wrote:
>
>> Hello.
>>
>> My box is an ASUS A8N-SLI Deluxe based AMD64 machine running FreeBSD
>> 6.0-BETA2 (see dmesg).
>> One of my SATA disks, the SAMSUNG SP2004C, seems to show errors during
>> operation (it also showed them under 5.4-RELEASE-p3).
>> Sometimes I get this error:
>> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
>> while the machine still keeps working.
>> Other days the box crashes completely.
>>
>> Is this an operating system bug, or is this message evidence of
>> defective hardware?
>
>
> You can probably confirm a hardware issue with smartmontools
> (/usr/ports/sysutils/smartmontools).
>
> It was quite handy the other day for us to work out whether a problem
> lay with a drive tray or with the actual drive. We started to see:
>
> Aug 3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423
> Aug 3 02:03:00 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2304319
> Aug 3 02:03:10 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2312927
> Aug 3 02:03:17 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2308639
> Aug 3 02:03:26 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2309855
> Aug 3 02:03:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2348359
> Aug 4 12:12:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1528639
> Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1530031
> Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1528639
> Aug 4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out
> Aug 4 12:13:04 verify1 kernel: spec_getpages:(ad0s1a) I/O read failure: (error=5) bp 0xd630b4fc vp 0xc2640d68
>
> Yet when we read the actual error info off the drive via smartctl -a
> ad0, it was clean. That pointed to the drive tray, which we swapped, and
> all was well. In other situations, however, the SMART info will often
> tell you if the drive is starting to fail. It's not 100% reliable, but
> since we started using it, it has generally given us some sort of
> heads-up as to whether or not a drive is in trouble.
>
>
> ---Mike
Dear Mike.
Thanks a lot for this info.
I will use this tool and report what I find.
I also use trays for my drives (like I did with SCSI and SCA2 on our
servers at the lab). Maybe this could be an issue.
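To see how often each kind of message shows up, and for which device, the
kernel log lines can be tallied with a small script. What follows is only a
minimal sketch in Python: the regular expression is modelled solely on the
three message shapes quoted in this thread (WARNING, TIMEOUT, FAILURE), the
function names parse_ata_line and tally_ata_errors are made up for
illustration, and it assumes each message sits on one unwrapped line, as it
would in the system log rather than in quoted mail.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the messages quoted in this thread, e.g.
#   ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423
#   ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
#   ad0: FAILURE - READ_DMA timed out
ATA_MSG = re.compile(
    r"\b(?P<dev>ad\d+): (?P<level>WARNING|TIMEOUT|FAILURE) - "
    r"(?P<text>.*?)(?: LBA=(?P<lba>\d+))?$"
)


def parse_ata_line(line):
    """Return a dict describing one ATA message, or None if the line
    does not look like one of the message shapes above."""
    match = ATA_MSG.search(line.strip())
    if match is None:
        return None
    lba = match.group("lba")
    return {
        "dev": match.group("dev"),
        "level": match.group("level"),
        "text": match.group("text"),
        # Some messages (e.g. the FAILURE line) carry no LBA at all.
        "lba": int(lba) if lba is not None else None,
    }


def tally_ata_errors(lines):
    """Count messages per (device, level) pair, skipping unrelated lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_ata_line(line)
        if parsed is not None:
            counts[(parsed["dev"], parsed["level"])] += 1
    return counts
```

The lines could be fed in from wherever kernel messages are logged (on a
default FreeBSD setup that is usually /var/log/messages). As far as I
understand it, an ICRC warning reports a CRC error on the interface between
controller and drive, so a tally dominated by those would at least be
consistent with a cabling or tray problem, though it proves nothing by
itself.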
Oliver