Re: frequent disk error, need guidance

From: Paul Procacci <pprocacci_at_gmail.com>
Date: Sat, 15 Apr 2023 03:57:01 UTC
On Fri, Apr 14, 2023 at 11:54 PM Gary Aitken <freebsd@dreamchaser.org>
wrote:

> I'm seeing a boatload of the same error:
>    (ada0:ata2:0:0:0): READ_DMA. ACB: c8 00 e2 c7 73 41 00 00 00 00 40 00
>    (ada0:ata2:0:0:0): CAM status: ATA Status Error
>    (ada0:ata2:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
>    (ada0:ata2:0:0:0): RES: 51 40 e7 c7 73 01 01 00 00 00 00
>    (ada0:ata2:0:0:0): Retrying command, 3 more tries remain
> repeated, with occasional:
>    g_vfs_done():ada0p2[READ(offset=12474351616, length=32768)]error = 5
>
> # smartctl --info /dev/da0
>    Model Family:     Seagate Barracuda 7200.9
>    Device Model:     ST3808110AS
>    Serial Number:    4LR1HW1E
>    Firmware Version: 3.ADH
>    User Capacity:    80,000,000,000 bytes [80.0 GB]
>    Sector Size:      512 bytes logical/physical
>    Device is:        In smartctl database 7.3/5319
>    ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
>    Local Time is:    Fri Apr 14 09:43:01 2023 MDT
>    SMART support is: Available - device has SMART capability.
>    SMART support is: Enabled
> # smartctl --health /dev/da0
>    SMART overall-health self-assessment test result: PASSED
>
> # smartctl --test=long /dev/ada0
> # smartctl --log=selftest /dev/ada0
> Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_1st_error
> # 1  Extended offline  Completed: read failure  90%        7482             24365031
> # 2  Short offline     Completed: read failure  90%        7482             24365031
> # 3  Short offline     Completed: read failure  90%        7482             24365031
> # 4  Short offline     Completed without error  00%        0                -
>
> So I presume a bad block/sector on the disk.
> I had high hopes that this article:
>    https://www.freebsddiary.org/smart-fixing-bad-sector.php
> would show the way, but it seems to quit right at the good stuff.
>
> Can it be remapped, and if so, pointers to how?
>
> Thanks,
>
> Gary
>
>
That is a hardware error. UNC means uncorrectable data error: the drive
could not read that sector back. Either you have a cable going bad or that
drive is failing. Maybe it's just intermittent at this stage, but as the
very first step I'd try a new cable or replace the drive.
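
For anyone who wants to read those register dumps by hand, here is a
minimal Python sketch that decodes the status and error bytes and rebuilds
the LBA from the ACB and RES lines quoted above. The bit names are the
standard ATA ones; the byte order of the fields (command or status first,
then features or error, then LBA low/mid/high and the device byte, with the
sector count second to last in the ACB) is my assumption about how the
kernel prints these lines, and the helper names are made up for
illustration.

```python
# Standard ATA status register bits (high bit first).
STATUS_BITS = {
    0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "SERV",
    0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR",
}

# Standard ATA error register bits (high bit first).
ERROR_BITS = {
    0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
    0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "ILI",
}

SECTOR_SIZE = 512  # bytes, from the smartctl --info output


def parse_hex(dump):
    """Turn a string like 'c8 00 e2' into a list of ints."""
    return [int(byte, 16) for byte in dump.split()]


def flags(value, table):
    """Return the names of all bits set in value."""
    return [name for bit, name in table.items() if value & bit]


def lba28(lba_low, lba_mid, lba_high, device):
    """Rebuild a 28-bit LBA; the low nibble of device holds bits 24-27."""
    return ((device & 0x0F) << 24) | (lba_high << 16) | (lba_mid << 8) | lba_low


# Bytes copied from the ACB and RES lines in the report.
# Assumed layout: [0]=command/status, [1]=features/error, [2:6]=LBA + device.
acb = parse_hex("c8 00 e2 c7 73 41 00 00 00 00 40 00")
res = parse_hex("51 40 e7 c7 73 01 01 00 00 00 00")

request_lba = lba28(*acb[2:6])            # first sector the READ_DMA asked for
request_bytes = acb[10] * SECTOR_SIZE     # assumed sector-count position
failed_lba = lba28(*res[2:6])             # sector the drive reported back

print("status:", flags(res[0], STATUS_BITS))
print("error: ", flags(res[1], ERROR_BITS))
print("request starts at LBA", request_lba, "for", request_bytes, "bytes")
print("drive reports failure at LBA", failed_lba)
```

Under those assumptions the RES line decodes to LBA 24365031, the same
sector the self-test log gives as the first error, and the request length
comes out at 32768 bytes, which matches the g_vfs_done() message.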

~Paul

-- 
__________________

:(){ :|:& };: