Re: frequent disk error, need guidance

From: Polytropon <freebsd_at_edvax.de>
Date: Sat, 15 Apr 2023 05:33:15 UTC
On Fri, 14 Apr 2023 20:53:05 -0700, Gary Aitken wrote:
> I'm seeing a boatload of the same error:
>    (ada0:ata2:0:0:0): READ_DMA. ACB: c8 00 e2 c7 73 41 00 00 00 00 40 00
>    (ada0:ata2:0:0:0): CAM status: ATA Status Error
>    (ada0:ata2:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
>    (ada0:ata2:0:0:0): RES: 51 40 e7 c7 73 01 01 00 00 00 00
>    (ada0:ata2:0:0:0): Retrying command, 3 more tries remain
> repeated, with occasional:
>    g_vfs_done():ada0p2[READ(offset=12474351616, length=32768)]error = 5
> 
> # smartctl --info /dev/da0
>    Model Family:     Seagate Barracuda 7200.9
>    Device Model:     ST3808110AS
>    Serial Number:    4LR1HW1E
>    Firmware Version: 3.ADH
>    User Capacity:    80,000,000,000 bytes [80.0 GB]
>    Sector Size:      512 bytes logical/physical
>    Device is:        In smartctl database 7.3/5319
>    ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
>    Local Time is:    Fri Apr 14 09:43:01 2023 MDT
>    SMART support is: Available - device has SMART capability.
>    SMART support is: Enabled
> # smartctl --health /dev/da0
>    SMART overall-health self-assessment test result: PASSED
> 
> # smartctl --test=long /dev/ada0
> # smartctl --log=selftest /dev/ada0
> Num  Test_Description                          Remaining  LBA_of_1st_error
>                         Status                        LifeTime(hours)
> 
> # 1  Extended offline  Completed: read failure  90%  7482  24365031
> # 2  Short offline     Completed: read failure  90%  7482  24365031
> # 3  Short offline     Completed: read failure  90%  7482  24365031
> # 4  Short offline     Completed without error  00%     0  -
> 
> So I presume a bad block/sector on the disk.

Probably too many bad blocks. The disk's firmware remaps
defective blocks to spare ones, and once you see errors
at the OS level, it may well have run out of spare blocks.
This means it is not the beginning of a problem: the
problem is already significant, and the disk has probably
reached its end of life.

There is another option: check all cables. Check power
to be sure, but the data cable matters most. In the worst
case, try replacing the data cable. Check that it is
seated properly at both ends. Yes, sometimes it is
that simple. ;-)



> I had high hopes this article:
>    https://www.freebsddiary.org/smart-fixing-bad-sector.php
> would show the way, but it seems to quit right at the good stuff.
> 
> Can it be remapped, and if so, pointers to how?

As I said, the disk will do that by itself, internally.
However, you _can_ use the "badblocks" utility for
diagnostics, along with "smartctl" (smartmontools).
At the OS level, you cannot really fix hardware problems,
though.
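
As a small diagnostic aid, the register dumps in the quoted log
can be decoded by hand to see which sector the drive is
complaining about. Below is a minimal sketch in Python, not a
definitive tool. It assumes the usual FreeBSD CAM byte order for
these lines (ACB: command, features, LBA low/mid/high, device,
..., sector count at position 10; RES: status, error, LBA
low/mid/high, device, ...) and 28-bit addressing, which is what
READ_DMA uses. The helper name lba28 is made up for illustration.

```python
# Sketch: decode the 28-bit LBA from the ACB/RES lines quoted above.
# The byte layout is an assumption about how CAM prints these registers.

SECTOR = 512  # logical sector size as reported by smartctl --info

ACB = "c8 00 e2 c7 73 41 00 00 00 00 40 00"  # command that was sent
RES = "51 40 e7 c7 73 01 01 00 00 00 00"     # result the drive returned


def lba28(dump):
    """Return the 28-bit LBA encoded in a register dump string."""
    b = [int(x, 16) for x in dump.split()]
    # b[2..4] are LBA low/mid/high; the low nibble of the device
    # register (b[5]) holds LBA bits 24..27.
    return ((b[5] & 0x0F) << 24) | (b[4] << 16) | (b[3] << 8) | b[2]


start = lba28(ACB)                 # first sector of the failed read
failed = lba28(RES)                # sector reported as unreadable
count = int(ACB.split()[10], 16)   # number of sectors requested

print(start)            # 24365026
print(failed)           # 24365031
print(count * SECTOR)   # 32768
```

The failing sector, 24365031, is the same LBA_of_1st_error that
the SMART self-test log shows, so the log lines and the self-tests
appear to point at one and the same spot on the disk. If the
g_vfs_done() line belongs to the same read (its length of 32768
bytes matches the 64-sector count), then ada0p2 would start at
sector 24365026 - 12474351616/512 = 1058. That last part is an
inference, so compare it with the output of "gpart show ada0".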

Anyway: make sure to back up your data and prepare
to replace the disk. That is probably the safest thing
to do (after you've ruled out bad cabling, that is).





-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...