SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAMstatus: ATA Status Error

Wed Dec 13 22:07:37 UTC 2017

> On 13 Dec 2017, at 21:39, O. Hartmann <o.hartmann at walstatt.org> wrote:
> 
> On Wed, 13 Dec 2017 08:47:53 -0800 (PST)
> "Rodney W. Grimes" <freebsd-rwg at pdx.rh.CN85.dnsmgr.net> wrote:
> 
>>> On Tue, 12 Dec 2017 14:58:28 -0800
>>> Cy Schubert <Cy.Schubert at komquats.com> wrote:
>>> 
>>>> There are a couple of ways you can address this. You'll need to
>>>> offline the vdev first. If you've done a smartctl -t long and if the
>>>> test failed, smartctl -a will tell you which block it had an issue
>>>> with. You can use dd, ddrescue or dd_rescue to dd the block over
>>>> itself. The drive may rewrite the (weak) block or, if it fails to, it
>>>> will remap it (subsequently showing as reallocated).
>>>> 
>>>> Of course there is a risk. If the sector is any of the boot blocks
>>>> there is a good chance the server will hang.  
>>> 
>>> The drive is part of a dedicated storage-only pool. The boot drive is a
>>> fast SSD. So I do not care about this - well, to say it more politely:
>>> I do not have to take care of that aspect.
>>> 
>>>> 
>>>> You have to be *absolutely* sure which sector is the bad one. And there
>>>> may be more. There is a risk of data loss.
>>>> 
>>>> I've used this technique many times. Most times it works perfectly.
>>>> Other times the affected file is lost but the rest of the file system
>>>> is recovered. And again there is always the risk.
>>>> 
>>>> Replace the disk immediately if you experience a growing succession
>>>> of pending sectors. Otherwise replace the disk at your earliest
>>>> convenience.  
>>> 
>>> The ZFS scrubbing of the volume ended this morning, leaving the pool in
>>> a healthy state. After reboot, there was no sign of CAM errors again.
>>> 
>>> But there is something else I'm worried about. The mainboard I use is a 
>>> 
>>> ASRock Z77 Pro4-M.
>>> The board has a crippled Intel MCP with 6 SATA ports from the chipset,
>>> two of them SATA 6 Gb/s, 4 SATA II, and one additional chip with two SATA
>>> 6 Gb/s ports:
>>> 
>>> [...]
>>> ahci0@pci0:2:0:0:       class=0x010601 card=0x06121849 chip=0x06121b21 rev=0x01 hdr=0x00
>>>    vendor     = 'ASMedia Technology Inc.'
>>>    device     = 'ASM1062 Serial ATA Controller'
>>>    class      = mass storage
>>>    subclass   = SATA
>>>    bar   [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
>>>    bar   [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
>>>    bar   [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
>>>    bar   [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
>>>    bar   [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
>>>    bar   [24] = type Memory, range 32, base 0xf7b00000, size 512, enabled
>>> [...]
>>> 
>>> Attached to that ASM1062 SATA chip is a backup drive via an eSATA
>>> connector, a WD 4 TB RED drive. It seems that whenever I attach this
>>> drive and it is online, I experience problems on the ZFS pool, which is
>>> attached to the MCP SATA ports.
>> 
>> How does this external drive get its power?  Are the earth grounds of
>> both the system and the external drive power supply closely tied
>> together?  A plug/unplug event with a slight ground creep can
>> wreak havoc with device operation.
> 
> The external drive is housed in an external casing. Its PSU de facto has the same
> "grounding" (earth ground) as the server's PSU: they share the same power plug at the
> point where it comes out of the wall, so to speak.

Most external drive power supplies are not grounded. At least none I have ever seen had a grounded plug on the mains cable. Maybe yours does...

Worth checking anyway.

Daniel
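
For reference, the dd-over-itself approach Cy describes above could be sketched roughly as follows. This is only an illustration, not anyone's actual procedure from this thread: the sample self-test log row is made up, the helper names first_error_lba and dd_rewrite_command are hypothetical, and the 512-byte sector size is an assumption that has to be checked against the real drive. The script only builds the dd command line as a string; it never touches a disk.

```python
import re
from typing import Optional

# Matches one row of the "SMART Self-test log" table printed by `smartctl -a`.
# Captures: remaining percentage, lifetime hours, LBA_of_first_error (or "-").
_ROW = re.compile(r"^#\s*\d+\s+.*?\s(\d+%)\s+(\d+)\s+(\d+|-)\s*$")


def first_error_lba(smartctl_output: str) -> Optional[int]:
    """Return the LBA reported by the first listed failed self-test, or None."""
    for line in smartctl_output.splitlines():
        m = _ROW.match(line)
        if m and "read failure" in line and m.group(3) != "-":
            return int(m.group(3))
    return None


def dd_rewrite_command(device: str, lba: int, sector_size: int = 512) -> str:
    """Build (but do not run) a dd command that copies one sector onto itself.

    sector_size is an ASSUMPTION; use the logical sector size smartctl reports.
    """
    return (
        f"dd if={device} of={device} bs={sector_size} count=1 "
        f"skip={lba} seek={lba} conv=notrunc"
    )


# Made-up sample rows in the layout of the smartctl self-test log.
SAMPLE_LOG = """\
# 1  Extended offline    Completed: read failure       90%     12345         123456789
# 2  Short offline       Completed without error       00%     12000         -
"""

if __name__ == "__main__":
    lba = first_error_lba(SAMPLE_LOG)
    if lba is not None:
        # /dev/ada6 is the device named in the subject line of this thread.
        print(dd_rewrite_command("/dev/ada6", lba))
```

As Cy stresses, the vdev should be offlined first and the sector number must be verified by hand before running anything like the generated command. If the sector cannot be read at all, the self-copy will simply fail at the read step.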