Fwd: frequent timeouts with mvs(4) SATA controller, GELI, and ZFS

Alan Somers asomers at freebsd.org
Sun Dec 11 20:09:16 UTC 2016


I was afraid you'd say something like that.  Sadly, disabling NCQ
didn't help.  For good measure, I tried disabling interrupt coalescing
too, but that didn't help either.  The error message did change
slightly: the iec field is now zero.

mvsch2: Timeout on slot 0
mvsch2: iec 00000000 sstat 00000123 serr 00000000 edma_s 000000c0
dma_c 20000700 dma_s 00000008 rs 00000001 status 50
(ada1:mvsch2:0:0:0): WRITE_DMA. ACB: ca 00 18 72 60 49 00 00 00 00 00 00
(ada1:mvsch2:0:0:0): CAM status: Command timeout
(ada1:mvsch2:0:0:0): Retrying command
mvsch0: Timeout on slot 0
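
For anyone comparing these dumps by hand, here is a minimal Python sketch
that splits an mvs(4) timeout dump into named fields and decodes the two
that follow public standards: sstat, read as the SATA SStatus register
(DET, SPD, IPM), and status, read as the ATA status byte.  The function
names are mine, and treating the driver's sstat and status fields as those
standard registers is an assumption.  The Marvell-specific fields (iec,
edma_s, dma_c, dma_s) are left as raw numbers, since their layout is not
publicly documented.

```python
import re

# Standard ATA status register bits, most significant first.
ATA_STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]

# Field names exactly as they appear in the dump lines quoted above.
DUMP_RE = re.compile(
    r"\b(iec|sstat|serr|edma_s|dma_c|dma_s|rs|status)\s+([0-9a-f]+)\b"
)


def parse_dump(text):
    """Return {field_name: int} for each 'name hexvalue' pair in a dump."""
    return {name: int(value, 16) for name, value in DUMP_RE.findall(text)}


def decode_sstat(sstat):
    """Split a SATA SStatus value into its DET, SPD and IPM nibbles.

    DET 3 = device present and phy communication established,
    SPD 2 = Gen2 (3.0 Gbps) negotiated, IPM 1 = interface active.
    """
    return {
        "det": sstat & 0xF,
        "spd": (sstat >> 4) & 0xF,
        "ipm": (sstat >> 8) & 0xF,
    }


def decode_status(status):
    """Return the names of the ATA status bits set in 'status'."""
    return [name for bit, name in ATA_STATUS_BITS if status & bit]


if __name__ == "__main__":
    dump = ("mvsch2: iec 00000000 sstat 00000123 serr 00000000 "
            "edma_s 000000c0\n"
            "dma_c 20000700 dma_s 00000008 rs 00000001 status 50")
    regs = parse_dump(dump)
    print(decode_sstat(regs["sstat"]))
    print(decode_status(regs["status"]))
```

Read this way, sstat 00000123 is DET=3, SPD=2, IPM=1 (device present, link
up at Gen2, interface active), and status 50 is DRDY plus DSC with neither
BSY nor ERR set, which suggests the drive itself is idle and not reporting
an error.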

Eventually I get a "Retry was blocked" error like this, but the CAM
Status is always "Command timeout".
mvsch0: Timeout on slot 0
mvsch0: iec 00000000 sstat 00000123 serr 00000000 edma_s 00001140
dma_c 00000000 dma_s 00000008 rs 00000001 status 58
(aprobe1:mvsch0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe1:mvsch0:0:0:0): CAM status: Command timeout
(aprobe1:mvsch0:0:0:0): Error 5, Retry was blocked
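
The ACB bytes in the CAM messages can be decoded too.  The sketch below
assumes the 12 bytes are printed in the order command, features, lba_low,
lba_mid, lba_high, device, lba_low_exp, lba_mid_exp, lba_high_exp,
features_exp, sector_count, sector_count_exp.  That order is my reading of
how CAM formats ATA commands, not something stated in this thread, and the
helper names are hypothetical.

```python
# Assumed order of the 12 bytes printed after "ACB:".
ACB_FIELDS = (
    "command", "features", "lba_low", "lba_mid", "lba_high", "device",
    "lba_low_exp", "lba_mid_exp", "lba_high_exp", "features_exp",
    "sector_count", "sector_count_exp",
)


def parse_acb(acb):
    """Turn an 'ACB:' hex string into a dict of named byte values."""
    values = [int(byte, 16) for byte in acb.split()]
    if len(values) != len(ACB_FIELDS):
        raise ValueError("expected %d ACB bytes, got %d"
                         % (len(ACB_FIELDS), len(values)))
    return dict(zip(ACB_FIELDS, values))


def lba28(fields):
    """28-bit LBA for non-EXT commands such as WRITE_DMA (0xca).

    The top four LBA bits live in the low nibble of the device byte.
    """
    return (((fields["device"] & 0x0F) << 24)
            | (fields["lba_high"] << 16)
            | (fields["lba_mid"] << 8)
            | fields["lba_low"])


def sectors28(fields):
    """Sector count for 28-bit commands; a count byte of 0 means 256."""
    return fields["sector_count"] or 256


if __name__ == "__main__":
    write = parse_acb("ca 00 18 72 60 49 00 00 00 00 00 00")
    print(hex(write["command"]), hex(lba28(write)), sectors28(write))
```

Under that assumption, the failing WRITE_DMA earlier in this message is a
256-sector write at LBA 0x9607218, and the ATA_IDENTIFY above carries no
LBA or count at all, only command 0xec with the device byte set to 0x40.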

What's your recommendation?  Is there any way to make this hardware
work, or do I need to buy a new SATA card?  That would be a
disappointment.  The 88SX7042 got generally positive reviews.

-Alan

On Sun, Dec 11, 2016 at 2:44 AM, Alexander Motin <mav at freebsd.org> wrote:
> This controller uses a proprietary Marvell API which, like most of their
> products, is not publicly documented.  This family of chips is also known
> for a long errata history, which is likewise not publicly documented.  In
> addition, this line of chips has been discontinued for years, since
> Marvell switched to a new line of AHCI-compatible 6Gbps chips.
>
> "iec 02000000" means a device error reported by the EDMA engine.  It
> should be handled properly without causing timeouts, but it seems
> something went wrong.  Either the chip forgot to generate the interrupt,
> or the driver mishandled it.
>
> As a workaround, you may try disabling NCQ for those drives using
> `camcontrol negotiate` and see what happens.  Maybe that will let you
> see some real error reported by the drive, or at least allow error recovery.
>
> On 11.12.2016 02:03, Alan Somers wrote:
>> I have an 11.0-RELEASE machine with a VIA Nano CPU and a Marvell SATA
>> 88SX7042 controller.  I have a GELI-encrypted triple-mirror zpool with
>> disks on that controller.  The number of disks doesn't matter, though; I
>> have the same problems even when only one disk is connected.  Whenever I write
>> to this pool, after a few GB of writes I get a timeout on one of the
>> mvs(4) slots, followed shortly by timeouts on every disk on that
>> controller.  From this point until I reboot, no command sent to any
>> disk on that controller will ever complete.  CAM tries to reprobe the
>> disks, fails, and their ada nodes disappear.  This is repeatable.
>> Does anybody have any ideas what's going on?
>> Anybody know any dirt about this SATA controller?
>>
>> pciconf -lv
>> ...
>> atapci0 at pci0:0:15:0:    class=0x01018f card=0xaa241106 chip=0x90011106 rev=0x00
>> hdr=0x00
>>     vendor     = 'VIA Technologies, Inc.'
>>     device     = 'VX900 Serial ATA Controller'
>>     class      = mass storage
>>     subclass   = ATA
>> mvs0 at pci0:1:0:0:        class=0x010000 card=0x11ab11ab chip=0x704211ab rev=0x02
>> hdr=0x00
>>     vendor     = 'Marvell Technology Group Ltd.'
>>     device     = '88SX7042 PCI-e 4-port SATA-II'
>>     class      = mass storage
>>     subclass   = SCSI
>> ...
>>
>> dmesg
>> ...
>> mvsch3: Timeout on slot 7
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
>> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3:  ... waiting for slots 00000072
>> mvsch3: Timeout on slot 6
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
>> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3:  ... waiting for slots 00000032
>> mvsch3: Timeout on slot 5
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
>> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3:  ... waiting for slots 00000012
>> mvsch3: Timeout on slot 4
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
>> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> mvsch3:  ... waiting for slots 00000002
>> mvsch3: Timeout on slot 1
>> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
>> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 95 e4 11 40 4d 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 5f 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 61 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 63 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 67 00 40 21 00 00 01 00 00
>> (ada3:mvsch3:0:0:0): CAM status: Command timeout
>> (ada3:mvsch3:0:0:0): Retrying command
>> ...
>>
>> -Alan
>>
>
> --
> Alexander Motin
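
The rs and "waiting for slots" values in the quoted dmesg output read
naturally as bitmasks of outstanding command slots, one bit per slot.  That
interpretation is an assumption, not something stated in this thread, but
the numbers are consistent with it.  A small Python sketch (the function
name is hypothetical) that expands such a mask:

```python
def slots(mask):
    """Return the slot numbers whose bits are set in a 32-bit mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


if __name__ == "__main__":
    # rs value, then each "waiting for slots" value from the first report.
    for value in (0xF2, 0x72, 0x32, 0x12, 0x02):
        print("%08x -> %s" % (value, slots(value)))
```

With rs 000000f2 the outstanding slots would be 1, 4, 5, 6 and 7.  Each
"Timeout on slot N" message is then followed by a mask with bit N cleared
(00000072 after slot 7, 00000032 after slot 6, and so on) until only slot 1
is left, which matches the order of the timeouts in the log.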


More information about the freebsd-scsi mailing list