Fwd: frequent timeouts with mvs(4) SATA controller, GELI, and ZFS
Alexander Motin
mav at FreeBSD.org
Sun Dec 11 09:44:29 UTC 2016
This controller uses a proprietary Marvell API that, like most of their
products, is not publicly documented. This family of chips is also known
for a long errata history, which is likewise not publicly documented. In
addition, this line of chips has been discontinued for years, since
Marvell switched to a new line of AHCI-compatible 6Gbps chips.
"iec 02000000" means a device error reported by the EDMA engine. It should be
handled properly without causing timeouts, but it seems something went
wrong. Either the chip forgot to generate the interrupt, or the driver did
something wrong in handling it.
As a workaround, you may try to disable NCQ for those drives using
`camcontrol negotiate` and see what happens. Maybe that will allow you to
see some real error reported by the drive, or at least allow error recovery.
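When reading the quoted dmesg output, it may help to decode the "waiting for slots" values. The sketch below assumes they are plain bitmasks with one bit per outstanding command slot. That is an interpretation of the log, not something confirmed from the driver source, but it matches the order of the reported timeouts (slot 7 first, then 6, 5, 4 and 1). The helper name decode_slots is made up for illustration.

```python
def decode_slots(mask_hex: str) -> list[int]:
    """Return the slot numbers whose bits are set in a hex bitmask string.

    Assumption: bit N set means command slot N is still outstanding.
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


# Values copied from the "waiting for slots" lines in the quoted log.
for value in ("00000072", "00000032", "00000012", "00000002"):
    print(value, decode_slots(value))
```

Under that assumption, 00000072 corresponds to slots 1, 4, 5 and 6, and each later timeout removes one slot from the set until only slot 1 is left.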
On 11.12.2016 02:03, Alan Somers wrote:
> I have an 11.0-RELEASE machine with a Via Nano CPU and a Marvell SATA
> 88SX7042 controller. I have a GELI-encrypted triple-mirror zpool with
> disks on that controller. But the number doesn't matter; I have the
> same problems even when only one disk is connected. Whenever I write
> to this pool, after a few GB of writes I get a timeout on one of the
> mvs(4) slots, followed shortly by timeouts on every disk on that
> controller. From this point until I reboot, no command sent to any
> disk on that controller will ever complete. CAM tries to reprobe the
> disks, fails, and their ada nodes disappear. This is repeatable.
> Does anybody have any ideas what's going on?
> Anybody know any dirt about this SATA controller?
>
> pciconf -lv
> ...
> atapci0 at pci0:0:15:0: class=0x01018f card=0xaa241106 chip=0x90011106 rev=0x00
> hdr=0x00
> vendor = 'VIA Technologies, Inc.'
> device = 'VX900 Serial ATA Controller'
> class = mass storage
> subclass = ATA
> mvs0 at pci0:1:0:0: class=0x010000 card=0x11ab11ab chip=0x704211ab rev=0x02
> hdr=0x00
> vendor = 'Marvell Technology Group Ltd.'
> device = '88SX7042 PCI-e 4-port SATA-II'
> class = mass storage
> subclass = SCSI
> ...
>
> dmesg
> ...
> mvsch3: Timeout on slot 7
> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
> mvsch3: ... waiting for slots 00000072
> mvsch3: Timeout on slot 6
> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
> mvsch3: ... waiting for slots 00000032
> mvsch3: Timeout on slot 5
> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
> mvsch3: ... waiting for slots 00000012
> mvsch3: Timeout on slot 4
> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
> mvsch3: ... waiting for slots 00000002
> mvsch3: Timeout on slot 1
> mvsch3: iec 02000000 sstat 00000123 serr 00000000 edma_s 000000e1
> dma_c 20000708 dma_s 00000008 rs 000000f2 status 40
> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 95 e4 11 40 4d 00 00 01 00 00
> (ada3:mvsch3:0:0:0): CAM status: Command timeout
> (ada3:mvsch3:0:0:0): Retrying command
> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 5f 00 40 21 00 00 01 00 00
> (ada3:mvsch3:0:0:0): CAM status: Command timeout
> (ada3:mvsch3:0:0:0): Retrying command
> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 61 00 40 21 00 00 01 00 00
> (ada3:mvsch3:0:0:0): CAM status: Command timeout
> (ada3:mvsch3:0:0:0): Retrying command
> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 63 00 40 21 00 00 01 00 00
> (ada3:mvsch3:0:0:0): CAM status: Command timeout
> (ada3:mvsch3:0:0:0): Retrying command
> (ada3:mvsch3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 f2 67 00 40 21 00 00 01 00 00
> (ada3:mvsch3:0:0:0): CAM status: Command timeout
> (ada3:mvsch3:0:0:0): Retrying command
> ...
>
> -Alan
>
--
Alexander Motin