Disk controller heisenbug.

Chris H bsd-lists at bsdforge.com
Thu Mar 2 06:06:16 UTC 2017


On Thu, 2 Mar 2017 00:16:58 -0500 Zaphod Beeblebrox <zbeeble at gmail.com> wrote

> I have a disk controller.  It works in a modern AMD motherboard at home
> (9590 processor),
Nice thing about these processors is that you can cook your
meals on them, too. :-)

> but when connected to a sunfire 4140 (opteron 2345 based
> machine vintage 2008-ish) the disks spontaneously detach by just doing a
> "zfs import"
> 
> The board has its own mounting for the flash disks (two of them) and
> probes as:
> 
> ahci0: <Marvell 88SE9230 AHCI SATA controller> port
> 0x8c00-0x8c07,0x8880-0x8883,0x8800-0x8807,0x8480-0x8483,0x8400-0x841f mem
> 0xdfbff800-0xdfbfffff irq 16 at device 0.0 numa-domain 0 on pci3
> 
> The disks show up as:
> 
> ada0 at ahcich0 bus 0 scbus6 target 0 lun 0
> ada0: <Samsung SSD 850 EVO mSATA 250GB EMT41B6Q> ACS-2 ATA SATA 3.x device
> ada0: Serial Number S248NXAH112465B
> ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
> ada0: Command Queueing enabled
> ada0: 238475MB (488397168 512 byte sectors)
> ada0: quirks=0x3<4K,NCQ_TRIM_BROKEN>
> 
> Under heavy bonnie++, they work in the AMD 9590 system.  On the opteron
> machine, the following occurs:
> 
> ahcich1: Timeout on slot 11 port 0
> ahcich1: is ffffffff cs ffffffff ss ffffffff rs 00000800 tfd ffffffff serr
> ffffffff cmd ffffffff
> (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 90 e0 20 a0 40 17 00 00
> 00 00 00
> (ada1:ahcich1:0:0:0): CAM status: Command timeout
> (ada1:ahcich1:0:0:0): Retrying command
> ahcich1: stopping AHCI engine failed
> ahcich0: ada1 at ahcich1 bus 0 scbus7 target 0 lun 0
> Timeout on slot 31 port 0
> ada1: ahcich0: <Samsung SSD 850 EVO mSATA 250GB EMT41B6Q>is ffffffff cs
> ffffffff ss ffffffff rs 80000000 tfd ffffffff serr ffffffff cmd ffffffff
>  s/n S248NXAH112471L detached
> (ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 90 e0 20 a0 40 17 00 00
> 00 00 00
> (ada0:ahcich0:0:0:0): CAM status: Command timeout
> (ada0:ahcich0:0:0:0): Retrying command
> ahcich0: stopping AHCI engine failed
> ada0 at ahcich0 bus 0 scbus6 target 0 lun 0
> ada0: <Samsung SSD 850 EVO mSATA 250GB EMT41B6Q> s/n S248NXAH112465B
> detached
> [2:43:343]root at yak:/usr/ports/net-mgmt/net-snmp> less /var/run/dmesg.boot
> [2:44:344]root at yak:/usr/ports/net-mgmt/net-snmp> dmesg
> pid 78200 (httpd), uid 80: exited on signal 11
> 
> I'm posting here to hackers because this seems to violate layers --- on the
> AMD machine ... it runs fine... even under load.  The SATA bus is local to
> the card (and so travels with it to the server), yet the error looks like a
> SATA BUS or drive error.
> 
> What gives?
I may be misunderstanding your question, but this smells
like a bus timing issue, e.g. maybe your Sunfire isn't quite
in sync. Have you checked the BIOS settings for bus, RAM and CPU?
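
For what it's worth, the register dump in the quoted log can be
picked apart with a quick sketch like the one below. It assumes
that "rs" is a per-slot bitmask (0x00000800 is bit 11 and
0x80000000 is bit 31, which matches the slot numbers in the two
timeout lines) and that the other fields are raw 32-bit register
reads. The helper names are made up for illustration and are not
part of any FreeBSD tool.

```python
import re

# Register dump copied from the "ahcich1" timeout lines in the quoted log.
DUMP = ("is ffffffff cs ffffffff ss ffffffff rs 00000800 "
        "tfd ffffffff serr ffffffff cmd ffffffff")


def parse_dump(text):
    """Return {register_name: int} from 'name hexvalue' pairs (hypothetical helper)."""
    return {name: int(value, 16)
            for name, value in re.findall(r"(\w+) ([0-9a-f]{8})", text)}


def set_bits(mask):
    """List the bit positions that are set in a 32-bit mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


regs = parse_dump(DUMP)

# Assumption: "rs" is a bitmask with one bit per command slot.
print("slots flagged in rs:", set_bits(regs["rs"]))

# Every other field reads back as all ones.
all_ones = sorted(name for name, value in regs.items() if value == 0xFFFFFFFF)
print("registers reading all ones:", all_ones)
```

Every field except "rs" comes back as ffffffff. Reads from a PCI
device that has stopped responding usually return all ones, so
that pattern would fit the controller itself dropping off the
bus rather than the drives or the SATA link misbehaving. That is
a guess from the log, not a diagnosis.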

I have the same issue on one of my boards. In my case, I
OC'd the CPU ~500MHz over spec.

Just thought I'd mention it.

--Chris

> _______________________________________________
> freebsd-hackers at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"