[Bug 257670] RAS CONTROLLER: Fatal unrecoverable error detected with SAS3008
Date: Sat, 07 Aug 2021 07:21:34 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257670

Bug ID: 257670
Summary: RAS CONTROLLER: Fatal unrecoverable error detected with SAS3008
Product: Base System
Version: CURRENT
Hardware: arm64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: arm
Assignee: freebsd-arm@FreeBSD.org
Reporter: daniel@morante.net
Attachment #227004 text/plain mime type:

Created attachment 227004
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=227004&action=edit
capture of boot via serial

I am testing FreeBSD-14.0-CURRENT-arm64-aarch64-20210805-f3a3b061216-248478 on a Cavium ThunderX2 (Gigabyte R281-T91). This system has an onboard SAS3008 PCI-Express Fusion-MPT SAS-3 controller.

```
mpr0@pci0:14:0:0: class=0x010700 rev=0x02 hdr=0x00 vendor=0x1000 device=0x0097 subvendor=0x1458 subdevice=0x3008
    vendor     = 'Broadcom / LSI'
    device     = 'SAS3008 PCI-Express Fusion-MPT SAS-3'
    class      = mass storage
    subclass   = SAS
```

I load the `mpr` driver by having `mpr_load="YES"` in `/boot/loader.conf`. So far so good, except for the weird messages in dmesg (see attachment).

There are currently 8 HDDs attached to it and I set up 3 ZFS pools. This goes well until I finally start to put some load on them. The system kernel panics and halts with the following in dmesg:

```
mpr0: IOC Fault 0x4000265d, Resetting
mpr0: Reinitializing controller
...
RAS CONTROLLER: Fatal unrecoverable error detected
```

This is not to say the problem is with ZFS. I suspect the mpr driver is just unstable.

The system can no longer boot into multi-user mode. It kernel panics with the same error as soon as it tries to start ZFS.

```
mountroot: waiting for device /dev/nda0p2...
WARNING: / was not properly dismounted
Dual Console: Video Primary, Serial Secondary
witness_lock_list_get: witness exhausted
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
RAS CONTROLLER: Fatal unrecoverable error detected
*** NBU Error ***
...
```

In order to get a functional system I disable ZFS in `/etc/rc.conf` while in single-user mode. Now back in multi-user mode I can do a `service zfs onestart` and try to import one of the pools. The system then kernel panics again.

I detail the full specs of this system in bug #254651 (where I have a problem with the onboard SATA controllers) and in my forum post at https://forums.freebsd.org/threads/aarch64-trouble-with-cn99xx-ahci-and-fastlinq-ql41000-controllers.79556/ (where I explain the lack of a driver for the onboard Ethernet).

Also, for some weird reason I can no longer boot 13.0-RELEASE on this system. It panics with "panic: NVME polled command failed to complete within 10s". I think it doesn't like the add-on PCIe NVMe device. However, when it was working (prior to adding the NVMe device) the SAS controller was just as unstable.

Seeing how most of the hardware is still very new, I don't expect FreeBSD (especially arm64) to support it. I'd like to help in any way that I can, should someone be interested. The system has an IPMI and I'd be willing to offer remote access to it for as long as it's required, via VPN (if that's a thing that's normally done), on a dedicated network with any other required resources.

-- 
You are receiving this mail because:
You are the assignee for the bug.
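
For anyone triaging the `mpr0: IOC Fault 0x4000265d, Resetting` line quoted in the report, the value can be split into an IOC state and a fault code. The sketch below is not part of the original report. It assumes the logged value is the raw MPI2 doorbell register and uses mask values as I recall them from the MPI 2.x header (`mpi2.h`), so they should be verified against the driver source. The helper name `decode_doorbell` is hypothetical.

```python
# Assumed mask values, recalled from the MPI 2.x header (mpi2.h);
# verify against the driver source before relying on them.
IOC_STATE_MASK = 0xF0000000   # upper nibble holds the IOC state
FAULT_CODE_MASK = 0x0000FFFF  # lower 16 bits hold the fault code

# Assumed state encodings from the same header.
IOC_STATES = {
    0x00000000: "RESET",
    0x10000000: "READY",
    0x20000000: "OPERATIONAL",
    0x40000000: "FAULT",
}


def decode_doorbell(value):
    """Split a doorbell value into (state name, fault code)."""
    state = IOC_STATES.get(value & IOC_STATE_MASK, "UNKNOWN")
    return state, value & FAULT_CODE_MASK


state, code = decode_doorbell(0x4000265D)
print(f"state={state} fault_code={code:#06x}")  # state=FAULT fault_code=0x265d
```

Under these assumptions the controller reported the FAULT state with fault code `0x265d`. What that code means is firmware-specific and is not decoded here.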