Problem with a LSILogic SAS/SATA adapter on 8.2-STABLE/ZFSv28
Stephane LAPIE
stephane.lapie at darkbsd.org
Sun Jun 19 09:11:44 UTC 2011
On 06/18/2011 11:45 PM, Jeremy Chadwick wrote:
> For readers, the NMI and RAM parity error message in question is
> shown here:
>
> http://www.darkbsd.org/~darksoul/kernel-panic-mpt2.txt
>
> But is difficult to decode due to the well-established problem with the
> FreeBSD kernel interspersing text output. (I imagine this gets worse
> the more cores you have on your system, but that's not relevant to this
> discussion)
Nothing a quick grep on the source tree couldn't fix, but yeah, annoying :)
> Anyway, to expand on the "RAM parity error" and NMI message: this
> information I'm going to give you isn't specific to the LSI controller;
> it's a general piece of information. I've talked about this in the
> past. Please read it and focus on the SERR/PERR and NMI details:
>
> http://lists.freebsd.org/pipermail/freebsd-fs/2011-March/010938.html
I see. Thanks for the extra bit of info.
> If you want to rule out actual system RAM issues, I would recommend
> running memtest86 for about 30 minutes, and then memtest86+ for the same
> amount of time. This might sound crazy ("why can't I just run one?!"),
> but you need to review the ChangeLog for memtest86 to see why. Their
> support for detecting corrected ECC errors was removed with 4.0, but in
> 4.0 they added multi-CPU support (which is good to have in this
> situation), while memtest86 may still have support for ECC.
>
> Neither of these utilities are as excellent as a hardware RAM tester
> (which does cool things like sending extreme amounts of voltage through
> each DRAM module, looks for soft and hard errors, etc.), but those are
> expensive. Usually system memory problems will show up in memtest86/86+
> pretty quickly though.
I am currently rebuilding a pool, so the memory tests will have to wait
until that is done. I will run them just to be on the safe side, but I
think I have actually narrowed the problem down to the controller.
> All that said: it may be possible that the NMIs you're seeing aren't
> being induced by system RAM issues at all, but somehow are being
> generated or caused by the LSI controller. I wasn't under the
> impression that a PCIe MSI and/or MSI-X generated an NMI, but I could be
> completely wrong.
The kernel panics occurred at random intervals (probably stress-induced;
the common point in each one was that one processor was handling an
interrupt for mpt0), sometimes every 10 minutes, sometimes only after an
hour or more.
So I swapped in another controller:
mvs0: <Marvell 88SX6081 SATA controller> port 0x3000-0x30ff mem
0xdf200000-0xdf2fffff irq 24 at device 1.0 on pci7
mvs0: Gen-II, 8 3Gbps ports, Port Multiplier supported
mvs0: [ITHREAD]
which did not exhibit this behavior.
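
For anyone who wants to compare controllers across boots, a probe line
like the one above can be split into fields with a short script. This is
only a rough sketch: the regular expression, the field names, and the
function name parse_probe_line are my own assumptions about the usual
shape of these lines, not anything taken from the mvs(4) or mpt(4)
drivers.

```python
import re

# Assumed shape of a device probe line (hypothetical pattern):
#   <driver><unit>: <description> ... irq N at device S.F on <bus>
PROBE_RE = re.compile(
    r"^(?P<driver>[a-z]+)(?P<unit>\d+): <(?P<desc>[^>]+)>"
    r".*?\birq (?P<irq>\d+) at device (?P<slot>\d+)\.(?P<func>\d+)"
    r" on (?P<bus>\w+)$"
)


def parse_probe_line(line):
    """Return a dict of fields from one probe line, or None if no match."""
    # Mail clients wrap long dmesg lines, so collapse whitespace runs first.
    flat = " ".join(line.split())
    match = PROBE_RE.match(flat)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the purely numeric fields to integers.
    for key in ("unit", "irq", "slot", "func"):
        fields[key] = int(fields[key])
    return fields


# The probe line quoted in this message, wrapped as it appears in the mail.
sample = """mvs0: <Marvell 88SX6081 SATA controller> port 0x3000-0x30ff mem
0xdf200000-0xdf2fffff irq 24 at device 1.0 on pci7"""

print(parse_probe_line(sample))
```

Lines that do not carry the irq/device/bus information, such as the
"[ITHREAD]" line, simply return None.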
By the way, for reference, the controller I had been using is a PCI-X
one, using a SAS-1068R chipset.
Here is a picture of the controller, in case anyone is familiar with it:
http://www.darkbsd.org/~darksoul/fujitsu-siemens-lsi-sas1068.JPG
This is a Fujitsu OEM board with an LSI chip, so I guess it *might* have
some firmware quirks that make it unsuitable for FreeBSD.
> P.S. -- In the future, try to avoid cross-posting. :-)
Sorry about that. m(_ _)m
--
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo