bce kernel page faults and NMIs (was: Strange reboot since 9.1)

Sebastian Kuzminsky S.Kuzminsky at F5.com
Mon Jun 3 20:59:47 UTC 2013


Howdy folks, this email is a follow-on to a 3-month-old thread about kernel page faults from the bce driver[0].

0:  http://lists.freebsd.org/pipermail/freebsd-stable/2013-March/072713.html

Sorry to revive such an old thread, but a couple of bits of new information have come to light here that may be useful for others.

The header splitting suggestion that Marius Strobl made[1] did fix the kernel page fault rooted in bce_intr() that we were seeing (and that other folks reported in the original thread).  I'm no bce expert, but it looks to me like the bce driver does not apply the same flow control to its page queue as it does to its receive queue.  Maybe that's related to the problem?

1:  http://lists.freebsd.org/pipermail/freebsd-stable/2013-March/072766.html

After disabling bce header splitting we stopped getting kernel page faults, but we still had problems with this NIC (Broadcom NetXtreme II BCM5716 Gigabit Ethernet) producing frequent PCI errors and occasional NMIs.
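
For anyone who wants to try the same workaround, here is a minimal sketch of the loader.conf change.  It assumes your bce driver exposes the hw.bce.hdr_split loader tunable listed in bce(4); check the man page for your release before relying on it.

```
# /boot/loader.conf
# Assumption: this release's bce(4) supports the hw.bce.hdr_split tunable.
# 0 disables header splitting; the setting is read at boot time.
hw.bce.hdr_split="0"
```

After a reboot, "sysctl hw.bce.hdr_split" should report 0 if the tunable was picked up.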

I found this thread[2] that suggests that the NIC firmware version may be relevant to the NMI problem.  The Red Hat people are reporting that firmware version 6.0.1 is bad and 6.4.5 is good; FreeBSD 9.1 ships with 6.0.17, so who knows what that means...  We ended up reverting to the bce driver from FreeBSD 7, and that fixed our NMI problems.  (The bce driver from FreeBSD 7 also has header splitting disabled by default: Bonus!)

2:  https://bugzilla.redhat.com/show_bug.cgi?id=693542


-- 
Sebastian Kuzminsky

