Panic during kernel booting on HP Proliant DL180G6 and latest STABLE
Jeremy Chadwick
freebsd at jdc.parodius.com
Thu Sep 22 04:19:27 UTC 2011
On Wed, Sep 21, 2011 at 08:26:46PM -0700, Craig Leres wrote:
> I have a lot of supermicro motherboards and the newest ones have igb
> chipsets; they've been quite a headache with respect to FreeBSD 8. I'm
> running 8.2-RELEASE but have upgraded parts of my kernel to 8-RELENG (as
> of a few months ago). Some of them work ok while others panic on bootup.
> Upgrading to newer versions of the intel igb code fixes some but breaks
> others. It's been frustrating.
>
> While working on this today, I saw two different kernel panics:
>
> Could not setup receive structures
> m_getzone: m_getjcl: invalid cluster type
>
> I tried John Baldwin's patch but got the "invalid cluster type" panic so
> I backed it out.
>
> Later I figured out that either turning off hw.igb.enable_msix
> (loader.conf) or raising kern.ipc.nmbclusters to 131072 (sysctl.conf)
> and setting hw.igb.num_queues to 4 (loader.conf) would avoid the
> "receive structures" panic but either way I was seeing the "invalid
> cluster type" panic.
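
For readers following along, the two workarounds Craig describes would
look roughly like the fragment below. The values are the ones from his
message; the file paths are the standard FreeBSD locations. Treat this as
a sketch of the two alternatives, not a recommended configuration.

```
# Alternative 1: /boot/loader.conf
# Disable MSI-X for igb(4).
hw.igb.enable_msix="0"

# Alternative 2: split across two files.
# /boot/loader.conf
hw.igb.num_queues="4"
# /etc/sysctl.conf
kern.ipc.nmbclusters=131072
```

As reported above, either alternative avoids only the "receive
structures" panic; the "invalid cluster type" panic still occurs.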
>
> Looking at m_getjcl(), I suspected the passed size to be 0; some debugging
> confirmed this. Looks like a race here where a receive interrupt comes
> in before adapter->rx_mbuf_sz has been initialized.
>
> Attached is the hack I added to avoid the panic when booting. The idea
> is to pretend m_getjcl() failed to allocate a cluster rather than to go
> down in flames.
>
> Craig
> Index: if_igb.c
> ===================================================================
> --- if_igb.c	(revision 31)
> +++ if_igb.c	(working copy)
> @@ -3695,6 +3695,11 @@
>  			    htole64(hseg[0].ds_addr);
>  no_split:
>  		if (rxbuf->m_pack == NULL) {
> +			if (adapter->rx_mbuf_sz == 0) {
> +				printf("igb_refresh_mbufs: "
> +				    "avoid m_getjcl() panic\n");
> +				goto update;
> +			}
>  			mp = m_getjcl(M_DONTWAIT, MT_DATA,
>  			    M_PKTHDR, adapter->rx_mbuf_sz);
>  			if (mp == NULL)
> @@ -3912,6 +3917,12 @@
>  
>  skip_head:
>  		/* Now the payload cluster */
> +		if (adapter->rx_mbuf_sz == 0) {
> +			printf("igb_setup_receive_ring: "
> +			    "avoid m_getjcl() panic\n");
> +			error = ENOBUFS;
> +			goto fail;
> +		}
>  		rxbuf->m_pack = m_getjcl(M_DONTWAIT, MT_DATA,
>  		    M_PKTHDR, adapter->rx_mbuf_sz);
>  		if (rxbuf->m_pack == NULL) {
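
To make the idea in the patch above easier to see outside the driver,
here is a small userland model of the guard. Everything in it is a
hypothetical stand-in: fake_adapter, fake_getjcl() and refill_one() are
not kernel interfaces, and abort() merely plays the role of the
"invalid cluster type" panic. It only sketches the pattern of treating
an uninitialized rx_mbuf_sz as an allocation failure.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver softc; only the field the patch tests. */
struct fake_adapter {
	int rx_mbuf_sz;		/* stays 0 until the init path has set it */
};

/*
 * Hypothetical stand-in for m_getjcl(). The real function panics on an
 * unknown cluster size; this model aborts instead.
 */
static void *
fake_getjcl(int size)
{
	if (size <= 0) {
		fprintf(stderr, "fake_getjcl: invalid cluster type\n");
		abort();
	}
	return (malloc((size_t)size));
}

/*
 * Guarded refill: if the size has not been initialized yet, report
 * ENOBUFS as though the allocation had failed, and never call the
 * allocator with a size of 0.
 */
static int
refill_one(struct fake_adapter *sc, void **bufp)
{
	if (sc->rx_mbuf_sz == 0)
		return (ENOBUFS);
	*bufp = fake_getjcl(sc->rx_mbuf_sz);
	return (*bufp == NULL ? ENOBUFS : 0);
}
```

Calling refill_one() on a zeroed fake_adapter returns ENOBUFS without
touching the allocator; once rx_mbuf_sz is set (2048 is just an example
value), the call goes through. Note that this hides the symptom only.
If the race Craig suspects is real, the ordering problem is still there.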
The fact that you have this happening on multiple systems makes me
uncomfortable, because we use Supermicro hardware exclusively.
Your email contains no References or In-Reply-To headers, so it appears
as a new thread. As such, I'll point readers to the existing thread,
which spans several months:
http://lists.freebsd.org/pipermail/freebsd-stable/2011-May/062596.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062949.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063867.html
Also re-CC'ing Jack Vogel.
Chris, I am under the impression that to get proper visibility and
attention in this matter, you're probably going to need to set up a
serial console (both BIOS-level and bootloader-level) for remote
debugging capability. Jack, John, or someone familiar with kernel
debugging is probably going to need to get access to a machine which is
experiencing this problem so they can figure out what's going on.
The tricky part here is that you're going to need to have a custom
kernel built that includes numerous debugging options. PXE booting is
probably the easiest method. Remember you don't need filesystems on the
system, just a kernel that boots/loads and will drop to ddb> when the
panic happens.
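
As a rough starting point, something like the following is what I have
in mind. This is a sketch, not a list of what Jack or John will
necessarily ask for: the serial speed is an assumed value that has to
match your BIOS redirection settings, and the particular set of
debugging options is my own guess at a reasonable baseline.

```
# /boot/loader.conf: serial console at the bootloader/kernel level
boot_multicons="YES"
boot_serial="YES"
comconsole_speed="115200"	# assumption; match the BIOS setting
console="comconsole,vidconsole"

# Kernel configuration file additions
options 	KDB			# kernel debugger framework
options 	DDB			# interactive debugger (ddb> prompt)
options 	INVARIANTS		# extra run-time sanity checks
options 	INVARIANT_SUPPORT	# required by INVARIANTS
options 	WITNESS			# lock order verification
options 	WITNESS_SKIPSPIN	# skip spin mutexes to reduce overhead
makeoptions	DEBUG=-g		# build with debugging symbols
```

Leave KDB_UNATTENDED out of the kernel config, otherwise the machine
will reboot on panic instead of stopping at the ddb> prompt.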
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |