kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in igb(4): m_getjcl: invalid cluster type
Jack Vogel
jfvogel at gmail.com
Mon Jan 21 20:30:03 UTC 2013
The following reply was made to PR kern/172113; it has been noted by GNATS.
From: Jack Vogel <jfvogel at gmail.com>
To: George Neville-Neil <gnn at freebsd.org>
Cc: John Baldwin <jhb at freebsd.org>, bug-followup at freebsd.org, egrosbein at rdtc.ru,
jfv at freebsd.org
Subject: Re: kern/172113: [panic] [e1000] [patch] 9.1-RC1/amd64 panices in
igb(4): m_getjcl: invalid cluster type
Date: Mon, 21 Jan 2013 12:28:40 -0800
Well, do you have a more complete designation of the motherboard? We can
look into it, although if the one check stops the problem it may be a low
priority.
Jack
On Mon, Jan 21, 2013 at 11:25 AM, George Neville-Neil <gnn at freebsd.org> wrote:
>
> On Jan 19, 2013, at 23:26 , John Baldwin <jhb at FreeBSD.org> wrote:
>
> > I was able to finally reproduce this panic today. It seems to require
> > a server configured for PXE but that receives no DHCP reply (and
> > possibly with the requisite SuperMicro X8 board). I was able to
> > prevent the panic with a subset of the referenced patch by only adding
> > the 'if_drv_flags & IFF_DRV_RUNNING' check to the start of
> > igb_msix_que(). The rest of the patch was unnecessary. I also added
> > some debugging to print out the ICR, EICR, IMS, and EIMS registers in
> > this case. It does look like the hardware is sending an interrupt that
> > is not enabled in the interrupt mask (specifically LSC). In fact, the
> > 82576 datasheet specifically mentions masking LSC until initialization
> > is complete to avoid spurious interrupts during boot and AFAICT igb(4)
> > does this since e1000_reset_hw() clears the interrupt mask via writes
> > to IMC and doesn't re-enable interrupts until igb_init_locked() is
> > invoked via 'ifconfig up'. Here is my debug output:
> >
> > SMP: AP CPU #6 Launched!
> > SMP: AP CPU #4 Launched!
> > stray irq0
> > igb0: interrupt on que 0: icr 0x1000004 eicr 0
> > ims 0 eims 0x80000000
> >
> > Hmmm. Nothing clears EIMS. After some more debugging, I determined
> > that e1000_reset_hw() always turns this bit in EIMS on, even if it is
> > off before e1000_reset_hw() is called(!). I added explicit calls to
> > igb_disable_intr() to clear EIMS after each call to e1000_reset_hw().
> > This removes the 'stray irq0', but I still get a spurious interrupt
> > during boot (albeit with eims 0). I can use the IFF_DRV_RUNNING hack
> > for now, but I think the real fix is something else.
> >
>
> I think Jack will have to chime in on this one. Do you think it's all
> SM X8 boards or just the one we happen to have? I wonder if Jack or
> Jeffrey (the testing guy he works with) have access to the right board.
>
> Best,
> George
>
>
>
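For readers following the thread, here is a minimal user-space sketch of the guard John describes: an interrupt handler that returns early unless the interface is marked running. It is not the igb(4) code. The struct, the function names and SKETCH_DRV_RUNNING are hypothetical stand-ins for the driver's queue state and 'if_drv_flags & IFF_DRV_RUNNING'. The two E1000_* bit values are assumed to match the e1000 shared headers, and are used only to read the quoted debug output (icr 0x1000004 has the LSC bit set; eims 0x80000000 is the "other causes" bit).

```c
#include <stdint.h>

/* Bit values assumed to match the e1000 shared headers. */
#define E1000_ICR_LSC      0x00000004u /* link status change cause */
#define E1000_EIMS_OTHER   0x80000000u /* "other causes" extended mask bit */

/* Hypothetical stand-in for IFF_DRV_RUNNING (an assumption for this sketch). */
#define SKETCH_DRV_RUNNING 0x40u

/* Hypothetical, simplified model of the per-queue driver state. */
struct sketch_queue {
	uint32_t if_drv_flags; /* models ifp->if_drv_flags */
	unsigned handled;      /* interrupts that reached the rx/tx path */
	unsigned dropped;      /* interrupts ignored by the guard */
};

/*
 * Models the check added at the start of igb_msix_que(): if the interface
 * has not been brought up yet, ignore the interrupt instead of touching
 * rings that have not been set up. Returns 1 if the interrupt was
 * processed, 0 if it was dropped.
 */
int
sketch_msix_que(struct sketch_queue *que)
{
	if ((que->if_drv_flags & SKETCH_DRV_RUNNING) == 0) {
		que->dropped++;
		return (0);
	}
	que->handled++; /* the real handler would do rx/tx cleanup here */
	return (1);
}

/* Returns nonzero if an ICR value reports a link status change. */
int
sketch_icr_has_lsc(uint32_t icr)
{
	return ((icr & E1000_ICR_LSC) != 0);
}
```

With if_drv_flags left at 0, sketch_msix_que() drops the interrupt. Once SKETCH_DRV_RUNNING is set (which in the driver would happen via igb_init_locked() on 'ifconfig up'), the same call is processed. As John notes, this only hides the spurious interrupt; it does not explain why the hardware raises it.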