82574L hangs (with r233708 e1000 driver).

Konstantin Belousov kostikbel at gmail.com
Thu Apr 12 18:38:54 UTC 2012


On Mon, Apr 09, 2012 at 12:19:39PM -0400, John Baldwin wrote:
> On Sunday, April 08, 2012 1:11:25 am Konstantin Belousov wrote:
> > On Sat, Apr 07, 2012 at 04:22:07PM -0700, Jack Vogel wrote:
> > > Make sure you have any firmware up to the latest available, if that
> > > doesn't help let me know and I'll check internally to see if there
> > > are any outstanding issues in shared code, that will be after the
> > > weekend.
> > 
> > I had BIOS rev. 151; after your hint I found rev. 154 on the site.
> > Now BIOS reports itself as MTCDT10N.86A.0154.2012.0323.1601,
> > March 23.
> > 
> > Unfortunately, the upgrade did not change anything with regard to the
> > hanging interface.
> 
> Does reverting 233708 make any difference?  Have you tried futzing around with
> kgdb when it is hung to see what state the device is in (software state at
> least)?
It does, in the sense that without r233708 the interface becomes stuck
almost immediately. I just upgraded to the e1000 at r234154, which does not
change much.

I fiddled some more with the adapter state in kgdb after the hang, and I
noticed something interesting. Apparently, tx works. When I ping the
remote host from my suffering atom machine, the remote host sees the
packets. The remote machine also sees some UDP traffic originating from
the atom, such as ntp queries.

On receive, the atom board does get interrupts: the em0:rx 0 counter
in vmstat -i increases. Even more fun, the sysctl dev.em.0.debug
shows an increasing hw rdh (as I understand it, this is the hardware
'last received' packet pointer for the rx ring). So I looked at the
packet descriptor at the hw rdt index, and there I see
(kgdb) p/x ((struct adapter *)0xffffff80010e4000)->rx_rings->rx_base[78]
$11 = {buffer_addr = 0x12a128800, length = 0x5ea, csum = 0x3c2b, status = 0x0, 
  errors = 0x0, special = 0x0}

The status field is 0x0, so the Descriptor Done (DD) bit is clear.
As a result, the em_rxeof() function breaks out of its loop without
consuming the current packet, and it returns false because the DD bit
is clear. This in turn prevents em_msix_rx() from scheduling the
taskqueue for processing. So the apparent cause of the hang is the
missing DD bit in the descriptor.
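
For readers who do not know the em internals, the check described above
can be sketched roughly as follows. This is a simplified illustration and
not the driver source: the struct only mirrors the fields printed by kgdb,
RXD_STAT_DD stands in for the driver's E1000_RXD_STAT_DD (assumed to be
0x01), and rx_scan() is a hypothetical name for the descriptor loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for E1000_RXD_STAT_DD (Descriptor Done). */
#define RXD_STAT_DD 0x01

/* Simplified rx descriptor; field names follow the kgdb output. */
struct rx_desc {
	uint64_t buffer_addr;
	uint16_t length;
	uint16_t csum;
	uint8_t  status;
	uint8_t  errors;
	uint16_t special;
};

/*
 * Hypothetical stand-in for the rx loop: walk the ring from 'next',
 * consuming descriptors until one without DD is found or the budget
 * is exhausted.  '*more' reports whether the descriptor we stopped at
 * has DD set, i.e. whether further processing should be scheduled.
 * Returns the number of descriptors consumed.
 */
static int
rx_scan(const struct rx_desc *ring, int nring, int next, int budget,
    bool *more)
{
	int done = 0;

	assert(nring > 0 && next >= 0 && next < nring);
	while (done < budget) {
		if ((ring[next].status & RXD_STAT_DD) == 0)
			break;		/* hardware has not marked it done */
		done++;
		next = (next + 1) % nring;
	}
	*more = (ring[next].status & RXD_STAT_DD) != 0;
	return (done);
}
```

With the descriptor from the dump (status = 0x0), such a loop consumes
nothing and reports that there is no more work, even though length and
csum look as if a packet had been written.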

I am not sure whether all of this is obvious to anybody who knows the em
internals, or where to go from here.