em(4) hang [Was: Re: igb(4) won't start with "igb0: Could not setup receive structures"]

Thu Mar 31 23:06:45 UTC 2011

Hi,

On Thu, Mar 31, 2011 at 6:28 PM, Jack Vogel <jfvogel at gmail.com> wrote:
> OK, but those are not something present in this data, that was what I'm
> asking.
>
> So, you have a hang for which we do not have a certain cause.  What does
> netstat -m show?
>
# netstat -m
3073/74927/78000 mbufs in use (current/cache/total)
3070/29698/32768/32768 mbuf clusters in use (current/cache/total/max)
0/383 mbuf+clusters out of packet secondary zone in use (current/cache)
0/12800/12800/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
6908K/129327K/136236K bytes allocated to network (current/cache/total)
0/1080/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/7/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Note that the mbuf cluster allocation denials (1080 so far) did not
happen all at once. The count has been increasing progressively, in
blocks of ~200, over the 5h of uptime of the machine, until the current
condition occurred.

I previously tried to simulate the depletion and the hang, but the
driver recovered. I assume that in that case the condition to refresh
the ring is met in em_local_timer(); I still need to check that.

 - Arnaud

> Jack
>
>
> On Thu, Mar 31, 2011 at 3:15 PM, Arnaud Lacombe <lacombar at gmail.com> wrote:
>>
>> Hi,
>>
>> On Thu, Mar 31, 2011 at 5:57 PM, Jack Vogel <jfvogel at gmail.com> wrote:
>> > So, what is the evidence that the driver is stuck here?
>> >
>> About 800 pps (mostly SYN) are present on the wire but never seen on
>> em0, plus a couple of ARP replies which never hit em0 either, plus the
>> `missed_packets' count increasing by the same 800 pps over the last
>> hour. Is that enough?
>>
>>  - Arnaud
>>
>> ps: I forgot to add that the MAC addresses on the wire are fine.
>>
>> > I see that next_to_check != next_to_refresh, which is why the
>> > local timer won't schedule anything. OH, and I also realized there
>> > is a problem with local_timer anyway, it will run rxeof, but that
>> > won't help if you can't enter the loop, so I need to add some code
>> > at the top to call em_refresh_mbufs() when in this state.
>> >
>> > On this interrupt cause that you are focused upon, although it's
>> > there in the design, I had talked with some of our most seasoned
>> > developers on both the Windows and Linux side of the house, and NO
>> > one has ever used this 'feature', because (and I'm quoting here)
>> > "there's no good use case for it". Meaning, there's always some
>> > simpler way of handling the issue.
>> >
>> > When you use MSIX you can't read causes btw, if you configured it,
>> > it would mean you'd just get into the regular RX handler, same as
>> > always, so why some special bother with this cause?
>> >
>> > On non-MSIX hardware there is just no particular reason to worry
>> > about the cause either, we can just handle the RX situation in the
>> > interrupt handler.
>> >
>> > Jack
>> >
>> >
>> > On Thu, Mar 31, 2011 at 2:09 PM, Arnaud Lacombe <lacombar at gmail.com>
>> > wrote:
>> >>
>> >> Hi Jack,
>> >>
>> >> On Thu, Mar 31, 2011 at 9:51 AM, Arnaud Lacombe <lacombar at gmail.com>
>> >> wrote:
>> >> > [...]
>> >> > I'll remove part of the changes I made to keep only
>> >> > `rx_forced_refill' and the associated sysctl, re-run the tests
>> >> > and come back with correct value, hopefully in a few hours.
>> >> >
>> >> Here it is:
>> >>
>> >> # sysctl dev.em.0.%desc
>> >> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.2
>> >>
>> >> # sysctl dev.em.0.mac_stats.missed_packets
>> >> dev.em.0.mac_stats.missed_packets: 917428
>> >>
>> >> # sysctl dev.em.0.debug=1
>> >> dev.em.0.debug: I-1nterface is RUNNING and INACTIVE
>> >> em0: hw tdh = 975, hw tdt = 975
>> >> em0: hw rdh = 884, hw rdt = 885
>> >> em0: Tx Queue Status = 0
>> >> em0: TX descriptors avail = 1024
>> >> em0: Tx Descriptors avail failure = 0
>> >> em0: RX discarded packets = 0
>> >> em0: RX Next to Check = 884
>> >> em0: RX Next to Refresh = 885
>> >>  -> -1
>> >>
>> >> So the taskqueue cannot be scheduled to run and the driver is stuck.
>> >>
>> >> > On Wed, Mar 30, 2011 at 2:22 PM, Jack Vogel <jfvogel at gmail.com>
>> >> > wrote:
>> >> >> Read the code in HEAD, em_local_timer() has a test of ALL the
>> >> >> rx queues and will schedule a task that refreshes mbufs if
>> >> >> they are empty. This has exactly the same effect as checking
>> >> >> for some interrupt cause, a cause that is not available when
>> >> >> using MSIX on 82574, but this approach works for everything.
>> >> >>
>> >> Can you please point me to a reference datasheet (or errata), provided
>> >> by Intel, about the RX Overrun interrupt not being available with
>> >> MSI-X on the 82574 ?
>> >>
>> >> Currently, I only have access to [0], which states the following:
>> >>
>> >> 7.4 Interrupts
>> >> 7.4.2 MSI-X Mode
>> >> [...]
>> >> The following configuration and parameters are involved:
>> >> • The IVAR.INT_Alloc[4:0] entries map two Tx queues, two Rx queues
>> >>   and other events to 5 interrupt vectors
>> >> • The ICR[24:20] bits reflect specific interrupt causes
>> >> • Five MSI-X interrupt vectors are provided (calculated based on
>> >>   four vectors for queues and one vector for other causes). The
>> >>   requested number of vectors is loaded from the MSI_X_N fields in
>> >>   the EEPROM into the PCIe MSI-X capability structure of the
>> >>   function.
>> >>
>> >> 10.2.4.1 Interrupt Cause Read Register - ICR (0x000C0; RC/WC)
>> >> [...]
>> >>
>> >> about bit 24:
>> >>
>> >> Other Interrupt. Indicates one of the following interrupts was set:
>> >> • Link Status Change.
>> >> • Receiver Overrun.
>> >> • MDIO Access Complete.
>> >> • Small Receive Packet Detected.
>> >> • Receive ACK Frame Detected.
>> >> • Manageability Event Detected.
>> >>
>> >> Thanks in advance,
>> >>  - Arnaud
>> >>
>> >> [0]: ftp://download.intel.com/design/network/datashts/82574.pdf
>> >
>> >
>
>