em(4) hang [Was: Re: igb(4) won't start with "igb0: Could not
setup receive structures"]
Arnaud Lacombe
lacombar at gmail.com
Thu Mar 31 22:15:34 UTC 2011
Hi,
On Thu, Mar 31, 2011 at 5:57 PM, Jack Vogel <jfvogel at gmail.com> wrote:
> So, what is the evidence that the driver is stuck here?
>
About 800 pps (mostly SYN) are present on the wire but never seen on em0,
plus a couple of ARP replies, which also never hit em0, plus the
`missed_packets' count has been increasing by the same 800 pps over the
last hour. Is that enough?
- Arnaud
PS: I forgot to add that the MAC addresses on the wire are fine.
> I see that next_to_check != next_to_refresh, which is why the
> local timer won't schedule anything. OH, and I also realized there
> is a problem with local_timer anyway, it will run rxeof, but that won't help
> if you can't enter the loop, so I need to add some code at the top to
> call em_refresh_mbufs() when in this state.
>
> On this interrupt cause that you are focused upon, although it's there in the
> design, I had talked with some of our most seasoned developers on both
> the Windows and Linux side of the house, and NO one has ever used this
> 'feature', because (and I'm quoting here) "there's no good use case for it".
> Meaning, there's always some simpler way of handling the issue.
>
> When you use MSIX you can't read causes btw, if you configured it, it would
> mean you'd just get into the regular RX handler, same as always, so why
> some special bother with this cause?
>
> On non-MSIX hardware there is just no particular reason to worry about the
> cause either, we can just handle the RX situation in the interrupt handler.
>
> Jack
>
>
> On Thu, Mar 31, 2011 at 2:09 PM, Arnaud Lacombe <lacombar at gmail.com> wrote:
>>
>> Hi Jack,
>>
>> On Thu, Mar 31, 2011 at 9:51 AM, Arnaud Lacombe <lacombar at gmail.com> wrote:
>> > [...]
>> > I'll remove part of the changes I made to keep only `rx_forced_refill'
>> > and the associated sysctl, re-run the tests and come back with correct
>> > value, hopefully in a few hours.
>> >
>> Here it is:
>>
>> # sysctl dev.em.0.%desc
>> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.2
>>
>> # sysctl dev.em.0.mac_stats.missed_packets
>> dev.em.0.mac_stats.missed_packets: 917428
>>
>> # sysctl dev.em.0.debug=1
>> dev.em.0.debug: I-1nterface is RUNNING and INACTIVE
>> em0: hw tdh = 975, hw tdt = 975
>> em0: hw rdh = 884, hw rdt = 885
>> em0: Tx Queue Status = 0
>> em0: TX descriptors avail = 1024
>> em0: Tx Descriptors avail failure = 0
>> em0: RX discarded packets = 0
>> em0: RX Next to Check = 884
>> em0: RX Next to Refresh = 885
>> -> -1
>>
>> So the taskqueue cannot be scheduled to run and the driver is stuck.
>>
>> > On Wed, Mar 30, 2011 at 2:22 PM, Jack Vogel <jfvogel at gmail.com> wrote:
>> >> Read the code in HEAD, em_local_timer() has a test of ALL the rx queues and
>> >> will schedule a task that refreshes mbufs if they are empty. This has exactly the
>> >> same effect as checking for some interrupt cause, a cause that is not available
>> >> when using MSIX on 82574, but this approach works for everything.
>> >>
>> Can you please point me to a reference datasheet (or errata), provided
>> by Intel, about the RX Overrun interrupt not being available with
>> MSI-X on the 82574?
>>
>> Currently, I only have access to [0], which states the following:
>>
>> 7.4 Interrupts
>> 7.4.2 MSI-X Mode
>> [...]
>> The following configuration and parameters are involved:
>> • The IVAR.INT_Alloc[4:0] entries map two Tx queues, two Rx queues and other
>>   events to 5 interrupt vectors
>> • The ICR[24:20] bits reflect specific interrupt causes
>> • Five MSI-X interrupt vectors are provided (calculated based on four vectors for
>>   queues and one vector for other causes). The requested number of vectors is
>>   loaded from the MSI_X_N fields in the EEPROM into the PCIe MSI-X capability
>>   structure of the function.
>>
>> 10.2.4.1 Interrupt Cause Read Register - ICR (0x000C0; RC/WC)
>> [...]
>>
>> about bit 24:
>>
>> Other Interrupt. Indicates one of the following interrupts was set:
>> • Link Status Change.
>> • Receiver Overrun.
>> • MDIO Access Complete.
>> • Small Receive Packet Detected.
>> • Receive ACK Frame Detected.
>> • Manageability Event Detected.
>>
>> Thanks in advance,
>> - Arnaud
>>
>> [0]: ftp://download.intel.com/design/network/datashts/82574.pdf
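
For reference, the ICR bits mentioned in the datasheet excerpt above can be tested with plain mask arithmetic. This is only a sketch: the mask values are believed to match the E1000_ICR_* definitions in the e1000 shared code, the ICR_* names and both helper functions are made up for illustration, and whether the Receiver Overrun bit can actually be observed together with the "Other" bit in MSI-X mode is exactly the open question in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the E1000_ICR_* masks; the names here are illustrative. */
#define ICR_LSC   0x00000004u /* Link Status Change */
#define ICR_RXO   0x00000040u /* Receiver Overrun */
#define ICR_RXQ0  0x00100000u /* bit 20: Rx queue 0 */
#define ICR_RXQ1  0x00200000u /* bit 21: Rx queue 1 */
#define ICR_TXQ0  0x00400000u /* bit 22: Tx queue 0 */
#define ICR_TXQ1  0x00800000u /* bit 23: Tx queue 1 */
#define ICR_OTHER 0x01000000u /* bit 24: Other Interrupt */

/* Hypothetical helper: true if "Other" is set and overrun is one of its causes. */
static bool icr_other_is_rx_overrun(uint32_t icr)
{
    return (icr & ICR_OTHER) != 0 && (icr & ICR_RXO) != 0;
}

/* Hypothetical helper: true if any of the four queue cause bits is set. */
static bool icr_has_queue_cause(uint32_t icr)
{
    return (icr & (ICR_RXQ0 | ICR_RXQ1 | ICR_TXQ0 | ICR_TXQ1)) != 0;
}
```

With these, a value of 0x01000040 would be reported as an overrun signalled through the "Other" cause, while 0x01000004 would be a link status change only.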
>
>
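
The stuck state from the debug output quoted above (RX Next to Check = 884, RX Next to Refresh = 885) can be reproduced with a toy model. This is a sketch, not the driver code: the struct and function names are hypothetical, the equality test only restates how the timer check is described in this thread, and the ring size of 1024 is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the two software RX indices printed by the debug sysctl. */
struct rx_ring_model {
    uint32_t next_to_check;   /* next descriptor the RX cleanup would examine */
    uint32_t next_to_refresh; /* next descriptor waiting for a fresh mbuf */
};

/*
 * Timer test as described in the thread: the refresh task is scheduled
 * only when both indices are equal.
 */
static bool timer_would_schedule(const struct rx_ring_model *r)
{
    return r->next_to_check == r->next_to_refresh;
}

/* Forward distance from one index to another on a ring of `size` slots. */
static uint32_t ring_distance(uint32_t from, uint32_t to, uint32_t size)
{
    return (to + size - from) % size;
}
```

Feeding in 884 and 885 makes timer_would_schedule() return false, which matches the observation that nothing gets scheduled even though the indices are only one slot apart.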