igb driver RX (was TX) hangs when out of mbuf clusters
Michael Tüxen
Michael.Tuexen at lurchi.franken.de
Tue Feb 8 10:44:38 UTC 2011
On Feb 8, 2011, at 4:29 AM, Karim Fodil-Lemelin wrote:
> 2011/2/7 Pyun YongHyeon <pyunyh at gmail.com>
>
>> On Mon, Feb 07, 2011 at 09:21:45PM -0500, Karim Fodil-Lemelin wrote:
>>> 2011/2/7 Pyun YongHyeon <pyunyh at gmail.com>
>>>
>>>> On Mon, Feb 07, 2011 at 05:33:47PM -0500, Karim Fodil-Lemelin wrote:
>>>>> Subject: Re: igb driver tx hangs when out of mbuf clusters
>>>>>
>>>>>> To: Lev Serebryakov <lev at serebryakov.spb.ru>
>>>>>> Cc: freebsd-net at freebsd.org
>>>>>>
>>>>>>
>>>>>> 2011/2/7 Lev Serebryakov <lev at serebryakov.spb.ru>
>>>>>>
>>>>>> Hello, Karim.
>>>>>>> You wrote on 7 February 2011, at 19:58:04:
>>>>>>>
>>>>>>>
>>>>>>>> The issue is with the igb driver from 7.4 RC3 r218406. If the driver
>>>>>>>> runs out of mbuf clusters it simply stops receiving, even after the
>>>>>>>> clusters have been freed.
>>>>>>> It looks like my problems with em0 (see the thread "em0 hangs without
>>>>>>> any messages like "Watchdog timeout", only down/up reset it.")...
>>>>>>> The codebase for em and igb is somewhat common...
>>>>>>>
>>>>>>> --
>>>>>>> // Black Lion AKA Lev Serebryakov <lev at serebryakov.spb.ru>
>>>>>>>
>>>>>>> I agree.
>>>>>>
>>>>>> Do you get missed packets in mac_stats (sysctl dev.em | grep missed)?
>>>>>>
>>>>>> I might not have mentioned it, but I can also 'fix' the problem by
>>>>>> doing ifconfig igb0 down/up.
>>>>>>
>>>>>> I will try using POLLING to automate the reset, as you mentioned in
>>>>>> your thread.
>>>>>>
>>>>>> Karim.
>>>>>>
>>>>>>
>>>>> Follow-up on tests with POLLING: the problem still occurs, although it
>>>>> takes more time ... Output of sysctl dev.igb0 and netstat -m follows:
>>>>>
>>>>> 9219/99426/108645 mbufs in use (current/cache/total)
>>>>> 9217/90783/100000/100000 mbuf clusters in use (current/cache/total/max)
>>>>
>>>> Do you see network processes stuck in the keglim state? If you see
>>>> that, I think it's not trivial to solve. You wouldn't even be able to
>>>> kill such a process while it is in the keglim state unless some more
>>>> mbuf clusters are freed from other places.
>>>>
>>>
>>> No keglim state, here is a snapshot of top -SH while the problem is
>>> happening:
>>>
>>> 12 root 171 ki31 0K 8K CPU5 5 19:27 100.00% idle: cpu5
>>> 10 root 171 ki31 0K 8K CPU7 7 19:26 100.00% idle: cpu7
>>> 14 root 171 ki31 0K 8K CPU3 3 19:25 100.00% idle: cpu3
>>> 11 root 171 ki31 0K 8K CPU6 6 19:25 100.00% idle: cpu6
>>> 13 root 171 ki31 0K 8K CPU4 4 19:24 100.00% idle: cpu4
>>> 15 root 171 ki31 0K 8K CPU2 2 19:22 100.00% idle: cpu2
>>> 16 root 171 ki31 0K 8K CPU1 1 19:18 100.00% idle: cpu1
>>> 17 root 171 ki31 0K 8K RUN 0 19:12 100.00% idle: cpu0
>>> 18 root -32 - 0K 8K WAIT 6 0:04 0.10% swi4: clock s
>>> 20 root -44 - 0K 8K WAIT 4 0:08 0.00% swi1: net
>>> 29 root -68 - 0K 8K - 0 0:02 0.00% igb0 que
>>> 35 root -68 - 0K 8K - 2 0:02 0.00% em1 taskq
>>> 28 root -68 - 0K 8K WAIT 5 0:01 0.00% irq256: igb0
>>>
>>> keep in mind that num_queues has been forced to 1.
>>>
>>>
>>>>
>>>> I think both igb(4) and em(4) pass a received frame to the upper stack
>>>> before allocating a new RX buffer. If the driver fails to allocate a
>>>> new RX buffer, it will try to refill the RX buffers on its next run.
>>>> Under an extreme resource shortage, this can leave no RX buffers at all
>>>> in the RX descriptor ring, and that will take the box off the network.
>>>> Other drivers avoid that situation by allocating the new RX buffer
>>>> before passing the received frame to the upper stack. If the RX buffer
>>>> allocation fails, the driver just reuses the old RX buffer without
>>>> passing the received frame to the upper stack. That does not completely
>>>> solve the keglim issue though; I think you should have enough mbuf
>>>> clusters to avoid keglim.
>>>>
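
[As an illustration of the allocate-before-handoff strategy Pyun describes,
here is a minimal sketch in C. The rx_slot structure and rx_swap_buffer()
are hypothetical stand-ins, not the real igb(4) data structures, and DMA
map handling is omitted.]

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>

struct rx_slot {                        /* hypothetical per-descriptor state */
	struct mbuf     *rs_mbuf;       /* buffer currently owned by the NIC */
};

/*
 * Allocate the replacement cluster *before* the filled buffer is handed
 * to the stack.  On allocation failure the old buffer stays in the ring
 * (the frame is dropped), so the ring can never end up empty.
 */
static struct mbuf *
rx_swap_buffer(struct rx_slot *slot, int len)
{
	struct mbuf *m, *mnew;

	mnew = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR);
	if (mnew == NULL)
		return (NULL);          /* reuse old buffer, drop this frame */
	mnew->m_len = mnew->m_pkthdr.len = MCLBYTES;

	m = slot->rs_mbuf;              /* detach the filled buffer */
	m->m_len = m->m_pkthdr.len = len;
	slot->rs_mbuf = mnew;           /* refill the ring slot first ... */

	return (m);     /* ... then let the caller pass this one to if_input */
}

[A driver using this pattern drops frames under memory pressure but keeps
the RX ring populated, so it recovers as soon as clusters become available
again.]
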
>>>> However, the output above indicates you have enough free mbuf
>>>> clusters. So I guess igb(4) hit a zero-available-RX-buffer situation
>>>> at some point in the past but then failed to refill the RX buffers
>>>> again. Perhaps the driver could periodically check the number of
>>>> available RX buffers. Jack may have a better idea whether this was
>>>> the case. (CCed)
>>>>
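
[A rough sketch of the periodic check Pyun suggests: a once-per-second
callout that retries the refill whenever the ring has gone completely
empty. The rx_ring fields are hypothetical, not the actual igb softc
layout.]

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/callout.h>

struct rx_ring {                        /* hypothetical, not the igb softc */
	struct mtx       rxr_mtx;
	struct callout   rxr_watchdog;
	int              rxr_avail;     /* descriptors currently owned by HW */
	void            (*rxr_refresh)(struct rx_ring *);
};

static void
rx_watchdog(void *arg)
{
	struct rx_ring *rxr = arg;

	/* callout_init_mtx() below means rxr_mtx is held here. */
	if (rxr->rxr_avail == 0)
		(*rxr->rxr_refresh)(rxr);   /* retry the refill that failed */
	callout_reset(&rxr->rxr_watchdog, hz, rx_watchdog, rxr);
}

static void
rx_watchdog_start(struct rx_ring *rxr)
{
	mtx_init(&rxr->rxr_mtx, "rxr watchdog", NULL, MTX_DEF);
	callout_init_mtx(&rxr->rxr_watchdog, &rxr->rxr_mtx, 0);
	mtx_lock(&rxr->rxr_mtx);
	callout_reset(&rxr->rxr_watchdog, hz, rx_watchdog, rxr);
	mtx_unlock(&rxr->rxr_mtx);
}
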
>>>
>>> That is exactly the pattern. The driver runs out of clusters; the
>>> clusters are eventually consumed and freed, yet the driver still refuses
>>> to process any new frames. It is, on the other hand, perfectly capable of
>>> sending out packets.
>>>
>>
>> OK, this clearly indicates igb(4) failed to refill the RX buffers, since
>> you can still send frames. I'm not sure whether igb(4) controllers can be
>> configured to generate a "no RX buffer" interrupt, but such an interrupt
>> would be better suited to triggering RX refilling than timer-based
>> refilling. Since igb(4) keeps track of the available RX buffers, it can
>> selectively enable that interrupt once it sees no RX buffers left in the
>> RX descriptor ring. However, this does not work with polling.
>>
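
[To illustrate the selective-interrupt idea Pyun sketches above (and which
Karim asks about below): on e1000-class hardware the "receive descriptor
minimum threshold" interrupt (RXDMT0) could be unmasked only while the ring
is empty and masked again once it has been refilled. The rx_state structure
is a hypothetical stand-in; the register and bit names are assumed to match
the in-tree e1000 headers.]

#include <sys/param.h>
#include <sys/systm.h>

#include <dev/e1000/e1000_api.h>    /* E1000_WRITE_REG(), IMS/IMC, RXDMT0 */

struct rx_state {                   /* hypothetical bookkeeping */
	struct e1000_hw *hw;
	int              avail;     /* RX descriptors owned by hardware */
	int              rxdmt0_armed;
};

/* Call after every refill attempt (and from the RXDMT0 handler itself). */
static void
rx_update_empty_intr(struct rx_state *rx)
{
	if (rx->avail == 0 && !rx->rxdmt0_armed) {
		/* Ring is empty: let the NIC wake us up to refill. */
		E1000_WRITE_REG(rx->hw, E1000_IMS, E1000_IMS_RXDMT0);
		rx->rxdmt0_armed = 1;
	} else if (rx->avail > 0 && rx->rxdmt0_armed) {
		/* Buffers are back: mask the extra interrupt source again. */
		E1000_WRITE_REG(rx->hw, E1000_IMC, E1000_IMS_RXDMT0);
		rx->rxdmt0_armed = 0;
	}
}

[As Pyun notes, this only helps in interrupt mode; with DEVICE_POLLING the
refill retry still has to come from the polling loop or a timer.]
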
>
> I think that your evaluation of the problem is correct, although I do not
> fully understand the selective interrupt mechanism you described.
>
> To be precise, the exact same behavior (RX hang) happens if options
> DEVICE_POLLING is _not_ used in the kernel configuration file. I tried with
> POLLING since someone mentioned that it helped in a case discussed earlier
> today. Unfortunately, with or without polling, igb shows the same RX ring
> refilling problem.
>
> By the way, I fixed the subject, where I erroneously said TX was hanging
> while in fact RX is hanging and TX is just fine.
Karim,
could you apply the attached patch and report the values of
rx_nxt_check and rx_nxt_refresh when the interface hangs?
You can get the values using sysctl -a dev.igb.
Best regards
Michael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch
Type: application/octet-stream
Size: 1259 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20110208/968ee0d7/patch.obj
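
[The patch attachment above was scrubbed by the list software, so only its
name and size survive. As a purely hypothetical illustration, counters such
as rx_nxt_check and rx_nxt_refresh could be exported read-only through the
device's sysctl tree roughly as follows; the names and wiring are guesses,
not the contents of the scrubbed patch.]

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>
#include <sys/sysctl.h>

struct rx_debug {                   /* hypothetical mirror of ring state */
	u_int   rx_nxt_check;       /* next RX descriptor to be processed */
	u_int   rx_nxt_refresh;     /* next RX descriptor to be refilled */
};

static void
rx_debug_sysctls(device_t dev, struct rx_debug *dbg)
{
	struct sysctl_ctx_list *ctx = device_get_sysctl_ctx(dev);
	struct sysctl_oid_list *child =
	    SYSCTL_CHILDREN(device_get_sysctl_tree(dev));

	SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_nxt_check", CTLFLAG_RD,
	    &dbg->rx_nxt_check, 0, "next RX descriptor to check");
	SYSCTL_ADD_UINT(ctx, child, OID_AUTO, "rx_nxt_refresh", CTLFLAG_RD,
	    &dbg->rx_nxt_refresh, 0, "next RX descriptor to refresh");
}

[With something like this in place, sysctl -a dev.igb0 would show whether
the two indices stop advancing when the hang occurs.]
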