ixgbe & if_igb RX ring locking
Alexander V. Chernikov
melifaro at FreeBSD.org
Mon Oct 15 10:11:55 UTC 2012
On 13.10.2012 23:24, Jack Vogel wrote:
> On Sat, Oct 13, 2012 at 11:22 AM, Luigi Rizzo <rizzo at iet.unipi.it> wrote:
>>
>> one option could be (same as it is done in the timer
>> routine in dummynet) to build a list of all the packets
>> that need to be sent to if_input(), and then call
>> if_input with the entire list outside the lock.
>>
>> It would be even easier if we modify the various *_input()
>> routines to handle a list of mbufs instead of just one.
Bulk processing is generally a good idea that we probably should implement,
probably starting from the driver queue and ending with marked mbufs
(OURS/forward/legacy processing (AppleTalk and similar))?
This could minimize the impact of all the
locks on the RX side:
L2:
* rx PFIL hook
L3 (both IPv4 and IPv6):
* global IF_ADDR_RLOCK (currently commented out)
* per-interface ADDR_RLOCK
* PFIL hook
At first glance, there can be problems with:
* increased latency (even with some kind of rx_process_limit in place)
* reader locks being held for a much longer time
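To make the batching idea concrete, here is a rough, hypothetical sketch of
what Luigi describes: mbufs are collected into an m_nextpkt-chained list while
the ring lock is held, and if_input() is called on them only after the lock is
dropped. The function name, the next_rx_mbuf() helper and the RX_RING_LOCK()
macros are illustrative placeholders, not the actual ixgbe code:

/*
 * Illustrative only: collect up to 'limit' mbufs from the RX ring while
 * holding the ring lock, then push them into the stack with the lock
 * released, so PFIL/L3 locks are never taken under the ring lock.
 */
static void
rx_ring_input_batched(struct rx_ring *rxr, struct ifnet *ifp, int limit)
{
        struct mbuf *m, *head = NULL, **tailp = &head;

        RX_RING_LOCK(rxr);                      /* placeholder lock macro */
        while (limit-- > 0 && (m = next_rx_mbuf(rxr)) != NULL) {
                *tailp = m;                     /* chain via m_nextpkt */
                tailp = &m->m_nextpkt;
        }
        RX_RING_UNLOCK(rxr);

        /* Stack processing (PFIL, L3 input) runs with the lock dropped. */
        while ((m = head) != NULL) {
                head = m->m_nextpkt;
                m->m_nextpkt = NULL;
                (*ifp->if_input)(ifp, m);
        }
}

An *_input() variant that accepted the whole chain at once would push the
batching one level further, as suggested above.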
>>
>> cheers
>> luigi
>>
> Very interesting idea Luigi, will have to give that some thought.
>
> Jack
Returning to the original post topic:
Given that
1) we are currently binding ixgbe ithreads to CPU cores,
2) the RX queue lock is used (indirectly) in only 2 places:
a) the ISR routine (MSI-X or legacy irq),
b) the taskqueue routine, which is scheduled if some packets remain in the RX
queue after rx_process_limit is exhausted OR there is something to TX,
3) in practice the taskqueue routine is a nightmare for many people, since
there is no way to stop the "kernel {ix0 que}" thread from eating 100% CPU
after some traffic burst happens: once it is invoked it starts to schedule
itself more and more, replacing the original ISR routine (a simplified sketch
of this interplay follows). Additionally, increasing rx_process_limit does not
help, since the taskqueue is called with the same limit. Finally, netisr taskq
threads are currently not bound to any CPU, which makes the process even more
uncontrollable.
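For reference, the interplay being described looks roughly like this. This is
a simplified, from-memory sketch, not the literal driver source; the field and
function names only approximate ixgbe's, and rx_process_limit stands in for
whatever limit the driver actually passes:

/*
 * Simplified sketch of why the "{ix0 que}" taskqueue thread can spin:
 * if the rxeof pass stops because rx_process_limit was reached while
 * packets are still pending, the handler re-enqueues itself instead of
 * re-enabling the queue interrupt, so under sustained load it keeps
 * rescheduling itself with the very same limit.
 */
static void
ixgbe_handle_que(void *context, int pending)
{
        struct ix_queue *que = context;
        bool more;

        /* Process at most rx_process_limit descriptors. */
        more = ixgbe_rxeof(que, rx_process_limit);
        if (more) {
                /* Ring still non-empty: reschedule ourselves with the
                 * same limit instead of re-arming the interrupt. */
                taskqueue_enqueue(que->tq, &que->que_task);
                return;
        }
        /* Done for now: re-arm the queue interrupt. */
        ixgbe_enable_queue(que->adapter, que->msix);
}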
Maybe we can rethink taskqueue usage for RX processing?
I mean, the taskqueue is called when the host fails to process packets in the
ring fast enough, which can happen when:
* a traffic burst happens on some (or all) queues
* the traffic rate is simply too high.
In the former case we have the ring buffer size, which the administrator can
tune to a fairly big value.
For the latter case:
if all system CPUs are already used for RX processing, moving some
uncontrolled percentage of the load to a random CPU definitely does no good
(especially given that ixgbe has AIM and the RX indirection table for exactly
that purpose, which can give much more predictable results).
It does even more evil in special setups like rx_queues=CPU_COUNT-1, where the
last CPU is used by all other processes including the control plane (routing
software, various keepalives).
If the system has more CPUs than queues (24 vs 16, for example), there is a
standard way of distributing the load: netisr and deferred processing.
Netisr threads are already CPU-bound and, more importantly, packets can be
split between threads by computing some (say, L3+L4) hash, which will not lead
to out-of-order packet processing (see the sketch below).
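A hedged sketch of that hash-based splitting, for IPv4 TCP/UDP only: the
function below is purely illustrative (only hash32_buf() and HASHINIT from
<sys/hash.h> are existing kernel facilities; rx_flow_cpu() is a made-up name),
but it shows why per-flow ordering is preserved: every packet of a given flow
always maps to the same bucket/CPU.

#include <sys/param.h>
#include <sys/hash.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/udp.h>

static u_int
rx_flow_cpu(const struct ip *ip, u_int ncpus)
{
        struct {
                struct in_addr  src, dst;
                uint16_t        sport, dport;
        } key;
        const struct udphdr *uh;

        /* L3 part of the flow key. */
        key.src = ip->ip_src;
        key.dst = ip->ip_dst;

        /* L4 part: ports sit at the same offsets for TCP and UDP. */
        uh = (const struct udphdr *)((const char *)ip + (ip->ip_hl << 2));
        key.sport = uh->uh_sport;
        key.dport = uh->uh_dport;

        /* Same flow -> same hash -> same CPU, so no intra-flow reordering. */
        return (hash32_buf(&key, sizeof(key), HASHINIT) % ncpus);
}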
>
>>> So my questions are:
>>>
>>> Can any real LORs happen in some complex setup? (I can't imagine any).
>>> If so: maybe we can somehow avoid/workaround such cases? (and consider
>>> removing those locks).
>>>
>>>
>>>
>>> --
>>> WBR, Alexander
>>>