Network stack changes
George Neville-Neil
gnn at neville-neil.com
Fri Sep 13 15:08:25 UTC 2013
On Aug 29, 2013, at 7:49, Adrian Chadd <adrian at freebsd.org> wrote:
> Hi,
>
> There's a lot of good stuff to review here, thanks!
>
> Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to keep
> locking things like that on a per-packet basis. We should be able to do
> this in a cleaner way - we can defer RX into a CPU pinned taskqueue and
> convert the interrupt handler to a fast handler that just schedules that
> taskqueue. We can ignore the ithread entirely here.
>
> What do you think?
>
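For what it's worth, here's a minimal sketch of that split using the stock
taskqueue(9) and bus_setup_intr(9) interfaces. The structure and function
names are invented for illustration, the ring-drain logic is elided, and
pinning the taskqueue thread to the queue's CPU would still have to be
arranged separately (e.g. via a cpuset on the thread):

    #include <sys/param.h>
    #include <sys/bus.h>
    #include <sys/priority.h>
    #include <sys/taskqueue.h>

    /* Hypothetical per-RX-queue state. */
    struct rxq {
        struct taskqueue *tq;
        struct task       rx_task;
        /* ... ring pointers, stats, etc. ... */
    };

    /* Fast filter: no locks, no mbuf work, just kick the taskqueue. */
    static int
    rxq_intr_filter(void *arg)
    {
        struct rxq *rxq = arg;

        taskqueue_enqueue(rxq->tq, &rxq->rx_task);
        return (FILTER_HANDLED);
    }

    /* All real RX processing happens in the taskqueue thread. */
    static void
    rxq_task(void *arg, int pending)
    {
        /* drain the descriptor ring, pass mbufs up the stack */
    }

    static int
    rxq_setup(device_t dev, struct rxq *rxq, struct resource *irq)
    {
        void *cookie;

        rxq->tq = taskqueue_create_fast("rxq", M_NOWAIT,
            taskqueue_thread_enqueue, &rxq->tq);
        TASK_INIT(&rxq->rx_task, 0, rxq_task, rxq);
        taskqueue_start_threads(&rxq->tq, 1, PI_NET, "%s rxq",
            device_get_nameunit(dev));

        /* Filter only; no ithread handler registered. */
        return (bus_setup_intr(dev, irq, INTR_TYPE_NET | INTR_MPSAFE,
            rxq_intr_filter, NULL, rxq, &cookie));
    }
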
> Totally pie in the sky handwaving at this point:
>
> * create an array of mbuf pointers for completed mbufs;
> * populate the mbuf array;
> * pass the array up to ether_demux().
>
> For vlan handling, it may end up populating its own list of mbufs to push
> up to ether_demux(). So maybe we should extend the API to have a bitmap of
> packets to actually handle from the array, so we can pass up a larger array
> of mbufs, note which ones are for the destination and then the upcall can
> mark which frames it's consumed.
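
To make that concrete, a toy version of the carrier structure might look
like this (all names hypothetical, nothing like this exists in ifnet today);
a 64-bit word per batch keeps the mark/consume bookkeeping cheap:

    #include <sys/param.h>
    #include <sys/mbuf.h>

    #define MB_BATCH_MAX 64             /* one uint64_t of mark bits */

    struct mbuf_batch {
        struct mbuf *mb_pkts[MB_BATCH_MAX];
        int          mb_count;
        uint64_t     mb_pending;        /* still to be looked at */
        uint64_t     mb_consumed;       /* taken by an upcall */
    };

    /* An upcall (vlan, ether_demux(), ...) marks what it swallowed;
     * the caller then only re-dispatches the still-pending frames. */
    static inline void
    mb_batch_consume(struct mbuf_batch *b, int i)
    {
        b->mb_consumed |= 1ULL << i;
        b->mb_pending &= ~(1ULL << i);
    }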
>
> I specifically wonder how much work/benefit we may see by doing:
>
> * batching packets into lists so various steps can batch process things
> rather than run to completion;
> * batching the processing of a list of frames under a single lock instance
> - eg, if the forwarding code could do the forwarding lookup for 'n' packets
> under a single lock, then pass that list of frames up to inet_pfil_hook()
> to do the work under one lock, etc, etc.
>
> Here, the processing would look less like "grab lock and process to
> completion" and more like "mark and sweep" - ie, we have a list of frames
> that we mark as needing processing and mark as having been processed at
> each layer, so we know where to next dispatch them.
>
One quick note here. Every time you increase batching you may increase bandwidth,
but you will also increase per-packet latency for the last packet in a batch.
That is fine, so long as we remember it and treat batch depth as a tuning knob
for balancing the two.
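
As a strawman of the single-lock idea (again, every name below is invented;
FWD_LOCK() stands in for whatever lock the forwarding path actually needs,
and mbuf_batch is the hypothetical carrier from above), the shape would be
one acquisition amortized over the whole batch:

    static void
    ip_forward_batch(struct mbuf_batch *b)
    {
        struct route ro[MB_BATCH_MAX];
        int i;

        FWD_LOCK();                 /* one lock for 'n' lookups */
        for (i = 0; i < b->mb_count; i++) {
            if ((b->mb_pending & (1ULL << i)) == 0)
                continue;
            fwd_lookup(b->mb_pkts[i], &ro[i]);
        }
        FWD_UNLOCK();

        /* sweep: hand the surviving frames to the next stage,
         * e.g. inet_pfil_hook() and then output */
        for (i = 0; i < b->mb_count; i++)
            if (b->mb_pending & (1ULL << i))
                fwd_output(b->mb_pkts[i], &ro[i]);
    }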
> I still have some tool coding to do with PMC before I even think about
> tinkering with this as I'd like to measure stuff like per-packet latency as
> well as top-level processing overhead (ie, CPU_CLK_UNHALTED.THREAD_P /
> lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, etc.)
>
This would be very useful in identifying the actual hot spots, and would be helpful
to anyone who can generate a decent stream of packets with, say, an IXIA.
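In the meantime, hwpmc(4) in counting mode plus the stock interface counters
already give a first-order cycles-per-packet number; roughly (the event name
varies by CPU model):

    # kldload hwpmc
    # pmcstat -s CPU_CLK_UNHALTED.THREAD_P -w 1
    # netstat -I lagg0 -w 1
    # vmstat -i

Divide unhalted cycles per interval by packets per interval, and keep an eye
on the interrupt rate from vmstat -i while you do.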
Best,
George