Network stack changes
Sami Halabi
sodynet1 at gmail.com
Mon Sep 23 22:46:47 UTC 2013
Hi,
> http://info.iet.unipi.it/~**luigi/papers/20120601-dxr.pdf<http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf>
> http://www.nxlab.fer.hr/dxr/**stable_8_20120824.diff<http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff>
I've tried the diff in 10-current, applied cleanly but had errors compiling
new kernel... is there any work to make it work? i'd love to test it.
Sami
On Sun, Sep 22, 2013 at 11:12 PM, Alexander V. Chernikov <
melifaro at yandex-team.ru> wrote:
> On 29.08.2013 15:49, Adrian Chadd wrote:
>
>> Hi,
>>
> Hello Adrian!
> I'm very sorry for the looong reply.
>
>
>
>> There's a lot of good stuff to review here, thanks!
>>
>> Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to
>> keep locking things like that on a per-packet basis. We should be able to
>> do this in a cleaner way - we can defer RX into a CPU pinned taskqueue and
>> convert the interrupt handler to a fast handler that just schedules that
>> taskqueue. We can ignore the ithread entirely here.
>>
>> What do you think?
>>
> Well, it sounds good :) But performance numbers and Jack opinion is more
> important :)
>
> Are you going to Malta?
>
>
>> Totally pie in the sky handwaving at this point:
>>
>> * create an array of mbuf pointers for completed mbufs;
>> * populate the mbuf array;
>> * pass the array up to ether_demux().
>>
>> For vlan handling, it may end up populating its own list of mbufs to push
>> up to ether_demux(). So maybe we should extend the API to have a bitmap of
>> packets to actually handle from the array, so we can pass up a larger array
>> of mbufs, note which ones are for the destination and then the upcall can
>> mark which frames its consumed.
>>
>> I specifically wonder how much work/benefit we may see by doing:
>>
>> * batching packets into lists so various steps can batch process things
>> rather than run to completion;
>> * batching the processing of a list of frames under a single lock
>> instance - eg, if the forwarding code could do the forwarding lookup for
>> 'n' packets under a single lock, then pass that list of frames up to
>> inet_pfil_hook() to do the work under one lock, etc, etc.
>>
> I'm thinking the same way, but we're stuck with 'forwarding lookup' due to
> problem with egress interface pointer, as I mention earlier. However it is
> interesting to see how much it helps, regardless of locking.
>
> Currently I'm thinking that we should try to change radix to something
> different (it seems that it can be checked fast) and see what happened.
> Luigi's performance numbers for our radix are too awful, and there is a
> patch implementing alternative trie:
> http://info.iet.unipi.it/~**luigi/papers/20120601-dxr.pdf<http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf>
> http://www.nxlab.fer.hr/dxr/**stable_8_20120824.diff<http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff>
>
>
>
>
>> Here, the processing would look less like "grab lock and process to
>> completion" and more like "mark and sweep" - ie, we have a list of frames
>> that we mark as needing processing and mark as having been processed at
>> each layer, so we know where to next dispatch them.
>>
>> I still have some tool coding to do with PMC before I even think about
>> tinkering with this as I'd like to measure stuff like per-packet latency as
>> well as top-level processing overhead (ie, CPU_CLK_UNHALTED.THREAD_P /
>> lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, etc.)
>>
> That will be great to see!
>
>>
>> Thanks,
>>
>>
>>
>> -adrian
>>
>>
> ______________________________**_________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/**mailman/listinfo/freebsd-net<http://lists.freebsd.org/mailman/listinfo/freebsd-net>
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@**freebsd.org<freebsd-net-unsubscribe at freebsd.org>
> "
>
--
Sami Halabi
Information Systems Engineer
NMS Projects Expert
FreeBSD SysAdmin Expert
On Sun, Sep 22, 2013 at 11:12 PM, Alexander V. Chernikov <
melifaro at yandex-team.ru> wrote:
> On 29.08.2013 15:49, Adrian Chadd wrote:
>
>> Hi,
>>
> Hello Adrian!
> I'm very sorry for the looong reply.
>
>
>
>> There's a lot of good stuff to review here, thanks!
>>
>> Yes, the ixgbe RX lock needs to die in a fire. It's kinda pointless to
>> keep locking things like that on a per-packet basis. We should be able to
>> do this in a cleaner way - we can defer RX into a CPU pinned taskqueue and
>> convert the interrupt handler to a fast handler that just schedules that
>> taskqueue. We can ignore the ithread entirely here.
>>
>> What do you think?
>>
> Well, it sounds good :) But performance numbers and Jack opinion is more
> important :)
>
> Are you going to Malta?
>
>
>> Totally pie in the sky handwaving at this point:
>>
>> * create an array of mbuf pointers for completed mbufs;
>> * populate the mbuf array;
>> * pass the array up to ether_demux().
>>
>> For vlan handling, it may end up populating its own list of mbufs to push
>> up to ether_demux(). So maybe we should extend the API to have a bitmap of
>> packets to actually handle from the array, so we can pass up a larger array
>> of mbufs, note which ones are for the destination and then the upcall can
>> mark which frames its consumed.
>>
>> I specifically wonder how much work/benefit we may see by doing:
>>
>> * batching packets into lists so various steps can batch process things
>> rather than run to completion;
>> * batching the processing of a list of frames under a single lock
>> instance - eg, if the forwarding code could do the forwarding lookup for
>> 'n' packets under a single lock, then pass that list of frames up to
>> inet_pfil_hook() to do the work under one lock, etc, etc.
>>
> I'm thinking the same way, but we're stuck with 'forwarding lookup' due to
> problem with egress interface pointer, as I mention earlier. However it is
> interesting to see how much it helps, regardless of locking.
>
> Currently I'm thinking that we should try to change radix to something
> different (it seems that it can be checked fast) and see what happened.
> Luigi's performance numbers for our radix are too awful, and there is a
> patch implementing alternative trie:
> http://info.iet.unipi.it/~**luigi/papers/20120601-dxr.pdf<http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf>
> http://www.nxlab.fer.hr/dxr/**stable_8_20120824.diff<http://www.nxlab.fer.hr/dxr/stable_8_20120824.diff>
>
>
>
>
>> Here, the processing would look less like "grab lock and process to
>> completion" and more like "mark and sweep" - ie, we have a list of frames
>> that we mark as needing processing and mark as having been processed at
>> each layer, so we know where to next dispatch them.
>>
>> I still have some tool coding to do with PMC before I even think about
>> tinkering with this as I'd like to measure stuff like per-packet latency as
>> well as top-level processing overhead (ie, CPU_CLK_UNHALTED.THREAD_P /
>> lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, etc.)
>>
> That will be great to see!
>
>>
>> Thanks,
>>
>>
>>
>> -adrian
>>
>>
> ______________________________**_________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/**mailman/listinfo/freebsd-net<http://lists.freebsd.org/mailman/listinfo/freebsd-net>
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@**freebsd.org<freebsd-net-unsubscribe at freebsd.org>
> "
>
--
Sami Halabi
Information Systems Engineer
NMS Projects Expert
FreeBSD SysAdmin Expert
More information about the freebsd-net
mailing list