FreeBSD 10G forwarding performance @Intel
Alexander V. Chernikov
melifaro at FreeBSD.org
Thu Jul 5 13:41:52 UTC 2012
On 04.07.2012 19:48, Luigi Rizzo wrote:
> On Wed, Jul 04, 2012 at 01:54:01PM +0400, Alexander V. Chernikov wrote:
>> On 04.07.2012 13:12, Luigi Rizzo wrote:
>>> Alex,
>>> i am sure you are aware that in FreeBSD we have netmap too
>> Yes, I'm aware of that :)
>>
>>> which is probably a lot more usable than packetshader
>>> (hw independent, included in the OS, also works on linux...)
>> I'm actually not talking about usability or comparisons here :). They
>> have a nice idea and nice performance graphs. And PacketShader is
>> actually a _platform_, with fast packet delivery being one part (and
>> the only open one) of that platform.
>
> i am not sure if i should read the above as a feature or a limitation :)
I'm not trying to compare their I/O code with the netmap implementation :)
>
>>
>> Their graphs show 40 Mpps (27 Gbit/s at 64-byte frames) of CPU-only
>> IPv4 packet forwarding on "two four-core Intel Nehalem CPUs
>> (2.66GHz)", which illustrates the possibilities of software routing
>> quite clearly.
>
> i suggest to be cautious about graphs in papers (including mine) and
> rely on numbers you can reproduce yourself.
Yup. Of course. However, even if we divide their number by 4, there
is still a huge gap.
> As your nice experiments showed (i especially liked when you moved
> from one /24 to four /28 routes), at these speeds a factor
> of 2 or more in throughput can easily arise from tiny changes
> in configurations, bus, memory and CPU speeds, and so on.
Traffic stats with as many counters as possible eliminated (there is
a possibility in the ixgbe code to update the rx/tx packet counters
only once per rx_process_limit packets, which is 100 by default):
            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      2.8M     0      0       186M       2.8M     0       186M     0
      2.8M     0      0       187M       2.8M     0       186M     0
And it seems that netstat uses 1024 as the divisor (no HN_DIVISOR_1000
is passed to show_stat() in usr.bin/netstat/if.c), so the real frame
count on the Ixia side is much closer to 3 Mpps (~2.9616M).
This is wrong from my point of view, and we should change it, at least
for packet counts. A small demo of the difference is below.
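For illustration, a minimal userland sketch of the difference, using
the same humanize_number(3) that the show_stat() path relies on (the
2.9616M figure is the Ixia-side count from above):

	/* Build on FreeBSD with: cc demo.c -o demo -lutil */
	#include <sys/types.h>
	#include <libutil.h>
	#include <stdio.h>

	int
	main(void)
	{
		char buf[8];
		int64_t pps = 2961600;	/* frames/s counted by the Ixia */

		/* Base-1024 scaling, as netstat does today: "2.8M". */
		humanize_number(buf, sizeof(buf), pps, "", HN_AUTOSCALE,
		    HN_DECIMAL);
		printf("divisor 1024: %s\n", buf);

		/*
		 * With HN_DIVISOR_1000, as packet counts arguably
		 * should be scaled: "2.9M".
		 */
		humanize_number(buf, sizeof(buf), pps, "", HN_AUTOSCALE,
		    HN_DECIMAL | HN_DIVISOR_1000);
		printf("divisor 1000: %s\n", buf);

		return (0);
	}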
Here is the patch itself:
http://static.ipfw.ru/files/fbsd10g/no_ifcounters.diff
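The gist is to stop touching the shared ifnet counters for every
frame. A simplified sketch of the idea (not the actual diff;
rx_next_frame() and the ring handling are stand-ins for the real
ixgbe_rxeof() code):

	/*
	 * Accumulate in a local variable inside the rx loop and fold
	 * the result into the shared ifnet counter once per batch (up
	 * to rx_process_limit frames), so the counter cache line is
	 * dirtied once per batch instead of once per packet.
	 */
	static void
	rxeof_sketch(struct rx_ring *rxr, struct ifnet *ifp, int limit)
	{
		struct mbuf *m;
		u_long pkts = 0;

		while (limit-- > 0 &&
		    (m = rx_next_frame(rxr)) != NULL) {	/* stand-in */
			pkts++;
			(*ifp->if_input)(ifp, m);
		}
		ifp->if_ipackets += pkts;	/* one update per batch */
	}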
IPFW contention:
Same setup as shown above, same traffic level:
17:48 [0] test15# ipfw show
00100 0 0 allow ip from any to any
65535 0 0 deny ip from any to any
net.inet.ip.fw.enable: 0 -> 1
            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      2.1M  734k      0       187M       2.1M     0       139M     0
      2.1M  736k      0       187M       2.1M     0       139M     0
      2.1M  737k      0       187M       2.1M     0        89M     0
      2.1M  735k      0       187M       2.1M     0       189M     0
net.inet.ip.fw.update_counters: 1 -> 0
      2.3M  636k      0       187M       2.3M     0       148M     0
      2.5M  343k      0       187M       2.5M     0       164M     0
      2.5M  351k      0       187M       2.5M     0       164M     0
      2.5M  345k      0       187M       2.5M     0       164M     0
Patch here: http://static.ipfw.ru/files/fbsd10g/no_ipfw_counters.diff
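For reference, the idea is simply to gate the per-rule update in
ipfw_chk() behind the new sysctl. A hedged sketch, with
V_fw_update_counters standing in for whatever the diff actually names
the knob:

	/* In the rule-match path of ipfw_chk(), roughly: */
	if (V_fw_update_counters) {
		f->pcnt++;			/* rule packet counter */
		f->bcnt += pktlen;		/* rule byte counter */
		f->timestamp = time_uptime;
	}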
It seems that ipfw counters suffer from this problem, too: every
matching packet does a read-modify-write on the same rule's counters
from whichever CPU handles it, so the cache line bounces between cores.
Unfortunately, there is no dynamic per-CPU (DPCPU-style) allocator in
our kernel.
I'm planning to make a very simple per-CPU counters patch: allocate
65k * (u64 bytes + u64 packets) of memory for each CPU at vnet instance
init, and make ipfw use that as the counter backend. There is a problem
with several rules residing in a single entry; this can (probably) be
worked around by using the fast counters only for the first such rule
(or by not using fast counters for such rules at all).
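A rough sketch of what I have in mind (all names are illustrative,
this is not a patch yet):

	/*
	 * One 65536-slot array of {packets, bytes} pairs per CPU,
	 * allocated at vnet instance init.  The update path touches
	 * only the current CPU's slot: no atomics, no shared cache
	 * lines.  Reading a rule's counters means summing its slot
	 * across all CPUs.
	 */
	struct ipfw_pcpu_cntr {
		uint64_t pkts;
		uint64_t bytes;
	};

	static struct ipfw_pcpu_cntr *ipfw_cntrs[MAXCPU];

	static void
	ipfw_pcpu_cntr_init(void)
	{
		int i;

		for (i = 0; i < mp_ncpus; i++)
			ipfw_cntrs[i] = malloc(65536 *
			    sizeof(struct ipfw_pcpu_cntr), M_IPFW,
			    M_WAITOK | M_ZERO);
	}

	static __inline void
	ipfw_pcpu_cntr_update(uint16_t rulenum, int pktlen)
	{
		struct ipfw_pcpu_cntr *c = &ipfw_cntrs[curcpu][rulenum];

		c->pkts++;
		c->bytes += pktlen;
	}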
What do you think about this?
>
> cheers
> luigi
>
--
WBR, Alexander