FreeBSD IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
Andre Oppermann
andre at freebsd.org
Mon Jul 7 13:37:30 UTC 2008
Bruce Evans wrote:
> On Mon, 7 Jul 2008, Andre Oppermann wrote:
>
>> Bruce Evans wrote:
>>> What are the other overheads? I calculate 1.644Mpps counting the
>>> inter-frame gap, with 64-byte packets and 64-header_size payloads.
>>> If the 64 bytes is for the payload, then the max is much lower.
>>
>> The theoretical maximum at 64-byte frames is 1,488,100. I've looked
>> up my notes; the 1.244Mpps number can be adjusted to 1.488Mpps.
>
> Where is the extra? I still get 1.644736 Mpps (10^9/(8*64+96)).
> 1.488095 is for 64 bits extra (10^9/(8*64+96+64)).
The preamble has 64 bits and is in addition to the inter-frame gap.
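
To make the two figures above concrete, here is a minimal sketch of the arithmetic, assuming gigabit Ethernet (10^9 bits/s), a 64-byte frame, a 96-bit inter-frame gap, and a 64-bit preamble. The function and constant names are made up for illustration only.

```python
# Per-frame cost on the wire, in bits (gigabit Ethernet assumed).
LINK_BPS = 10**9        # link speed in bits per second
FRAME_BITS = 8 * 64     # minimum-size 64-byte frame
IFG_BITS = 96           # inter-frame gap
PREAMBLE_BITS = 64      # preamble, counted in addition to the gap


def max_pps(frame_bits, overhead_bits, link_bps=LINK_BPS):
    """Upper bound on packets per second for a given per-frame overhead."""
    return link_bps / (frame_bits + overhead_bits)


# Counting only the inter-frame gap: 10^9 / (8*64 + 96)
gap_only = max_pps(FRAME_BITS, IFG_BITS)
# Counting gap plus preamble: 10^9 / (8*64 + 96 + 64)
gap_and_preamble = max_pps(FRAME_BITS, IFG_BITS + PREAMBLE_BITS)

print(f"gap only:           {gap_only / 1e6:.6f} Mpps")
print(f"gap plus preamble:  {gap_and_preamble / 1e6:.6f} Mpps")
```

The first result matches the 1.644 Mpps figure and the second matches the 1.488 Mpps figure, so the whole difference is the 64 preamble bits.
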
>>>>> I hoped to reach 1Mpps with the hardware I mentioned some mails
>>>>> before, but 2Mpps is far far away.
>>>>> Currently I get 160kpps via 32-bit/33MHz PCI on a 1.2GHz Mobile Pentium.
>>>>
>>>> This is more or less expected. PCI32 is not able to sustain high
>>>> packet rates. The bus setup times kill the speed. For larger packets
>>>> the ratio gets much better and some reasonable throughput can be
>>>> achieved.
>>>
>>> I get about 640 kpps without forwarding (sendto: slightly faster;
>>> recvfrom: slightly slower) on a 2.2GHz A64. Underclocking the memory
>>> from 200MHz to 100MHz only reduces the speed by about 10%, while not
>>> overclocking the CPU by 10% reduces the speed by the same 10%, so the
>>> system is apparently still mainly CPU-bound.
>>
>> On PCI32 at 33MHz? He's using a 1.2GHz Mobile Pentium on top of that.
>
> Yes. My example shows that FreeBSD is more CPU-bound than I/O bound up
> to CPUs considerably faster than a 1.2GHz Pentium (though PentiumM is
> fast relative to its clock speed). The memory interface may matter more
> than the CPU clock.
>
>>>> NetFPGA doesn't have enough TCAM space to be useful for real routing
>>>> (as in Internet sized routing table). The trick many embedded
>>>> networking CPUs use is cache prefetching that is integrated with the
>>>> network controller. The first 64-128 bytes of every packet are
>>>> transferred automatically into the L2 cache by the hardware. This
>>>> allows relatively slow CPUs (700 MHz Broadcom BCM1250 in Cisco NPE-G1
>>>> or 1.67-GHz Freescale 7448 in NPE-G2) to get more than 1Mpps. Until
>>>> something like this is possible on Intel or AMD x86 CPUs we have a
>>>> ceiling limited by RAM speed.
>>>
>>> Does using fa$ter memory (speed and/or latency) help here? 64 bytes
>>> is so small that latency may be more of a problem, especially without
>>> a prefetch.
>>
>> Latency. For IPv4 packet forwarding only one cache line per packet
>> is fetched. More memory speed only helps with the DMA from/to the
>> network card.
>
> I use low-end memory, but on the machine that does 640 kpps it somehow
> has latency almost 4 times as low as on new FreeBSD cluster machines
> (~42 nsec instead of ~150). perfmon (fixed for AXP and A64) and hwpmc
> report an average of 11 k8-dc-misses per sendto() while sending via
> bge at 640 kpps. 11 * 42 accounts for 462 nsec out of the 1562 per
> packet at this rate. 11 * 150 = 1650 would probably make this rate
> unachievable despite the system having 20 times as much CPU and bus.
We were talking routing here. That is, a packet received via one network
interface and sent out on another. It crosses the PCI bus twice.
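
For reference, the cache-miss budget quoted above can be checked with a small sketch. It assumes the figures from the quoted text (640 kpps, 11 data-cache misses per packet, ~42 ns versus ~150 ns memory latency) and that misses do not overlap; the helper names are hypothetical. It covers the sendto() case only, not forwarding.

```python
# Figures taken from the quoted measurements; helper names are illustrative.
RATE_PPS = 640_000          # observed sendto() rate
MISSES_PER_PACKET = 11      # average data-cache misses per sendto()


def per_packet_budget_ns(pps):
    """Time available per packet at a given packet rate, in nanoseconds."""
    return 1e9 / pps


def miss_stall_ns(misses, latency_ns):
    """Total stall time if every miss costs the full memory latency."""
    return misses * latency_ns


budget = per_packet_budget_ns(RATE_PPS)
for latency in (42, 150):
    stall = miss_stall_ns(MISSES_PER_PACKET, latency)
    verdict = "fits" if stall < budget else "exceeds the budget"
    print(f"{latency:3d} ns latency: {stall} ns of {budget} ns ({verdict})")
```

At 42 ns the misses use 462 ns of the 1562.5 ns available per packet; at 150 ns they would need 1650 ns, which is more than the whole per-packet budget.
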
--
Andre