realtek performance (was Re: good ATI chipset results)
Scott Long
scottl at samsco.org
Thu Oct 13 12:46:27 PDT 2005
John Baldwin wrote:
> On Thursday 13 October 2005 01:07 pm, Sean McNeil wrote:
>
>>On Thu, 2005-10-13 at 11:49 -0400, John Baldwin wrote:
>>
>>>On Thursday 13 October 2005 11:13 am, Sean McNeil wrote:
>>>
>>>>On Thu, 2005-10-13 at 09:17 -0400, Mike Tancsa wrote:
>>>>
>>>>>Haven't really seen anyone else use this board, but I have had good
>>>>>luck with it so far
>>>>>
>>>>>http://www.ecs.com.tw/ECSWeb/Products/ProductsDetail.aspx?DetailID=506&MenuID=90&LanID=0
>>>>>
>>>>>It's a micro ATX form factor with built-in video and the onboard NIC is
>>>>>a realtek. (Although it's not the fastest NIC, its driver is stable
>>>>>and mature, especially compared to the headaches people seem to have
>>>>>with the NVIDIA NICs.)
>>>>
>>>>Is this the RealTek 8169S Single-chip Gigabit Ethernet?
>>>>
>>>>For those interested, here are some changes I always use to increase
>>>>the performance of the above NIC. With these mods, I can stream over
>>>>20 MBps video multicast and do other stuff over the network without
>>>>issues. Without the changes, xmit is horrible with severe UDP packet
>>>>loss.
>>>
>>>So, I see two changes. One is to up the number of descriptors from 32 rx
>>>and 64 tx to 64 rx and 64 tx on some models and 1024 rx and 1024 tx on
>>>other models. The other thing is that you seem to pessimize TX
>>>performance by always forcing the send packets to be coalesced into one
>>>mbuf (which requires doing an alloc and then copying all of the data)
>>>instead of making use of scatter/gather for sending packets. Do you need
>>>both changes or do just the higher descriptor counts make the difference?
>>
>>Actually, I've found that the higher descriptor counts do not make a
>>noticeable difference. The only thing that mattered was to eliminate
>>the scatter/gather of sending packets. I can't remember why I left the
>>descriptor increase in there. I think it was to get the best use out of
>>the hardware.
>
>
> Hmm, odd. Scott, do you have any ideas why m_defrag() plus one descriptor
> would be faster than s/g dma for re(4)?
>
There are two things that I would consider. First is that
bus_dmamap_load_mbuf_sg() should be used, as that cuts out some
indirection (and thus latency) in the code. Second is that not all DMA
engines are created equal, and I honestly wouldn't expect a whole lot
out of Realtek given the price point of this chip. It might be
optimized for operating on only a single S/G element, for example.
Maybe it's really slow at pre-fetching s/g elements, or maybe it has
some sort of a stall after each DMA segment transfer while it restarts
a state machine. I've seen evidence in other hardware that only one S/G
element should be used even though there are slots for 2 (or 3 in the
case of 9k jumbo frames).

One thing to keep in mind is the difference in the driver models
between Windows and BSD that Bill Paul talked about the other day. In
the Windows world, the driver owns the network packet memory, whereas
in BSD the stack owns it (in the form of mbufs). This means that a
Windows driver can pre-allocate a contiguous slab and populate the
descriptor rings with it without ever having to worry about s/g
fragmentation, while in BSD fragmentation is a fact of life. So it's
likely yet another case of hardware being optimized for certain
characteristics of Windows at the expense of other operating systems.
Scott
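
For readers following the thread, here is a minimal user-space sketch of
the two transmit strategies being compared. It is not the re(4) driver
code and uses no kernel APIs: struct frag, struct desc, load_sg() and
coalesce() are hypothetical stand-ins for an mbuf chain, a DMA
descriptor, the scatter/gather load, and the m_defrag()-style copy into
one buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: one fragment of a packet. */
struct frag {
    const unsigned char *data;
    size_t len;
    struct frag *next;
};

/* Hypothetical stand-in for one DMA descriptor. */
struct desc {
    const unsigned char *addr;
    size_t len;
};

/*
 * Scatter/gather: point one descriptor at each non-empty fragment.
 * Nothing is copied. Returns the number of descriptors used, or -1
 * if the chain has more fragments than the ring has free slots.
 */
int load_sg(const struct frag *f, struct desc *ring, int max)
{
    int n = 0;

    assert(ring != NULL);
    for (; f != NULL; f = f->next) {
        if (f->len == 0)
            continue;
        if (n == max)
            return -1;
        ring[n].addr = f->data;
        ring[n].len = f->len;
        n++;
    }
    return n;
}

/*
 * Coalesce: allocate one contiguous buffer and copy every fragment
 * into it, so the packet needs a single descriptor. This costs an
 * allocation plus a copy of all the data. The caller frees the result.
 */
unsigned char *coalesce(const struct frag *f, size_t *total)
{
    const struct frag *p;
    unsigned char *buf;
    size_t len = 0, off = 0;

    for (p = f; p != NULL; p = p->next)
        len += p->len;

    buf = malloc(len > 0 ? len : 1);
    if (buf == NULL)
        return NULL;

    for (p = f; p != NULL; p = p->next) {
        if (p->len == 0)
            continue;
        memcpy(buf + off, p->data, p->len);
        off += p->len;
    }
    *total = len;
    return buf;
}
```

Called on a three-fragment chain, load_sg() fills three descriptors
while coalesce() produces one buffer that needs a single descriptor.
Which of the two is faster on a given chip depends on how its DMA engine
handles multiple segments, which is the open question in this thread.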