vr(4) performance

Sam Leffler sam at errno.com
Sat Nov 4 16:24:59 UTC 2006


Pyun YongHyeon wrote:
> On Thu, Nov 02, 2006 at 03:27:46PM -0800, Sam Leffler wrote:
>  > Devon H. O'Dell wrote:
>  > > Hey all,
>  > > 
>  > > So, vr(4) kind of sucks, and it seems like this is mostly due to the
>  > > fact that we call m_defrag() on every mbuf that we send through it.
>  > > This seems to really screw performance on outgoing packets (something
>  > > like 33% the output efficiency of fxp(4), if I'm understanding this
>  > > all correctly).
>  > > 
>  > > I'm sort of wondering if anybody has attempted to address this before
>  > > and if there's a way to possibly mitigate this behavior. I know Bill
>  > > Paul's comments say ``Unfortunately, FreeBSD doesn't guarantee
>  > > that mbufs will be filled in starting at longword boundaries, so we
>  > > have to do a buffer copy before transmission.'' -- since it's been a
>  > > long day, and I'm about to go home to grab a pizza and stop thinking
>  > > about code, would anybody mind offering suggestions as to either:
>  > > 
>  > > a) Pros and cons of guaranteeing that they're filled in aligned (and
>  > > possibly hints on doing it), or
>  > > b) Possible workarounds / hacks to do this faster for vr(4)
>  > > 
>  > > Any input is appreciated! (Except ``vr(4) is lol'')
>  > 
>  > m_defrag is ~10x slower than it needs to be.  I proposed changes to
>  > address this a while back but eventually gave up and put driver-specific
>  > code in ath.  You can look there or I can send you some patches to
>  > m_defrag to try out in vr.
>  > 
> 
> Because the purpose of m_defrag(9) in vr(4) is to guarantee longword
> aligned mbufs I'm not sure ath_defrag can be used here. If memory
> serves me right ath_defrag would not change the first mbuf address
> in a chain. If the first mbuf is not aligned on longword boundary
> it wouldn't work I guess. Of course we can check the first mbuf in
> the chain before calling super-fast ath_defrag, I guess.
> 

m_defrag is used for two purposes (mainly) in the system: reducing the
mbuf count in a chain so that an outbound packet fits in a limited
number of h/w tx descriptors, and aligning packet data for cards with
constrained dma engines.  Both operations belong in bus_dma.
Combining them in a single routine results in overly pessimistic code
for the common case.  Separately, the algorithm in m_defrag is
suboptimal (e.g. it makes a complete copy even when a packet needs no
changes).
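
To make the "no changes needed" case concrete, here is a minimal userland
sketch (not kernel code) of checking a chain before deciding to copy it.
struct seg, chain_fixups and the FIX_* flags are hypothetical names made up
for illustration; struct seg is only a simplified stand-in for an mbuf. It
assumes every non-empty buffer consumes one tx descriptor and must start on
the alignment boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct mbuf; not the kernel definition. */
struct seg {
	struct seg *next;	/* next buffer in the chain */
	const void *data;	/* start of this buffer's payload */
	size_t len;		/* payload length in bytes */
};

#define FIX_NONE	0x0
#define FIX_COALESCE	0x1	/* more buffers than tx descriptors allow */
#define FIX_ALIGN	0x2	/* some buffer starts off the boundary */

/*
 * Walk the chain once and report which fixups it needs.
 * align must be a power of two (4 for a longword boundary).
 */
static int
chain_fixups(const struct seg *s, int maxsegs, size_t align)
{
	int nsegs = 0, need = FIX_NONE;

	assert(align != 0 && (align & (align - 1)) == 0);
	for (; s != NULL; s = s->next) {
		if (s->len == 0)
			continue;	/* empty buffers use no descriptor */
		nsegs++;
		if (((uintptr_t)s->data & (align - 1)) != 0)
			need |= FIX_ALIGN;
	}
	if (nsegs > maxsegs)
		need |= FIX_COALESCE;
	return (need);
}
```

A driver-side caller would only pay for a copy when the result is not
FIX_NONE, and could pick a cheaper fixup when only one of the two flags
is set.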

ath_defrag is example code tailored to the ath driver that handles only
the case where the mbuf chain is too long.  I have other code that can
do packet alignment alone, or alignment plus mbuf coalescing, far better
than the current logic in m_defrag.

The right solution to this problem--as suggested by John Baldwin and
Scott Long--is to improve the bus_dma code so these things happen
automatically for the driver according to the dma tag config.  This
would eliminate the need for m_defrag in all cases I'm aware of.  Since
bus_dma has info like the max # segments a device can accept and any
alignment constraints, it can do a much more efficient job.
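
A rough userland sketch of that idea, under stated assumptions: struct
dma_limits stands in for the segment-count and alignment fields a dma tag
carries, and load_chain, struct pktbuf and struct dma_seg are hypothetical
names, not the real bus_dma interface. The chain is referenced in place when
it already satisfies the limits and is copied exactly once into a
caller-supplied aligned bounce buffer otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an mbuf chain element; not the kernel struct. */
struct pktbuf {
	struct pktbuf *next;
	void *data;
	size_t len;
};

/* Hypothetical subset of what a dma tag records about the device. */
struct dma_limits {
	int maxsegs;	/* tx descriptors available per packet */
	size_t align;	/* required start alignment, power of two */
};

struct dma_seg {
	void *addr;
	size_t len;
};

/*
 * Fill segs[] (room for lim->maxsegs entries) for the chain.  Returns the
 * segment count, or -1 if a copy is needed and bounce is too small.
 */
static int
load_chain(const struct dma_limits *lim, const struct pktbuf *chain,
    void *bounce, size_t bouncelen, struct dma_seg *segs)
{
	const struct pktbuf *p;
	size_t total = 0;
	int n = 0, inplace = 1;

	assert(lim->maxsegs >= 1);
	assert(lim->align != 0 && (lim->align & (lim->align - 1)) == 0);

	for (p = chain; p != NULL; p = p->next) {
		if (p->len == 0)
			continue;
		total += p->len;
		if (!inplace)
			continue;	/* only the total matters now */
		if (n == lim->maxsegs ||
		    ((uintptr_t)p->data & (lim->align - 1)) != 0) {
			inplace = 0;
			continue;
		}
		segs[n].addr = p->data;
		segs[n].len = p->len;
		n++;
	}
	if (inplace)
		return (n);	/* common case: no copy at all */

	if (total > bouncelen)
		return (-1);
	/* Slow path: a single copy into the aligned bounce buffer. */
	total = 0;
	for (p = chain; p != NULL; p = p->next) {
		if (p->len == 0)
			continue;
		memcpy((char *)bounce + total, p->data, p->len);
		total += p->len;
	}
	segs[0].addr = bounce;
	segs[0].len = total;
	return (1);
}
```

The point of the sketch is only that the decision is driven by the
device limits rather than hard-coded in each driver; a real
implementation would also have to deal with physical addresses,
boundaries and segment size limits.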

	Sam

