vr(4) performance
Sam Leffler
sam at errno.com
Sat Nov 4 16:24:59 UTC 2006
Pyun YongHyeon wrote:
> On Thu, Nov 02, 2006 at 03:27:46PM -0800, Sam Leffler wrote:
> > Devon H. O'Dell wrote:
> > > Hey all,
> > >
> > > So, vr(4) kind of sucks, and it seems like this is mostly due to the
> > > fact that we call m_defrag() on every mbuf that we send through it.
> > > This seems to really screw performance on outgoing packets (something
> > > like 33% the output efficiency of fxp(4), if I'm understanding this
> > > all correctly).
> > >
> > > I'm sort of wondering if anybody has attempted to address this before
> > > and if there's a way to possibly mitigate this behavior. I know Bill
> > > Paul's comments say ``Unfortunately, FreeBSD doesn't guarantee
> > > that mbufs will be filled in starting at longword boundaries, so we
> > > have to do a buffer copy before transmission.'' -- since it's been a
> > > long day, and I'm about to go home to grab a pizza and stop thinking
> > > about code, would anybody mind offering suggestions as to either:
> > >
> > > a) Pros and cons of guaranteeing that they're filled in aligned (and
> > > possibly hints on doing it), or
> > > b) Possible workarounds / hacks to do this faster for vr(4)
> > >
> > > Any input is appreciated! (Except ``vr(4) is lol'')
> >
> > m_defrag is ~10x slower than it needs to be. I proposed changes to
> > address this a while back but eventually gave up and put driver-specific
> > code in ath. You can look there or I can send you some patches to
> > m_defrag to try out in vr.
> >
>
> Because the purpose of m_defrag(9) in vr(4) is to guarantee longword
> aligned mbufs, I'm not sure ath_defrag can be used here. If memory
> serves me right, ath_defrag would not change the first mbuf address
> in a chain. If the first mbuf is not aligned on longword boundary
> it wouldn't work I guess. Of course we can check the first mbuf in
> the chain before calling super-fast ath_defrag, I guess.
>
m_defrag is used for two purposes (mainly) in the system: reducing the
mbuf count in a chain so that an outbound packet fits in a limited
number of h/w tx descriptors and aligning packet data for cards with
constrained dma engines. Both these operations belong in bus_dma.
Combining both these operations in a single routine results in overly
pessimistic code for the common case. Separately, the algorithm in
m_defrag is suboptimal (e.g. it makes a complete copy even when a packet
needs no changes).
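
To illustrate the "no changes needed" case, here is a minimal userspace
sketch, not kernel code and not the m_defrag source. struct seg, chain_ok
and chain_fixup are hypothetical names standing in for an mbuf chain and
a fixup routine; the point is only that the copy is skipped when the
chain already satisfies both constraints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for an mbuf; not the real struct. */
struct seg {
    struct seg *next;   /* next buffer in the chain, or NULL */
    uint8_t    *data;   /* start of valid data */
    size_t      len;    /* bytes of valid data */
};

/* Nonzero if the chain has at most maxsegs buffers, all aligned. */
static int
chain_ok(const struct seg *s, size_t maxsegs, size_t align)
{
    size_t n = 0;

    /* align must be a power of two so it can be used as a mask. */
    assert(maxsegs > 0 && align != 0 && (align & (align - 1)) == 0);
    for (; s != NULL; s = s->next) {
        if (((uintptr_t)s->data & (align - 1)) != 0)
            return 0;
        if (++n > maxsegs)
            return 0;
    }
    return 1;
}

/*
 * Copy the chain into dst only when it violates a constraint.
 * dst must be suitably aligned and must not overlap the chain's data.
 * Returns 0 if the chain was left untouched, 1 if it was copied and
 * collapsed into the head buffer, -1 if dst is too small.
 * Unlinked buffers stay owned by the caller in this sketch.
 */
static int
chain_fixup(struct seg *head, size_t maxsegs, size_t align,
    uint8_t *dst, size_t dstlen)
{
    const struct seg *s;
    size_t total = 0;

    if (chain_ok(head, maxsegs, align))
        return 0;               /* common case: no copy at all */
    for (s = head; s != NULL; s = s->next) {
        if (s->len > dstlen - total)
            return -1;
        memcpy(dst + total, s->data, s->len);
        total += s->len;
    }
    head->data = dst;
    head->len = total;
    head->next = NULL;
    return 1;
}
```

A driver-like caller would invoke chain_fixup once per outbound packet
and pay for the copy only when the return value is 1.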

ath_defrag is example code tailored to the ath driver; it handles only
the case where the mbuf chain is too long. I have other code that can do
packet alignment, mbuf coalescing, or both far better than the current
logic in m_defrag.
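
For readers who want a feel for the chain-too-long case, the following is
a simplified userspace sketch of in-place coalescing. It is an assumption
about the general technique, not the actual ath_defrag code: struct buf,
BUF_SIZE and buf_coalesce are hypothetical. Neighbouring buffers are
merged into unused trailing space until the chain is short enough, and
the head buffer is never moved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf with fixed inline storage. */
#define BUF_SIZE 64

struct buf {
    struct buf   *next;
    size_t        off;              /* offset of valid data in store[] */
    size_t        len;              /* bytes of valid data */
    unsigned char store[BUF_SIZE];
};

/* Number of buffers in the chain. */
static size_t
buf_count(const struct buf *b)
{
    size_t n = 0;

    for (; b != NULL; b = b->next)
        n++;
    return n;
}

/* Free bytes after the valid data. */
static size_t
buf_trailing(const struct buf *b)
{
    return BUF_SIZE - (b->off + b->len);
}

/*
 * Merge neighbours in place until the chain has at most maxsegs buffers.
 * Buffers are assumed to be malloc'ed; merged-away ones are freed.
 * Returns 0 on success, -1 if merging alone cannot shorten the chain
 * enough (a real driver would then fall back to allocating new storage).
 */
static int
buf_coalesce(struct buf *head, size_t maxsegs)
{
    struct buf *b = head;
    size_t n;

    assert(head != NULL && maxsegs > 0);
    n = buf_count(head);
    while (n > maxsegs && b->next != NULL) {
        struct buf *nx = b->next;

        if (nx->len <= buf_trailing(b)) {
            /* Append the neighbour's data and unlink it. */
            memcpy(b->store + b->off + b->len, nx->store + nx->off, nx->len);
            b->len += nx->len;
            b->next = nx->next;
            free(nx);
            n--;
        } else {
            b = nx;             /* no room here, try the next pair */
        }
    }
    return n <= maxsegs ? 0 : -1;
}
```

Because the head is kept where it is, a sketch like this does nothing for
a misaligned first buffer, which is the concern raised in the quoted text.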

The right solution to this problem, as suggested by John Baldwin and
Scott Long, is to improve the bus_dma code so these things happen
automatically for the driver according to the dma tag config. This
would eliminate the need for m_defrag in all cases I'm aware of. Since
bus_dma already knows the maximum number of segments a device can accept
and any alignment constraints, it can do a much more efficient job.
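
As a rough userspace illustration of that idea (not the bus_dma
implementation), the sketch below lets a tag-like record decide what work
a packet needs. struct dma_limits, struct dma_seg and dma_plan are
hypothetical; the two fields are modelled on the alignment and nsegments
arguments of bus_dma_tag_create(9), and the values used with them are
made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of what a DMA tag records about a device. */
struct dma_limits {
    size_t alignment;   /* required segment alignment, a power of two */
    size_t nsegments;   /* max segments the device accepts per packet */
};

/* One segment of a packet (hypothetical, for illustration only). */
struct dma_seg {
    uintptr_t addr;
    size_t    len;
};

enum dma_fixup {
    FIXUP_NONE,         /* load as-is, no copy */
    FIXUP_COALESCE,     /* aligned, but too many segments */
    FIXUP_REALIGN       /* at least one misaligned segment */
};

/* Pick the cheapest action that satisfies the device's limits. */
static enum dma_fixup
dma_plan(const struct dma_limits *lim, const struct dma_seg *segs,
    size_t nsegs)
{
    size_t i;

    assert(lim->nsegments > 0);
    assert(lim->alignment != 0 &&
        (lim->alignment & (lim->alignment - 1)) == 0);
    for (i = 0; i < nsegs; i++)
        if ((segs[i].addr & (lim->alignment - 1)) != 0)
            return FIXUP_REALIGN;
    return nsegs > lim->nsegments ? FIXUP_COALESCE : FIXUP_NONE;
}
```

With the limits held in one place, a driver would not need to call a
defrag routine itself; the common, already-conforming packet takes the
FIXUP_NONE path with no copy.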
Sam