cvs commit: src/sys/dev/bce if_bce.c if_bcefw.h if_bcereg.h
Peter Wemm
peter at wemm.org
Thu Apr 3 12:21:06 UTC 2008
On Mon, Mar 31, 2008 at 12:34 PM, Peter Wemm <peter at wemm.org> wrote:
>
> On Mon, Mar 31, 2008 at 12:13 PM, David Christensen
> <davidch at broadcom.com> wrote:
> > > On Thu, Feb 21, 2008 at 5:46 PM, David Christensen
> > > <davidch at freebsd.org> wrote:
> > > > Modified files:
> > > > sys/dev/bce if_bce.c if_bcefw.h if_bcereg.h
> > > > Log:
> > > > MFC after: 4 weeks
> > > >
> > > > - Added loose RX MTU functionality to allow frames larger than
> > > >   1500 bytes to be accepted even though the interface MTU is set
> > > >   to 1500.
> > > > - Implemented new TCP header splitting/jumbo frame support which
> > > >   uses two chains for receive traffic rather than the original
> > > >   single receive chain.
> > > > - Added additional debug support code.
> > > >
> > > > Revision   Changes        Path
> > > > 1.36       +1559 -675     src/sys/dev/bce/if_bce.c
> > > > 1.5        +6179 -4850    src/sys/dev/bce/if_bcefw.h
> > > > 1.17       +264 -55       src/sys/dev/bce/if_bcereg.h
> > >
> > > This has been devastating on the freebsd.org cluster.
> > >
> > > Below are three test runs. For each, I did a cold reboot, then 'cd
> > > /usr/src/sys' and a 'cvs -Rq update' where the CVSROOT is over
> > > NFS.
> > >
> > > First, the old driver:
> > > svn# time cvs -Rq up
> > > 0.890u 4.577s 1:14.48 7.3% 669+2315k 7379+0io 10094pf+0w
> > >
> > > Now, the same test again, but with this change included in the kernel:
> > > svn# time cvs -Rq up
> > > 0.940u 359.906s 7:01.04 85.7% 648+2242k 7365+0io 10082pf+0w
> > >
> > > Note the massive (nearly 100-fold) increase in system time, and the
> > > almost 7-fold increase in wall clock time.
> > >
> > > Turning on promisc mode helps a lot, but doesn't solve it. (This was
> > > discovered when ps@ was using tcpdump to try to figure out what the
> > > problem was.)
> >
> > The change is needed to update the FreeBSD driver so that it can
> > continue using production firmware for the controllers. The previous
> > firmware was specific to FreeBSD and was not being maintained.
> >
> > I didn't see any performance issues running with netperf. Is the NFS
> > traffic UDP or TCP? What's the MTU in use? How much system memory is
> > available?
>
> NFS over UDP. We're also seeing problems with NIS/YP (also UDP) on
> the box with the driver active. The MTU is the standard 1500. Both
> machines have 8 GB of RAM and run 64-bit kernels. The client is a
> Dell 2950 (2 x quad-core Core 2); the server is an HP DL385 (quad
> Opteron with bge).
>
>
> > If this is a performance problem then the first place I would look is
> > in the definitions for rx_bd_mbuf_alloc_size and pg_bd_mbuf_alloc_size.
> > The older version of the driver would use multiple 2KB buffers
> > (MCLBYTES in size) from a single chain when building a packet so you
> > would typically have a single mbuf cluster passed to the stack. The
> > new firmware uses two chains, each of which may be a different size.
> > The current implementation will use MHLEN bytes for the rx chain and
> > MCLBYTES for the pg chain. When a packet is received the hardware will
> > place as much data as possible into a single mbuf in the rx chain,
> > then place any remaining data into one or more mbufs in the pg chain.
> > The driver will then stitch together the mbufs before passing them up
> > the stack. This process is supposed to improve performance for TCP
> > because the TCP payload will be split from the TCP header and should
> > be quicker to access.
> >
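
To make the two-chain scheme concrete, here is a minimal sketch of the
stitching step as I read the description above. The function and the
next_pg_mbuf() helper are hypothetical, not the actual if_bce.c code;
only MCLBYTES and the mbuf fields are the real kernel API:

    /*
     * Sketch only -- hypothetical names, not the actual if_bce.c code.
     * The frame header lands in one small mbuf from the rx chain; any
     * remaining bytes land in MCLBYTES-sized cluster mbufs taken from
     * the pg chain, and the pieces are linked into a single chain.
     */
    #include <sys/param.h>
    #include <sys/mbuf.h>

    static struct mbuf *
    rx_stitch_sketch(struct mbuf *m_head, int pkt_len,
        struct mbuf *(*next_pg_mbuf)(void))
    {
        struct mbuf *m_tail = m_head;
        int rem = pkt_len - m_head->m_len;   /* bytes beyond the rx mbuf */

        while (rem > 0) {
            struct mbuf *m_pg = next_pg_mbuf();  /* pg chain cluster */

            if (m_pg == NULL) {
                m_freem(m_head);                 /* drop the packet */
                return (NULL);
            }
            m_pg->m_len = (rem < MCLBYTES) ? rem : MCLBYTES;
            rem -= m_pg->m_len;
            m_tail->m_next = m_pg;               /* stitch the chains */
            m_tail = m_pg;
        }
        m_head->m_pkthdr.len = pkt_len;          /* total frame length */
        return (m_head);
    }

(In the real driver the pg mbufs come out of the page chain's ring, of
course; this just shows the shape of the chain handed to the stack.)
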
> > A quick test would be to set rx_bd_mbuf_alloc_size to MCLBYTES, which
> > should for the most part duplicate the older behavior. The driver
> > will still allocate more mbufs, which might be a problem if system
> > memory is already low. Is anyone else aware of a driver that does
> > TCP header splitting? It's typical on the TX side to see a packet
> > with two or three mbufs in a chain, but I suspect that's less common
> > on the RX side, which could be part of the problem.
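
For anyone who wants to try that quick test, it presumably amounts to a
one-liner wherever if_bce.c initializes the field; the exact spot (and
the softc variable name 'sc') is a guess on my part:

    /* Hypothetical placement: size the rx chain buffers as full
     * clusters to roughly duplicate the old single-chain behavior. */
    sc->rx_bd_mbuf_alloc_size = MCLBYTES;    /* previously MHLEN */
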
>
> The one thing that I'm very sure of is that system memory isn't low,
> on either machine. The extraordinary increase in accumulated system
> time of the process makes me wonder if something odd is going on with
> the TX path. When sending packets, the network stack and driver code
> path execution times are charged to the user process doing the writes.
> On the receive side, the cpu time will be accumulated in either the
> driver ithread or taskqueue, or the netisr kthread. To be honest, I
> hadn't been looking to see if excessive cpu time was accumulating
> there, but I did notice that the system's load average was over 2.0
> for the duration of the 'cvs update' on an otherwise idle machine.
> This suggests to me that both send and receive were bogging down
> somehow.
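
One way to check that without guessing is to watch the kernel threads
directly while the test runs, e.g.:

svn# top -SH
svn# vmstat -i

That should show whether the time is piling up in the bce
ithread/taskqueue, the netisr threads, or in interrupt load.
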
>
> Perhaps it is something silly like a spin lock being triggered?
>
>
> > >
> > > Here's the same test, with the new driver, and promisc mode on:
> > > svn# ifconfig bce0 promisc
> > > svn# time cvs -Rq up
> > > 0.967u 50.919s 2:13.97 38.7% 650+2250k 7379+0io 10094pf+0w
> > >
> > > It is better... only double the wall clock time, but still over 10
> > > times as much system time.
> > >
> >
> > It's not clear to me why promiscuous mode would make a difference
> > here, as that should only affect which packets are accepted by the
> > MAC. Are teaming or VLANs used in your configuration?
> > The RX MTU settings shouldn't be affected by promiscuous mode.
>
> There is nothing special going on. Just a plain gige cable to a cisco
> gige switch. I have no explanation for the promisc thing - one of the
> freebsd.org admins thought the problem was with YP/NIS. He started up
> a tcpdump to observe the NIS interactions during ssh login, and the
> problem mostly went away.
>
> BTW: I did the test twice. I ran the machine with cvs HEAD, and
> backed the driver out to before the commit. I also tried a RELENG_7
> kernel, and then put the HEAD bce driver on 7.x - the problem tracks
> the bce driver change in both 7.x and 8.x/HEAD.
>
> There will be 4 more of these machines online sometime today (7.x and
> 8.x, both 32- and 64-bit). We can experiment with those at will.
>
>
> >
> >
> > >
> > > So please, don't MFC until this is solved..
> > >
> >
> > I haven't yet, as I've received reports from a few other people that
> > they're having problems, though theirs are functional problems and
> > not performance issues.
On 8.0/i386, with PAE enabled, I get messages on the console and the
system hangs when trying to do an NFS mount. Backing out the driver
fixes it. The same driver doesn't cause quite as spectacular a
failure on 8.0/amd64, but it isn't exactly happy:
Additional IP options:.^M
Mounting NFS file systebcms:e1: link state changed to UP^M
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)^M
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)^M
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)^M
bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)^M
[..forever..]
NFS over UDP, fwiw. Server is a netapp.
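
FWIW, that console message looks like the sanity check in ether_input()
(sys/net/if_ethersubr.c); paraphrasing from memory, so the exact 8.0
code may differ:

    /* Rough paraphrase of the ether_input() check that prints the
     * "discard frame w/o leading ethernet header" message. */
    if (m->m_len < ETHER_HDR_LEN) {
        if_printf(ifp, "discard frame w/o leading ethernet "
            "header (len %u pkt len %u)\n",
            m->m_len, m->m_pkthdr.len);
        ifp->if_ierrors++;
        m_freem(m);
        return;
    }

A len of 0 and pkt len of 0 would mean the driver is handing up
completely empty mbuf chains, which again points at the new split
rx/pg receive path.
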
--
Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell
**WANTED TO BUY: Garmin Streetpilot 2650 or 2660. Not later model! **