Broadcom BCM5704C 10/100/1000 on Tyan Thunder K8S Pro S2882 twin Opteron
Alan Jay

Thu Mar 10 13:00:34 GMT 2005

> From: Doug White <dwhite at gumbysoft.com>
> 
> On Mon, 7 Mar 2005, Alan Jay wrote:
> 
> > Well after upgrading to the latest -STABLE via cvsup and makeworld
> > makekernel etc we have been doing some more tests over the weekend.
> 
> When did you run this cvsup?

[Alan Jay] March 2nd.

> > One of our databases ran fine all weekend so we took the plunge on Sunday
> > to try our big heavily accessed database.
> >
> > It ran fine until 7.45 Monday morning - when I checked at 7.30am it was
> > using around 6 of the 8GB of RAM. The server then logged:
> >
> > Mar  7 07:42:47 flappy kernel: bge1: discard frame w/o leading ethernet header (len 4294967292 pkt len 4294967292)
> 
> Hm, that's -4 read as an unsigned 32-bit value.  That message is printed
> by ether_input() if it gets handed a bum mbuf.
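The logged length 4294967292 is 2^32 - 4, i.e. a negative length of -4 printed as a 32-bit unsigned integer. The sketch below is a simplified Python model (an assumption for illustration, not the actual ether_input() source) of a leading-header length check: a corrupt negative length fails the "shorter than an Ethernet header" test and, when formatted as unsigned, produces exactly the number in the log. The helper names as_u32() and check_frame() are hypothetical.

```python
from typing import Optional

# Ethernet header: destination MAC (6) + source MAC (6) + EtherType (2) bytes.
ETHER_HDR_LEN = 14


def as_u32(n: int) -> int:
    """Reinterpret a signed 32-bit C int the way printf("%u") would show it."""
    return n & 0xFFFFFFFF


def check_frame(m_len: int, pkt_len: int) -> Optional[str]:
    """Hypothetical model of the header-length sanity check.

    Returns the log text if the frame would be discarded, else None.
    """
    if m_len < ETHER_HDR_LEN:
        return ("discard frame w/o leading ethernet header "
                f"(len {as_u32(m_len)} pkt len {as_u32(pkt_len)})")
    return None


# A corrupt length of -4 reproduces the message seen on bge1.
print(check_frame(-4, -4))
```

A length of -4 can only come from a damaged mbuf, which is why the message points at the driver or memory corruption rather than at a malformed packet on the wire.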
> 
> > Followed by:
> >
> > Mar  7 07:42:47 flappy kernel: Fatal trap 12: pag
> 
> Unfortunately this is not useful. We need the entire panic message and
> ideally a backtrace and crashdump.  Can you connect a serial console to
> this system and log the output?

[Alan Jay] We have done that, but the serial terminal is attached to a terminal
concentrator and it seems to time out before logging any useful information.
When we succeeded there was nothing on the serial console in the way of a
panic message.  Sorry, I'm not sure how to do a backtrace or crashdump.

> > Subsequently to that it has crashed a number of times and on a couple of
> > occasions has reported:
> >
> > kernel: fxp0: can't map mbuf (error 12)
> 
> Error 12 is ENOMEM and that's coming from bus_dmamap_load_mbuf().  That can
> be returned if you're running out of space for bounce buffers, or kmem in
> general.  scottl has been working on busdma issues in HEAD and recently
> committed a fix for i386 for bounce page allocation issues.
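To make the bounce-buffer explanation concrete, here is a toy Python model, assuming a fixed-size pool of bounce pages. The BouncePool class and its methods are hypothetical and are not FreeBSD's busdma implementation; they only show how exhausting a fixed pool surfaces as error 12 (ENOMEM, whose value is 12 on FreeBSD and Linux) rather than as a crash.

```python
import errno


class BouncePool:
    """Hypothetical model of a fixed pool of bounce pages (not real busdma code)."""

    def __init__(self, pages: int) -> None:
        self.free = pages

    def map_mbuf(self, pages_needed: int) -> int:
        """Reserve bounce pages for one packet.

        Returns 0 on success, or errno.ENOMEM when the pool cannot satisfy
        the request, mirroring the 'can't map mbuf (error 12)' symptom.
        """
        if pages_needed > self.free:
            return errno.ENOMEM
        self.free -= pages_needed
        return 0

    def unmap(self, pages: int) -> None:
        """Return pages to the pool once the DMA transfer completes."""
        self.free += pages


pool = BouncePool(pages=4)
# Five one-page mappings against a four-page pool: the last one fails.
results = [pool.map_mbuf(1) for _ in range(5)]
print(results)
print(errno.errorcode[results[-1]])
```

In this model the failure is transient: calling unmap() as transfers finish lets later mappings succeed, whereas general kmem depletion would not recover that way.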
> 
> kmem depletion would be more insidious.  Have you been getting other
> messages that indicate failure to allocate memory or error 12?

[Alan Jay] I had seen them before on the console several times.

> > By the way, over the weekend the latest -STABLE, which is marked
> > 5.4-PRERELEASE 2, seemed much better than 5.3 had been and the initial
> > problems took much longer to appear.  Though once the problems started to
> > appear, they repeated themselves, rebooting every 1-2hrs until we removed
> > the test data.
> 
> That behavior sounds a lot like thermal issues.  It takes a while to warm
> up to the critical point and once it hits that point it really starts to
> malfunction.  Unless the test run starts out slow or something.

[Alan Jay] Unlikely, as the servers have been on 24hrs a day since we got them
in a rack at a data centre, so the temperature should be reasonably consistent.

[Alan Jay] Thanks for the thoughts.