E4500 with 24GB RAM

Sat Jun 11 07:54:22 GMT 2005

On Sat, Jun 11, 2005 at 03:36:41AM -0400, Kris Kennaway wrote:
 > On Sat, Jun 11, 2005 at 04:26:32PM +0900, Pyun YongHyeon wrote:
 > > On Sat, Jun 11, 2005 at 03:40:28PM +0900, Hiroki Sato wrote:
 > >  > Kris Kennaway <kris at obsecurity.org> wrote
 > >  >   in <20050610211239.GA59402 at xor.obsecurity.org>:
 > >  > 
 > >  > kr> I wonder if it's disk related.  I tried to check out a ports tree on
 > >  > kr> this machine and it hung in a few seconds (although this was also
 > >  > kr> checking out using the network via nfs).
 > >  > 
 > >  >  I do not know why but the freeze occurs only when displaying "invalid
 > >  >  packet size xxx; dropping".  When I tried "vmstat 1" on the serial
 > >  >  console and fetching a large file via ftp at the same time
 > >  >  with 12GB RAM configuration, the freeze did not occur.
 > >  >  Once "hme0: too many errors; not reporting any more" is displayed,
 > >  >  the box seems to work fine and I can check out the ports tree via NFS
 > >  >  without problems.
 > >  > 
 > > 
 > > Normally the "invalid packet size" message comes from link mismatch.
 > > If your HME's PHY is DP83840 there are known issues on link negotiation.
 > > AFAIK the issue has nothing to do with panic as I always see that on my
 > > Ultra2 which has DP83840 PHY too.
 > 
 > This is on e450 and e4500 machines.  I don't think there's a link
 > mismatch.
 > 
 > > I wonder how you can use NFS reliably on sparc64. Due to failure of
 > > alignment(both server and client) it's really easy to get panic on sparc64.
 > 
 > AFAICR I've never seen a problem with this (except with an i386 4.x
 > server, which can be panicked by a sparc64 client)..I don't rely on
 > NFS heavily in most cases, but I do use it on a number of machines
 > (including two package build machines that netboot and access their
 > ports trees over NFS, and have been in continuous operation with an
 > uptime of 110 days).
 > 
Do you use NFS over UDP? NFS over TCP has a much better chance of
triggering a panic.
If you copy a large file (> 100MB) from an NFS-exported directory to
one of its subdirectories, you will probably hit a panic. If my memory
serves me right, there have been several NFS panic reports on the
current/sparc64 mailing lists.
I don't think it has been fixed, since the root cause of the panic is in
nfsm_disct() and nfs_realign() (they have not been touched for a long time).
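
To illustrate the general class of problem (this is not the actual
nfsm_disct()/nfs_realign() code), here is a minimal user-space sketch of
why alignment matters on a strict-alignment CPU such as sparc64. XDR data
is big-endian 32-bit words; if such a word sits at an odd offset in a
buffer, casting the pointer to uint32_t * and dereferencing it can trap,
while assembling the value byte by byte is safe at any offset. The helper
names is_aligned32() and read_be32_unaligned() are made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if p may be dereferenced as a uint32_t
 * on a strict-alignment machine (address is a multiple of 4). */
static int is_aligned32(const void *p)
{
    return ((uintptr_t)p % sizeof(uint32_t)) == 0;
}

/* Hypothetical helper: read a big-endian 32-bit word (as used by XDR)
 * from any address, aligned or not, without a 32-bit load. */
static uint32_t read_be32_unaligned(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

A caller would check is_aligned32() before using a direct 32-bit load and
fall back to read_be32_unaligned() (or copy the data into an aligned
buffer first) otherwise.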

 > >  >  I tried to comment out the HME_WHINE line of hme_read() in if_hme.c,
 > >  >  and it seems to make the box work fine so far.
 > > 
 > > HME_WHINE() just prints a message. I can't think removing the function
 > > can cure your problem.
 > 
 > It does seem to have helped though..before it would reliably lock up
 > in seconds, and DDB break was non-responsive.
 > 

Hmm... Then it would be great to get a voluntary core dump when hme(4)
hits the condition (e.g. invoke panic(9) just before HME_WHINE is processed).
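
The idea can be sketched in user space as follows. This is only a model,
not the if_hme.c code: the length limits, frame_check(), and the trap
hooks are all assumptions made for the example. In the kernel the hook
would simply be a panic(9) call placed where the "invalid packet size"
condition is detected, so that a dump is taken at that exact point.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define FRAME_MIN_LEN 14    /* assumed: Ethernet header size */
#define FRAME_MAX_LEN 1518  /* assumed: hypothetical upper bound */

typedef void (*trap_fn)(const char *msg);

/* Stand-in for panic(9): report and abort so a core dump is produced. */
static void trap_abort(const char *msg)
{
    fprintf(stderr, "trap: %s\n", msg);
    abort();
}

/* Dry-run hook: only counts how often the condition was hit. */
static int trap_count;
static void trap_count_only(const char *msg)
{
    (void)msg;
    trap_count++;
}

static trap_fn frame_trap = trap_abort;

/* Returns 1 if the frame length is acceptable, 0 if it is dropped.
 * The trap fires before any message would be printed. */
static int frame_check(int len)
{
    assert(frame_trap != NULL);
    if (len <= FRAME_MIN_LEN || len > FRAME_MAX_LEN) {
        frame_trap("invalid packet size");
        return 0;   /* reached only if the hook returns */
    }
    return 1;
}
```

Setting frame_trap to trap_count_only gives a dry run that records how
often the condition fires; leaving the default aborts on the first bad
length, which is the user-space analogue of the voluntary dump.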

 > Kris

-- 
Regards,
Pyun YongHyeon
http://www.kr.freebsd.org/~yongari	|	yongari at freebsd.org