4.9 kernel panics on a PowerEdge 2650

Sun Feb 1 07:41:52 PST 2004

	Hi Hackers,

 OK, here is some more information about my problem:

We have 3 web servers with identical hardware configurations, running
the same kernel and applications. They get roughly the same traffic
(DNS round-robin), and they all run 4.9-RELEASE. I have seen
repeatable crashes on all three, so a hardware problem is unlikely
(the chance of the same fault on all three boxes is too small).

 I have come to think that the problem is kernel virtual memory
space, which is running too low. I compiled the kernel from GENERIC
with the following modifications:

- maxusers set to 128
- SMP enabled (the CPUs support Hyper-Threading)
- KVA_PAGES set to 256 (each box has 2 GB of RAM and 2 GB of swap)
- PMAP_SHPGPERPROC=401 (for Apache)
- ACCEPT_FILTER_DATA and ACCEPT_FILTER_HTTP added
- unnecessary drivers removed from the kernel
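For reference, a minimal sketch of the KVA_PAGES arithmetic, assuming i386 without PAE, where each KVA_PAGES unit is one 4 MB page-directory entry (so 256 corresponds to 1 GB of kernel address space). The helper names are made up for illustration. The vm.kvm_size of 1069543424 bytes reported further down is just under 1 GB, which fits that assumption.

```python
# Sketch only: assumes i386 without PAE, where one KVA_PAGES unit is a
# 4 MB page-directory entry. Helper names are hypothetical.
PDE_BYTES = 4 * 1024 * 1024      # bytes mapped per KVA_PAGES unit (assumed)
ADDRESS_SPACE = 4 * 1024 ** 3    # total 32-bit address space


def kva_bytes(kva_pages: int) -> int:
    """Kernel virtual address space implied by a KVA_PAGES value."""
    return kva_pages * PDE_BYTES


def user_bytes(kva_pages: int) -> int:
    """Roughly what is left of the 4 GB address space for user processes."""
    return ADDRESS_SPACE - kva_bytes(kva_pages)


if __name__ == "__main__":
    for pages in (256, 384, 512):
        print(f"KVA_PAGES={pages}: kernel {kva_bytes(pages) // 2**20} MB, "
              f"user {user_bytes(pages) // 2**20} MB")
```

Under these assumptions, 256 gives 1024 MB of kernel space and 3072 MB for user processes, and 512 splits the address space evenly, so raising KVA_PAGES trades user address space for kernel address space.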

 /etc/sysctl.conf looks like:

net.inet.tcp.msl=100
net.inet.tcp.blackhole=1
# Hyperthreading
machdep.cpu_idle_hlt=1

kern.ipc.somaxconn=4096
kern.maxfiles=65535
vfs.vmiodirenable=1
kern.ipc.shm_use_phys=1
net.inet.tcp.sendspace=16384

 The boxes run without problems for about 2-3 days, after which they
panic with 'page not present' in different processes (sshd, httpd,
etc.). My guess is that the real cause is the low value of vm.kvm_free:

(web1)[~] sysctl -a | grep vm.kvm
vm.kvm_size: 1069543424
vm.kvm_free: 4190208
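To put those two numbers side by side, a small sketch that parses the sysctl output above and reports the free fraction. The helper names are hypothetical and assume the `name: value` layout shown.

```python
# Sketch: parse "name: value" lines as printed by `sysctl -a | grep vm.kvm`
# and report how much kernel virtual memory is still free.
SAMPLE = """vm.kvm_size: 1069543424
vm.kvm_free: 4190208"""


def parse_sysctl(text):
    """Return a dict mapping sysctl names to integer values."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip anything that is not "name: value"
            values[name.strip()] = int(value.strip())
    return values


def kvm_free_percent(values):
    """Percentage of vm.kvm_size that vm.kvm_free represents."""
    return 100.0 * values["vm.kvm_free"] / values["vm.kvm_size"]


if __name__ == "__main__":
    v = parse_sysctl(SAMPLE)
    free_mb = v["vm.kvm_free"] / 2**20
    print(f"{kvm_free_percent(v):.2f}% of KVA free ({free_mb:.1f} MB)")
    # prints "0.39% of KVA free (4.0 MB)"
```

Run periodically (for example from cron) against live sysctl output, something like this would show whether vm.kvm_free shrinks steadily over the 2-3 days before a panic.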

 But I don't know what causes that. The boxes are not that busy (they
don't even crash during peak-traffic times), and the totals reported
by vmstat -m are:

Memory Totals:  In Use       Free    Requests
                 5311K      7090K    15602606

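A quick comparison of the figures quoted here, assuming vmstat -m reports only malloc(9) memory (zone allocations, as listed by sysctl vm.zone, and other kernel maps are not part of it). The helper names are hypothetical.

```python
# Sketch: compare the malloc totals from `vmstat -m` with the kernel
# virtual memory actually in use. Assumes vmstat -m reports only malloc(9)
# memory; the numbers are the ones quoted in this mail.
KIB = 1024
MIB = 1024 * 1024

KVM_SIZE = 1069543424      # vm.kvm_size
KVM_FREE = 4190208         # vm.kvm_free
MALLOC_IN_USE_K = 5311     # vmstat -m "In Use" total, in KB
MALLOC_FREE_K = 7090       # vmstat -m "Free" total, in KB


def kva_in_use(kvm_size, kvm_free):
    """Bytes of kernel virtual memory currently allocated."""
    return kvm_size - kvm_free


def malloc_share(in_use_k, free_k, kvm_size, kvm_free):
    """Fraction of the in-use KVA covered by malloc (in use + free)."""
    malloc_bytes = (in_use_k + free_k) * KIB
    return malloc_bytes / kva_in_use(kvm_size, kvm_free)


if __name__ == "__main__":
    used = kva_in_use(KVM_SIZE, KVM_FREE)
    share = malloc_share(MALLOC_IN_USE_K, MALLOC_FREE_K, KVM_SIZE, KVM_FREE)
    print(f"KVA in use: {used // MIB} MB; malloc accounts for {share:.2%}")
    # prints "KVA in use: 1016 MB; malloc accounts for 1.19%"
```

If that assumption holds, malloc explains only about 12 MB of roughly 1016 MB in use, so the bulk of the kernel virtual memory is being consumed somewhere vmstat -m does not report.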
 which also looks fairly normal. So, any idea where I should start
looking to find out what 'eats' so much kernel virtual memory?
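Following the suggestion in the quoted reply to check sysctl vm.zone, one place to start is scanning that output for zones close to their limit. A minimal sketch, assuming the 4.x layout `NAME: SIZE, LIMIT, USED, FREE, REQUESTS` and treating a LIMIT of 0 as "no limit". The function name and the sample data are hypothetical, invented for illustration and not taken from these servers.

```python
# Sketch: flag vm.zone entries whose USED count is close to LIMIT.
# Assumes the 4.x layout "NAME: SIZE, LIMIT, USED, FREE, REQUESTS" and treats
# LIMIT 0 as "no limit". The sample below is invented for illustration,
# not taken from the web servers.
SAMPLE = """ITEM            SIZE     LIMIT    USED    FREE  REQUESTS

zone_a:          160,     1000,    950,     50,    12000
zone_b:          192,        0,     12,     90,    64839
zone_c:          256,     4000,    100,    300,     5000
"""


def zones_near_limit(text, threshold=0.9):
    """Return (name, used, limit) for zones at or above threshold * limit."""
    flagged = []
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # header or blank line
        fields = [f.strip() for f in rest.split(",")]
        if len(fields) != 5:
            continue  # not a zone line we understand
        _size, limit, used, _free, _requests = (int(f) for f in fields)
        if limit > 0 and used >= threshold * limit:
            flagged.append((name.strip(), used, limit))
    return flagged


if __name__ == "__main__":
    for name, used, limit in zones_near_limit(SAMPLE):
        print(f"{name}: {used}/{limit} used")
    # prints "zone_a: 950/1000 used"
```

Zones with no limit never get flagged here; for those, comparing USED across snapshots taken hours apart is probably more telling than any single reading.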

 Thank you,
 bogdan 

On Fri, Jan 23, 2004 at 12:48:03PM -0800, Andrew Kinney wrote:
> On 23 Jan 2004 at 13:50, Bogdan TARU wrote:
> 
> > 
> > 
> >  Hi hackers,
> > 
> >  I am experiencing kernel panics on a PowerEdge 2650 every day around
> >  3am (usually the machine comes back up at 3:04am). The panics are
> >  reproducible by running /etc/periodic/security/100.chksetuid (in
> >  fact, by running find on /usr with -perm). The problem lies
> >  somewhere in /usr/ports. Deleting the /usr/ports tree doesn't solve
> >  it; trying a cvs up of /usr/ports results in another crash.
> > 
> 
> Our experience is that repeated crashes when dealing with large 
> numbers of files (like the ports tree) generally point to hitting 
> some OS resource limit.  Some things to check that may or may not 
> apply to this particular problem:
> 
> sysctl vm.zone
> 
> Make sure you're not hitting any of those limits.
> 
> sysctl vm.kvm_size
> sysctl vm.kvm_free
> 
> If kvm_free is running low just prior to the crash, you might want to 
> increase your KVA_PAGES (see LINT) and rebuild your kernel.
> 
> Of course, this is all hit-and-miss guesswork until you have a crash 
> dump, so getting a crash dump and a traceback from a kernel identical 
> to your running kernel with debugging symbols would be a logical 
> first step if you want to avoid any guessing.  If your tracebacks 
> show failures in random locations, you're probably looking at bad 
> RAM.  If you always fail in the same spot with each crash, then it is 
> just a matter of determining why and correcting it.
> 
> I believe the FreeBSD Developers' Handbook has instructions on how 
> to setup a system to do an automatic crash dump for any panic.  It is 
> relatively straightforward.
> 
> Sincerely,
> Andrew Kinney
> President and
> Chief Technology Officer
> Advantagecom Networks, Inc.
> http://www.advantagecom.net
>