4.9 kernel panics on a poweredge 2650
Bogdan TARU
bgd at icomag.de
Sun Feb 1 07:41:52 PST 2004
Hi Hackers,
OK, here is some more information about my problem:
We have 3 webservers with identical hardware configurations, and the same
kernel and applications running on all three. They get mostly the same
traffic (DNS round-robin). They all run 4.9-RELEASE. I have
experienced repeatable crashes on all three, so a hardware problem
is very unlikely.
I have come to think that the problem is with the kernel memory
space, which is too small. I compiled the kernel from GENERIC with
the following modifications:
- maxusers set to 128
- SMP enabled (the CPUs are HTT-capable)
- KVA_PAGES set to 256 (each box has 2 GB of RAM and 2 GB of swap)
- PMAP_SHPGPERPROC=401 (for Apache)
- ACCEPT_FILTER_DATA and ACCEPT_FILTER_HTTP
- removed unnecessary drivers from the kernel
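A rough cross-check of the KVA_PAGES setting: assuming a non-PAE i386 kernel, where each KVA page covers 4 MB of address space (an assumption based on the LINT comments, not something measured on these boxes), 256 pages should come to 1 GB of kernel virtual address space. The helper name kva_bytes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a KVA_PAGES value to bytes.
# Assumes 4 MB per KVA page (non-PAE i386); adjust if that does not hold.
kva_bytes() {
    echo $(( $1 * 4 * 1024 * 1024 ))
}

# KVA_PAGES value from the kernel config described above.
kva_bytes 256
```

If that assumption holds, the result should sit just above the vm.kvm_size value that sysctl reports on the running system.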
/etc/sysctl.conf looks like:
net.inet.tcp.msl=100
net.inet.tcp.blackhole=1
# Hyperthreading
machdep.cpu_idle_hlt=1
kern.ipc.somaxconn=4096
kern.maxfiles=65535
vfs.vmiodirenable=1
kern.ipc.shm_use_phys=1
net.inet.tcp.sendspace=16384
The boxes run without a problem for about 2-3 days, after which they
panic with 'page not present' in different processes (sshd, httpd,
etc.). I guess the real reason for this is the low value of kvm_free:
(web1)[~] sysctl -a | grep vm.kvm
vm.kvm_size: 1069543424
vm.kvm_free: 4190208
But I don't know what causes that. The boxes are not that busy (they
don't even crash during peak-traffic times), and vmstat -m shows
these totals:
Memory Totals:  In Use    Free    Requests
                 5311K   7090K    15602606
which also looks sort of normal. So, any idea where I should start
looking in order to see what 'eats' so much kvm space?
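One possible starting point is to sample vm.kvm_free at intervals and note when it drops. This is only a sketch: the function names, the log path, and the 60-second interval are placeholders, and it assumes sysctl -n prints the bare value on these systems.

```shell
#!/bin/sh
# Hypothetical helpers for watching kernel virtual memory headroom.

# Print free KVM as a percentage of the total, given both values in bytes.
kvm_pct_free() {
    awk -v size="$1" -v free="$2" 'BEGIN { printf "%.2f\n", free * 100 / size }'
}

# Append one timestamped sample to the log file given as $1.
# Reads the two sysctls quoted above, so it is only meaningful on the
# affected boxes.
log_kvm_sample() {
    size=$(sysctl -n vm.kvm_size)
    free=$(sysctl -n vm.kvm_free)
    echo "$(date '+%Y-%m-%d %H:%M:%S') free=$free pct=$(kvm_pct_free "$size" "$free")" >> "$1"
}

# Check against the figures quoted above.
kvm_pct_free 1069543424 4190208

# Example loop (placeholder path and interval), left commented out:
# while :; do log_kvm_sample /var/log/kvm_free.log; sleep 60; done
```

With the values shown above, less than half a percent of the kernel address space is free. Comparing the log against cron and periodic run times might show which activity coincides with the drop.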
Thank you,
bogdan
On Fri, Jan 23, 2004 at 12:48:03PM -0800, Andrew Kinney wrote:
> On 23 Jan 2004 at 13:50, Bogdan TARU wrote:
>
> >
> >
> > Hi hackers,
> >
> > I am experiencing kernel panics on a poweredge 2650 each day around
> > 3am (usually the machine comes up at 3:04am). The kernel panics are
> > reproducible by running: /etc/periodic/security/100.chksetuid (in
> > fact by running find on /usr with -perm). The problem lies
> > somewhere in /usr/ports. Deleting the /usr/ports tree doesn't solve
> > it; trying a cvs up of /usr/ports results in a crash again.
> >
>
> Our experience is that repeated crashes when dealing with large
> numbers of files (like the ports tree) generally point to hitting
> some OS resource limit. Some things to check that may or may not
> apply to this particular problem:
>
> sysctl vm.zone
>
> Make sure you're not hitting any of those limits.
>
> sysctl vm.kvm_size
> sysctl vm.kvm_free
>
> If kvm_free is running low just prior to the crash, you might want to
> increase your KVA_PAGES (see lint) and rebuild your kernel.
>
> Of course, this is all hit-and-miss guesswork until you have a crash
> dump, so getting a crash dump and a traceback from a kernel identical
> to your running kernel with debugging symbols would be a logical
> first step if you want to avoid any guessing. If your tracebacks
> show failures in random locations, you're probably looking at bad
> RAM. If you always fail in the same spot with each crash, then it is
> just a matter of determining why and correcting it.
>
> I believe the FreeBSD Developers' Handbook has instructions on how
> to set up a system to do an automatic crash dump for any panic. It is
> relatively straightforward.
>
> Sincerely,
> Andrew Kinney
> President and
> Chief Technology Officer
> Advantagecom Networks, Inc.
> http://www.advantagecom.net
>