debugging frequent kernel panics on 8.2-RELEASE

Steven Hartland killing at multiplay.co.uk
Sun Aug 14 14:53:54 UTC 2011


----- Original Message ----- 
From: "Andriy Gapon" <avg at FreeBSD.org>
> 
> Maybe test it on couple of machines first just in case I overlooked something
> essential, although I have a report from another user that the patch didn't break
> anything for him (it was tested for an unrelated issue).

We've got this running on ~40 machines and have just had the first panic
since the update. Unfortunately it doesn't seem to have changed anything :(

We have 352 thread entries starting with:-
#0  sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, flags=Variable "flags" is not available.
23 with:-
cpustop_handler () at atomic.h:285
and 16 with:-
#0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562
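
Counts like the ones above can be tallied from a saved backtrace listing rather than by hand. The following is a minimal sketch, assuming the output of kgdb's "thread apply all bt" has been saved as text; the function name count_top_frames is made up for illustration, and the pattern only recognises frame #0 lines of the shapes shown in this message.

```python
import re
from collections import Counter

# Matches the innermost frame (#0) of a backtrace. Tolerates an optional
# "(kgdb) " prompt prefix and an optional "0x... in " address before the
# function name, and captures the function name.
FRAME0 = re.compile(
    r"^(?:\(kgdb\)\s+)?#0\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(",
    re.MULTILINE,
)

def count_top_frames(text):
    """Return a Counter mapping each frame #0 function name to its thread count."""
    return Counter(FRAME0.findall(text))
```

Calling count_top_frames() on the saved text and printing most_common() gives one line per distinct top frame, which makes it easy to spot the threads that are not simply parked in sched_switch.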

The main message being:-
panic: double fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15

Fatal double fault
rip = 0xffffffff8053b691
rsp = 0xffffff8d8f356fb0
rbp = 0xffffff8d8f357210
cpuid = 2; apic id = 02
panic: double fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff803bb75e at kdb_backtrace+0x5e
#1 0xffffffff8038956e at panic+0x2ae
#2 0xffffffff805802b6 at dblfault_handler+0x96
#3 0xffffffff8056900d at Xdblfault+0xad
stack: 0xffffff8d8f357000, 4
rsp = 0xffffff800009ae10
Uptime: 2d21h6m18s
Physical memory: 49132 MB
Dumping 17080 MB: 17065...
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linprocfs.ko
Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nullfs.ko
#0  sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, flags=Variable "flags" is not available.)
    at /usr/src/sys/kern/sched_ule.c:1858
1858            cpuid = PCPU_GET(cpuid);
(kgdb) #0  sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0, flags=Variable "flags" is not available.)
    at /usr/src/sys/kern/sched_ule.c:1858
#1  0xffffffff80391a99 in mi_switch (flags=260, newtd=0x0)
    at /usr/src/sys/kern/kern_synch.c:451
#2  0xffffffff803c5112 in sleepq_timedwait (wchan=0xffffffff8083e080, pri=68)
    at /usr/src/sys/kern/subr_sleepqueue.c:644
#3  0xffffffff80391efb in _sleep (ident=0xffffffff8083e080, lock=0x0,
    priority=Variable "priority" is not available.) at /usr/src/sys/kern/kern_synch.c:230
#4  0xffffffff8053ebc9 in scheduler (dummy=Variable "dummy" is not available.)
    at /usr/src/sys/vm/vm_glue.c:807
#5  0xffffffff80341767 in mi_startup () at /usr/src/sys/kern/init_main.c:254
#6  0xffffffff8016efdc in btext () at /usr/src/sys/amd64/amd64/locore.S:81
#7  0xffffffff80863dc8 in sleepq_chains ()
#8  0xffffffff80848ae0 in cpu_top ()
#9  0x0000000000000000 in ?? ()
#10 0xffffffff8083e4e0 in proc0 ()
#11 0xffffffff80bb3b90 in ?? ()
#12 0xffffffff80bb3b38 in ?? ()
#13 0xffffff0012d838c0 in ?? ()
#14 0xffffffff803aeb19 in sched_switch (td=0x0, newtd=0x0, flags=Variable "flags" is not available.)
    at /usr/src/sys/kern/sched_ule.c:1852
Previous frame inner to this frame (corrupt stack?)
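
The register values in the double fault message can be checked against the reported stack line. The sketch below assumes that "stack: 0xffffff8d8f357000, 4" gives the faulting thread's kernel stack base address and its size in pages (that reading is an assumption, not something stated in the output), and uses the 4096-byte amd64 page size. The helper stack_position is hypothetical.

```python
PAGE_SIZE = 4096  # amd64 base page size

def stack_position(addr, kstack_base, kstack_pages, page_size=PAGE_SIZE):
    """Classify addr relative to the stack range starting at kstack_base.

    Returns ("below", distance under the base), ("above", distance past
    the top) or ("inside", offset from the base).
    """
    top = kstack_base + kstack_pages * page_size
    if addr < kstack_base:
        return ("below", kstack_base - addr)
    if addr > top:
        return ("above", addr - top)
    return ("inside", addr - kstack_base)

# Values copied from the panic message; their interpretation is assumed.
KSTACK_BASE = 0xFFFFFF8D8F357000
KSTACK_PAGES = 4
RSP = 0xFFFFFF8D8F356FB0
RBP = 0xFFFFFF8D8F357210

rsp_where = stack_position(RSP, KSTACK_BASE, KSTACK_PAGES)
rbp_where = stack_position(RBP, KSTACK_BASE, KSTACK_PAGES)
```

Under that assumption rsp sits 0x50 bytes below the stack base while rbp is 0x210 bytes above it, which would be consistent with the thread running off the end of its kernel stack, though it does not prove it.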

There are some indications that stopping jails could be the
cause of the panics, so on one test box I've enabled INVARIANTS
to see if anything shows up from that.

    Regards
    Steve





More information about the freebsd-stable mailing list