debugging frequent kernel panics on 8.2-RELEASE
Steven Hartland
killing at multiplay.co.uk
Wed Aug 10 15:36:53 UTC 2011
----- Original Message -----
From: "Steven Hartland" <killing at multiplay.co.uk>
To: <freebsd-stable at freebsd.org>
Sent: Wednesday, August 10, 2011 3:22 PM
Subject: debugging frequent kernel panics on 8.2-RELEASE
> We're currently experiencing a large number of kernel panics
> on FreeBSD 8.2-RELEASE across a large number of machines here.
>
> The base stack reported is a double fault with no additional
> details, and CTRL+ALT+ESC fails to break to the debugger, as
> does an NMI, even though it at least tries, printing the
> following many times, some of them quite jumbled:-
> NMI ... going to debugger
>
> We've configured the dump device, but that also seems to fail
> to capture any details, just sitting there after the panic with
> Dumping 4465MB:
>
> The machines are single-disk ZFS root installs and the dump
> device is configured using the gptid. Could this be what's
> preventing the dump from happening?
>
> The kernel is compiled with:-
> options KDB # Kernel debugger related code
> options KDB_TRACE # Print a stack trace for a panic
>
> We have remote KVM but not remote serial on most of the
> machines.
>
> Any advice on how to debug this issue?
ldn32.multiplay.co.uk dumped core - see /var/crash/vmcore.0
Wed Aug 10 14:02:07 UTC 2011
FreeBSD crash 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Jul 21 11:05:52 BST 2011
root at crash:/usr/obj/usr/src/sys/MULTIPLAY amd64
panic: double fault
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Unread portion of the kernel message buffer:
Fatal double fault
rip = 0xffffffff8052f6f1
rsp = 0xffffff86ce600fb0
rbp = 0xffffff86ce601210
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff803af91e at kdb_backtrace+0x5e
#1 0xffffffff8037d817 at panic+0x187
#2 0xffffffff80574316 at dblfault_handler+0x96
#3 0xffffffff8055d06d at Xdblfault+0xad
Uptime: 13d20h53m31s
Physical memory: 24555 MB
Dumping 3283 MB: 3268 3252 3236 3220 3204 3188 3172 3156 3140 3124 3108 3092 3076 3060 3044 3028 3012 2996 2980 2964 2948 2932
2916 2900 2884 2868 2852 2836 2820 2804 2788 2772 2756 2740 2724
2708 2692 2676 2660 2644 2628 2612 2596 2580 2564 2548 2532 2516 2500 2484 2468 2452 2436 2420 2404 2388 2372 2356 2340 2324
2308 2292 2276 2260 2244 2228 2212 2196 2180 2164 2148 2132 2116
2100 2084 2068 2052 2036 2020 2004 1988 1972 1956 1940 1924 1908 1892 1876 1860 1844 1828 1812 1796 1780 1764 1748 1732 1716
1700 1684 1668 1652 1636 1620 1604 1588 1572 1556 1540 1524 1508
1492 1476 1460 1444 1428 1412 1396 1380 1364 1348 1332 1316 1300 1284 1268 1252 1236 1220 1204 1188 1172 1156 1140 1124 1108
1092 1076 1060 1044 1028 1012 996 980 964 948 932 916 900 884 868
852 836 820 804 788 772 756 740 724 708 692 676 660 644 628 612 596 580 564 548 532 516 500 484 468 452 436 420 404 388 372 356
340 324 308 292 276 260 244 228 212 196 180 164 148 132 116
100 84 68 52 36 20 4
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linprocfs.ko
Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nullfs.ko
One of the machines has managed to dump where all the others
have failed to do so, so here's the stack from core.txt.0:
#0 sched_switch (td=0xffffffff80830bc0, newtd=0xffffff000a73f8c0, flags=Variable "flags" is not available.)
at /usr/src/sys/kern/sched_ule.c:1858
1858 cpuid = PCPU_GET(cpuid);
(kgdb)
#0 sched_switch (td=0xffffffff80830bc0, newtd=0xffffff000a73f8c0, flags=Variable "flags" is not available.)
at /usr/src/sys/kern/sched_ule.c:1858
#1 0xffffffff80385c86 in mi_switch (flags=260, newtd=0x0)
at /usr/src/sys/kern/kern_synch.c:449
#2 0xffffffff803b92d2 in sleepq_timedwait (wchan=0xffffffff80830760, pri=68)
at /usr/src/sys/kern/subr_sleepqueue.c:644
#3 0xffffffff803861e1 in _sleep (ident=0xffffffff80830760, lock=0x0,
priority=Variable "priority" is not available.
) at /usr/src/sys/kern/kern_synch.c:230
#4 0xffffffff80532c29 in scheduler (dummy=Variable "dummy" is not available.
) at /usr/src/sys/vm/vm_glue.c:807
#5 0xffffffff80335d67 in mi_startup () at /usr/src/sys/kern/init_main.c:254
#6 0xffffffff8016efac in btext () at /usr/src/sys/amd64/amd64/locore.S:81
#7 0xffffffff808556e0 in sleepq_chains ()
#8 0xffffffff8083b1e0 in cpu_top ()
#9 0x0000000000000000 in ?? ()
#10 0xffffffff80830bc0 in proc0 ()
#11 0xffffffff80ba4b90 in ?? ()
#12 0xffffffff80ba4b38 in ?? ()
#13 0xffffff000a73f8c0 in ?? ()
#14 0xffffffff803a2cc9 in sched_switch (td=0x0, newtd=0x0, flags=Variable "flags" is not available.
)
at /usr/src/sys/kern/sched_ule.c:1852
Previous frame inner to this frame (corrupt stack?)
(kgdb)
Not sure this really points to the cause, but we have the crash dump, so
we can do more digging if someone could point me in the right direction.
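
As a first bit of digging, the register values in the double fault message
can be sanity-checked with some simple arithmetic. The following is only a
minimal sketch, assuming 4 KB pages on amd64; the helper names are
hypothetical and it does not by itself identify the cause:

```python
# Register values copied from the "Fatal double fault" message above.
RSP = 0xffffff86ce600fb0
RBP = 0xffffff86ce601210

PAGE_SIZE = 4096  # assumption: standard 4 KB pages on amd64


def page_base(addr):
    """Return the start address of the page containing addr."""
    return addr & ~(PAGE_SIZE - 1)


def page_offset(addr):
    """Return the offset of addr within its page."""
    return addr & (PAGE_SIZE - 1)


if __name__ == "__main__":
    # Distance between the frame pointer and the stack pointer.
    print("rbp - rsp         = %d bytes" % (RBP - RSP))
    # Where each register falls relative to page boundaries.
    print("rsp page base     = %#x" % page_base(RSP))
    print("rsp offset        = %#x" % page_offset(RSP))
    print("rbp page base     = %#x" % page_base(RBP))
    print("same page?        = %s" % (page_base(RSP) == page_base(RBP)))
```

With these values rsp and rbp are 608 bytes apart and fall in adjacent
pages, so it may be worth comparing rsp against the faulting thread's
kernel stack bounds in the dump.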
Regards
Steve