debugging frequent kernel panics on 8.2-RELEASE
Steven Hartland
killing at multiplay.co.uk
Mon Aug 15 10:45:16 UTC 2011
----- Original Message -----
From: "Andriy Gapon" <avg at FreeBSD.org>
>> We have 352 thread entries starting with:-
>> #0 sched_switch (td=0xffffffff8083e4e0, newtd=0xffffff0012d838c0,
>> flags=Variable "flags" is not available.
>> 23 with:-
>> cpustop_handler () at atomic.h:285
>> and 16 with:-
>> #0 fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562
>
> I would like to get a full output of thread apply all bt.
http://blog.multplay.co.uk/dropzone/freebsd/panic-2011-08-14-1524.txt
>> The main message being:-
>> panic: double fault
>>
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>> <118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15
>
> So this line, does it indicate a shutdown of a jail or of the whole system?
This specific panic was caused by me running "reboot" after all jails (~40)
were shut down, which is slightly different from what my colleague was seeing
last Friday, where the machines were panicking when the jails themselves
were stopped.
I may have a crash dump from one of these if needed.
>> Fatal double fault
>> rip = 0xffffffff8053b691
>
> Can you please provide output of 'list *0xffffffff8053b691' in kgdb?
(kgdb) list *0xffffffff8053b691
0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
234 /*
235 * Find the backing store object and offset into it to begin the
236 * search.
237 */
238 fs.map = map;
239 result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
240 &fs.first_object, &fs.first_pindex, &prot, &wired);
241 if (result != KERN_SUCCESS) {
242 if (result != KERN_PROTECTION_FAILURE ||
243 (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {
>
>> rsp = 0xffffff8d8f356fb0
>> rbp = 0xffffff8d8f357210
>> cpuid = 2; apic id = 02
>> panic: double fault
>> cpuid = 2
>> KDB: stack backtrace:
>> #0 0xffffffff803bb75e at kdb_backtrace+0x5e
>> #1 0xffffffff8038956e at panic+0x2ae
>> #2 0xffffffff805802b6 at dblfault_handler+0x96
>> #3 0xffffffff8056900d at Xdblfault+0xad
>
> I think (not 100% sure) that with DDB in kernel we could get a better backtrace
> here, possibly with pre-dblfault stack frames, because DDB backend is a bit
> smarter than the trivial stack(9) printer.
I've added this into the kernel on my test machine and will try
to get it to panic over the next few days. It seems to need a few days of
uptime before the panics start happening. This is in addition to
increasing KSTACK_PAGES to 12; if you believe this may be stack
exhaustion, do you want me to remove this increase?
>> stack: 0xffffff8d8f357000, 4
>
> One thing I can say is that this looks like a double-fault because of stack
> exhaustion (the most typical cause): rsp value is below td_kstack.
>
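For anyone following along, the stack exhaustion reasoning above can be
checked with a little arithmetic. This is only a rough sketch in Python, not
anything run against the dump: it assumes the address on the "stack:" line
is td_kstack (as the reply above reads it), that this crash was running with
KSTACK_PAGES = 4 and PAGE_SIZE = 4096, and the helper names are made up for
illustration.

```python
PAGE_SIZE = 4096      # amd64 page size, as stated in the thread
KSTACK_PAGES = 4      # amd64 default; assumed to apply to this dump

def kstack_bounds(td_kstack, pages=KSTACK_PAGES, page_size=PAGE_SIZE):
    """Return (lowest address, one past the highest address) of the kernel stack."""
    return td_kstack, td_kstack + pages * page_size

def overflow_bytes(rsp, td_kstack):
    """Bytes by which rsp has dropped below the stack base (0 if it has not)."""
    return max(0, td_kstack - rsp)

td_kstack = 0xffffff8d8f357000   # from "stack: 0xffffff8d8f357000, 4"
rsp = 0xffffff8d8f356fb0         # from the "Fatal double fault" report

low, high = kstack_bounds(td_kstack)
print("kstack range:", hex(low), "-", hex(high))
print("rsp below base by", overflow_bytes(rsp, td_kstack), "bytes")
```

The stack grows downwards on amd64, so an rsp that sits below the base
address means the thread has run off the end of its kernel stack.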
> Can you please also provide the following information:
> p *((struct pcb *)((char *)0xffffff8d8f357000 + KSTACK_PAGES * PAGE_SIZE) - 1)
> where KSTACK_PAGES is a value of KSTACK_PAGES option (amd64 default is 4) and
> PAGE_SIZE is 4096.
(kgdb) p *((struct pcb *)((char *)0xffffff8d8f357000 + 4 * 4096) - 1)
$1 = {pcb_r15 = -2138686968, pcb_r14 = -1070655224792, pcb_r13 = 0, pcb_r12 = -1070655225856, pcb_rbp = -491518580864, pcb_rsp
= -491518580952, pcb_rbx = -1099195460512, pcb_rip = -2143622375, pcb_fsbase = 34365428376,
pcb_gsbase = 0, pcb_kgsbase = 0, pcb_cr0 = 0, pcb_cr2 = 0, pcb_cr3 = 12406784, pcb_cr4 = 0, pcb_dr0 = 0, pcb_dr1 = 0, pcb_dr2 =
0, pcb_dr3 = 0, pcb_dr6 = 0, pcb_dr7 = 0, pcb_flags = 0, pcb_initial_fpucw = 895,
pcb_onfault = 0x0, pcb_gs32sd = {sd_lolimit = 0, sd_lobase = 0, sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx = 0,
sd_long = 0, sd_def32 = 0, sd_gran = 0, sd_hibase = 0}, pcb_tssp = 0x0,
pcb_save = 0xffffff8d8f35ae00, pcb_full_iret = 0 '\0', pcb_gdt = {rd_limit = 0, rd_base = 0}, pcb_idt = {rd_limit = 0, rd_base =
0}, pcb_ldt = {rd_limit = 0, rd_base = 0}, pcb_tr = 0, pcb_user_save = {sv_env = {en_cw = 895,
en_sw = 0, en_tw = 0 '\0', en_zero = 0 '\0', en_opcode = 0, en_rip = 0, en_rdp = 0, en_mxcsr = 8096, en_mxcsr_mask = 65535},
sv_fp = {{fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
fp_pad = "\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad =
"\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
fp_pad = "\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad =
"\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
fp_pad = "\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad =
"\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"},
fp_pad = "\000\000\000\000\000"}, {fp_acc = {fp_bytes = "\000\000\000\000\000\000\000\000\000"}, fp_pad =
"\000\000\000\000\000"}}, sv_xmm = {{xmm_bytes = "\000\000\000\b\030\212rA\000\000\000\000\000\000\000"}, {
xmm_bytes = '\0' <repeats 15 times>} <repeats 15 times>}, sv_pad = '\0' <repeats 95 times>}}
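A note on reading the output above: kgdb prints the saved register fields as
signed decimals, which makes them hard to compare against the addresses in
the panic message. A minimal sketch of converting them back to 64-bit
addresses follows. The helper names are hypothetical, and the stack range
again assumes KSTACK_PAGES = 4 and PAGE_SIZE = 4096 with the "stack:"
address taken as td_kstack.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_u64(value):
    """Reinterpret a signed 64-bit integer, as printed by kgdb, as unsigned."""
    return value & MASK64

def in_kstack(addr, td_kstack, pages=4, page_size=4096):
    """True if addr lies inside the thread's kernel stack (assumed size)."""
    return td_kstack <= addr < td_kstack + pages * page_size

# Values copied from the kgdb output above.
pcb = {
    "pcb_rsp": -491518580952,
    "pcb_rbp": -491518580864,
    "pcb_rip": -2143622375,
}
td_kstack = 0xffffff8d8f357000

for name, value in pcb.items():
    addr = to_u64(value)
    print(name, hex(addr), "in kstack" if in_kstack(addr, td_kstack) else "")
```

Under those assumptions the saved pcb_rsp and pcb_rbp both decode to
addresses inside the thread's kernel stack, while pcb_rip decodes to an
address in the same 0xffffffff80... range as the backtrace entries.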
Thanks for your help on this, as it's way over my head ;-)
Regards
Steve
More information about the freebsd-stable mailing list