debugging frequent kernel panics on 8.2-RELEASE

Steven Hartland killing at multiplay.co.uk
Mon Aug 15 16:14:29 UTC 2011


----- Original Message ----- 
From: "Andriy Gapon" <avg at FreeBSD.org>
To: "Steven Hartland" <killing at multiplay.co.uk>
Cc: <freebsd-stable at FreeBSD.org>
Sent: Monday, August 15, 2011 4:36 PM
Subject: Re: debugging frequent kernel panics on 8.2-RELEASE


> on 15/08/2011 17:56 Steven Hartland said the following:
>> 
>> ----- Original Message ----- From: "Andriy Gapon" <avg at FreeBSD.org>
>> To: "Steven Hartland" <killing at multiplay.co.uk>
>> Cc: <freebsd-stable at FreeBSD.org>
>> Sent: Monday, August 15, 2011 2:20 PM
>> Subject: Re: debugging frequent kernel panics on 8.2-RELEASE
>> 
>> 
>>> on 15/08/2011 15:51 Steven Hartland said the following:
>>>> ----- Original Message ----- From: "Andriy Gapon" <avg at FreeBSD.org>
>>>>
>>>>
>>>>> on 15/08/2011 13:34 Steven Hartland said the following:
>>>>>> (kgdb) list *0xffffffff8053b691
>>>>>> 0xffffffff8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239).
>>>>>> 234             /*
>>>>>> 235              * Find the backing store object and offset into it to begin the
>>>>>> 236              * search.
>>>>>> 237              */
>>>>>> 238             fs.map = map;
>>>>>> 239             result = vm_map_lookup(&fs.map, vaddr, fault_type, &fs.entry,
>>>>>> 240                 &fs.first_object, &fs.first_pindex, &prot, &wired);
>>>>>> 241             if (result != KERN_SUCCESS) {
>>>>>> 242                     if (result != KERN_PROTECTION_FAILURE ||
>>>>>> 243                         (fault_flags & VM_FAULT_WIRE_MASK) != VM_FAULT_USER_WIRE) {
>>>>>>
>>>>>
>>>>> Interesting... thanks!
> [snip]
>> (kgdb) x/512a 0xffffff8d8f357210
> 
> This is not conclusive, but that stack looks like the following recursive chain:
> vm_fault -> {vm_map_lookup, vm_map_growstack} -> trap -> trap_pfault -> vm_fault
> So I suspect that increasing kernel stack size won't help here much.
> Where does this chain come from?  I have no answer at the moment, maybe other
> developers could help here.  I suspect that we shouldn't be getting that trap in
> vm_map_growstack or should handle it in a different way.
> 

Just in case it's relevant, I've checked the other crashes and every rip entry
points to vm_fault (/usr/src/sys/vm/vm_fault.c:239).

A more typical layout, taken from a selection of machines, is:

Unread portion of the kernel message buffer:

Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86ccf8ffb0
rbp = 0xffffff86ccf90210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 2d21h25m4s
Physical memory: 24555 MB
Dumping 4184 MB:...
----

Unread portion of the kernel message buffer:

Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86cc742fb0
rbp = 0xffffff86cc743210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 2d4h30m58s
Physical memory: 24555 MB
Dumping 5088 MB:...
----

Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86caeabfb0
rbp = 0xffffff86caeac210
cpuid = 8; apic id = 10
panic: double fault
cpuid = 8
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 3d1h56m45s
Physical memory: 24555 MB
Dumping 4690 MB:...
----

Fatal double fault
rip = 0xffffffff8053b061
rsp = 0xffffff86cb1c7fb0
rbp = 0xffffff86cb1c8210
cpuid = 4; apic id = 04
panic: double fault
cpuid = 4
KDB: stack backtrace:
#0 0xffffffff803bb28e at kdb_backtrace+0x5e
#1 0xffffffff80389187 at panic+0x187
#2 0xffffffff8057fc86 at dblfault_handler+0x96
#3 0xffffffff805689dd at Xdblfault+0xad
Uptime: 1d13h41m19s
Physical memory: 24555 MB
Dumping 3626 MB:...

And in case any of our changes to loader.conf or sysctl.conf are
relevant here, they are:
[loader.conf]
zfs_load="YES"
vfs.root.mountfrom="zfs:tank/root"
# Fix "swap zone exhausted" by increasing kern.maxswzone
kern.maxswzone=67108864
# Reduce the minimum ARC size; we want our apps to have the memory
vfs.zfs.arc_min="512M"
[/loader.conf]

[sysctl.conf]
vfs.read_max=32
net.inet.tcp.inflight.enable=0
net.inet.tcp.sendspace=65536
kern.ipc.maxsockbuf=524288
kern.maxfiles=50000
kern.ipc.nmbclusters=51200
[/sysctl.conf]

    Regards
    Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster at multiplay.co.uk.


