help with GPF on 5.4-STABLE
Sean McNeil
sean at mcneil.com
Fri May 20 16:52:05 PDT 2005
Doug,
Thanks for helping me look into this.
On May 20, 2005, at 1:59 PM, Doug White wrote:
> Let's prune this down:
>
> On Thu, 19 May 2005, Sean McNeil wrote:
>
>
>> I'm not sure what information to provide from my crash dump. I tried to
>> burn a CD with my
>>
>> 'TOSHIBA ' 'CD/DVDW SD-R5372' 'TU31' Removable CD-ROM
>>
>> via the nautilus CD burner and I get a kernel panic:
>>
>> May 19 19:41:23 server kernel: Fatal trap 9: general protection fault while in kernel mode
>> May 19 19:41:23 server kernel: instruction pointer = 0x8:0xffffffff801f4d99May 19 19:41:23 server kernel: stack pointer = 0x10:0xffffffffb1d7ab80
>> May 19 19:41:23 server kernel: frame pointer = 0x10:0xffffff0000c3b000
>> May 19 19:41:23 server kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
>> May 19 19:41:23 server kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
>> May 19 19:41:23 server kernel: processor eflags = interrupt enabled, resume, IOPL = 0
>> May 19 19:41:23 server kernel: current process = 5 (thread taskq)
>> May 19 19:41:23 server kernel: trap number = 9
>> May 19 19:41:23 server kernel: panic: general protection fault
>>
>> What can I do to get the proper info to the developers? Using kgdb, I
>> checked the threads (pids) and stack.
>>
There appears to be a missing line break in the log above, right after
the instruction pointer value. I think it caused you to read the SP as
the IP.
> kern_timeout.c line 530 is
>
> 530 mtx_unlock_spin(&callout_lock);
I don't think this is the problem. I think it is happening inside an
interrupt handler while the thread was at this point.
> I'm not sure what in there would generate a GPF. Load up a debugging
> version of the kernel that generated this error into gdb (add
> "makeoptions DEBUG=-g" to your kernel config & rebuild if you don't
> have one, and you don't need to load in the crashdump), and enter
>
> disass 0xffffffffb1d7ab80
Looking at 0xffffffff801f4d99 instead (as that is the IP, and the
address above is the SP), I see:
(gdb) l *0xffffffff801f4d99
0xffffffff801f4d99 is in ata_completed (/usr/src/sys/dev/ata/ata-queue.c:401).
396
397         ATA_DEBUG_RQ(request, "completed callback/wakeup");
398
399         /* get results back to the initiator */
400         if (request->callback)
401             (request->callback)(request);
402         else
403             sema_post(&request->done);
404
405         ata_start(ch);
0xffffffff801f4d87 <ata_completed+103>: mov 0x58(%rbx),%rax
0xffffffff801f4d8b <ata_completed+107>: test %rax,%rax
0xffffffff801f4d8e <ata_completed+110>: data16
0xffffffff801f4d8f <ata_completed+111>: nop
0xffffffff801f4d90 <ata_completed+112>: je 0xffffffff801f4eb5 <ata_completed+405>
0xffffffff801f4d96 <ata_completed+118>: mov %rbx,%rdi
0xffffffff801f4d99 <ata_completed+121>: callq *%eax
Is there an eax register in 64-bit mode? When I do an "info reg" in
kgdb I don't see one.
> It'll disassemble whatever function it is in. Search the addresses on
> the left for the matching line and paste it and a handful to both
> sides into your reply. That will help us narrow things down by seeing
> what instruction faulted and searching for conditions that cause that
> fault.
It would appear that the atapicam layer is somehow setting (or not
clearing) the callback field of the request structure. Or, perhaps,
something is still referencing the request structure after the
atapicam layer thinks it is finished and has freed the memory.
Does that sound reasonable?
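
As a rough illustration of why a stale or corrupted callback field would
get past the test at line 400, here is a minimal sketch. The struct and
helper names (fake_request, choose_path) are hypothetical stand-ins and
not the real struct ata_request; the callback is stored as an integer
only so it can be inspected without being called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the request; NOT the real struct ata_request. */
struct fake_request {
    uint64_t callback;  /* callback address kept as an integer for inspection */
};

enum completion_path { PATH_SEMA_POST, PATH_CALLBACK };

/*
 * Mirrors the shape of the test at ata-queue.c:400. Only a zero (NULL)
 * callback selects the sema_post() path; any other bit pattern, valid or
 * garbage, selects the indirect call.
 */
static enum completion_path choose_path(const struct fake_request *req)
{
    assert(req != NULL);
    return req->callback ? PATH_CALLBACK : PATH_SEMA_POST;
}
```

Feeding it the value later seen in tf_rax (0x50070802106a0) selects the
callback path, which fits either theory: a field that was never cleared,
or memory that was reused after being freed.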
Looking at the frame structure, it looks like rax == eax:
(kgdb) p/x frame
$2 = {tf_rdi = 0xffffff007abafe18, tf_rsi = 0x1, tf_rdx = 0x50, tf_rcx = 0x20,
  tf_r8 = 0xffffff007b7518b8, tf_r9 = 0xffffff007b75c2c0,
  tf_rax = 0x50070802106a0, tf_rbx = 0xffffff007abafe18,
  tf_rbp = 0xffffff0000c3b000, tf_r10 = 0xffffffff806c8c38, tf_r11 = 0x0,
  tf_r12 = 0x4, tf_r13 = 0x1, tf_r14 = 0xffffff0000b54608, tf_r15 = 0x1,
  tf_trapno = 0x9, tf_addr = 0x0, tf_flags = 0xffffffff8032355a, tf_err = 0x0,
  tf_rip = 0xffffffff801f4d99, tf_cs = 0x8, tf_rflags = 0x10206,
  tf_rsp = 0xffffffffb1d7ab90, tf_ss = 0x10}
which would make some sense, as tf_rax looks bogus.
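
One way to put "bogus" on firmer ground: on amd64, a branch to a
non-canonical address raises a general protection fault, so it is worth
checking whether tf_rax is canonical. The sketch below assumes 48
implemented virtual-address bits (my assumption for this CPU, not
something the dump states), and is_canonical is a hypothetical helper,
not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if addr is canonical for a CPU that
 * implements va_bits of virtual address, i.e. bits 63..(va_bits - 1)
 * are either all zero or all one.
 */
static bool is_canonical(uint64_t addr, unsigned va_bits)
{
    assert(va_bits >= 2 && va_bits <= 64);

    /* Keep the sign bit (va_bits - 1) and everything above it. */
    uint64_t upper = addr >> (va_bits - 1);
    /* Mask with one bit set for each of those upper positions. */
    uint64_t all_ones = (UINT64_C(1) << (64 - va_bits + 1)) - 1;

    return upper == 0 || upper == all_ones;
}
```

With va_bits = 48, is_canonical(0x50070802106a0, 48) is false, while
tf_rip (0xffffffff801f4d99) and tf_rbx (0xffffff007abafe18) both come
out canonical. That would be consistent with trap 9 being raised by the
indirect call itself rather than by anything inside the callback.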
What else can I do?
Thanks,
Sean