help with GPF on 5.4-STABLE

Fri May 20 16:52:05 PDT 2005

Doug,

Thanks for helping me look into this.

On May 20, 2005, at 1:59 PM, Doug White wrote:

> Let's prune this down:
>
> On Thu, 19 May 2005, Sean McNeil wrote:
>
>
>> I'm not sure what information to provide from my crash dump.  I tried
>> to burn a CD with my
>>
>> 'TOSHIBA ' 'CD/DVDW SD-R5372' 'TU31' Removable CD-ROM
>>
>> via the Nautilus CD burner and I get a kernel panic:
>>
>> May 19 19:41:23 server kernel: Fatal trap 9: general protection fault while in kernel mode
>> May 19 19:41:23 server kernel: instruction pointer      = 0x8:0xffffffff801f4d99May 19 19:41:23 server kernel: stack pointer            = 0x10:0xffffffffb1d7ab80
>> May 19 19:41:23 server kernel: frame pointer            = 0x10:0xffffff0000c3b000
>> May 19 19:41:23 server kernel: code segment             = base 0x0, limit 0xfffff, type 0x1b
>> May 19 19:41:23 server kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
>> May 19 19:41:23 server kernel: processor eflags = interrupt enabled, resume, IOPL = 0
>> May 19 19:41:23 server kernel: current process          = 5 (thread taskq)
>> May 19 19:41:23 server kernel: trap number              = 9
>> May 19 19:41:23 server kernel: panic: general protection fault
>>
>> What can I do to get the proper info to the developers?  Using kgdb, I
>> checked the threads (pids) and stack.
>>

There appears to be a missing line break in the log above, between the
instruction pointer and stack pointer entries.  I think it caused you to
read the SP for the IP.

> kern_timeout.c line 530 is
>
> 530         mtx_unlock_spin(&callout_lock);

I don't think this is the problem.  I think the fault is happening inside
an interrupt handler that ran while the thread was at this point.

> I'm not sure what in there would generate a GPF.  Load up a debugging
> version of the kernel that generated this error into gdb (add
> "makeoptions DEBUG=-g" to your kernel config & rebuild if you don't
> have one, and you don't need to load in the crashdump), and enter
>
> disass 0xffffffffb1d7ab80

Looking at 0xffffffff801f4d99 instead (that is the IP; the address above
is the SP), I see:

(gdb) l *0xffffffff801f4d99
0xffffffff801f4d99 is in ata_completed (/usr/src/sys/dev/ata/ata-queue.c:401).
396
397         ATA_DEBUG_RQ(request, "completed callback/wakeup");
398
399         /* get results back to the initiator */
400         if (request->callback)
401             (request->callback)(request);
402         else
403             sema_post(&request->done);
404
405         ata_start(ch);

0xffffffff801f4d87 <ata_completed+103>: mov    0x58(%rbx),%rax
0xffffffff801f4d8b <ata_completed+107>: test   %rax,%rax
0xffffffff801f4d8e <ata_completed+110>: data16
0xffffffff801f4d8f <ata_completed+111>: nop
0xffffffff801f4d90 <ata_completed+112>: je     0xffffffff801f4eb5 <ata_completed+405>
0xffffffff801f4d96 <ata_completed+118>: mov    %rbx,%rdi
0xffffffff801f4d99 <ata_completed+121>: callq  *%eax

Is there an eax register in 64-bit mode?  When I do "info reg" in kgdb I
don't see one.

> It'll disassemble whatever function it is in. Search the addresses on
> the left for the matching line and paste it and a handful to both sides
> into your reply.  That will help us narrow things down by seeing what
> instruction faulted and searching for conditions that cause that fault.

It would appear that the atapicam layer is somehow setting (or failing to
clear) the callback field of the request structure.  Or, perhaps,
something still references the request structure after the atapicam layer
thinks it is finished with it and has freed the memory.

Does that sound reasonable?
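
To make the hypothesis concrete, here is a minimal userland sketch of the
dispatch at lines 400-403.  It is not the driver code: struct mock_request,
mock_callback and mock_completed are hypothetical stand-ins, and sema_post()
is replaced by a plain counter.  The only point is that the single check made
is for a non-NULL pointer, so any stale or overwritten non-zero value in the
callback field would become the call target.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real request structure; only the two
 * fields used by the dispatch are modelled. */
struct mock_request {
    void (*callback)(struct mock_request *);
    int done;      /* stands in for the semaphore that sema_post() signals */
    int cb_calls;  /* bookkeeping for this sketch only */
};

/* Hypothetical callback that just records that it ran. */
static void mock_callback(struct mock_request *req)
{
    req->cb_calls++;
}

/* Mirrors the if/else shown above: call the callback when the field is
 * non-NULL, otherwise signal completion.  Nothing here can tell a valid
 * function pointer from garbage. */
static void mock_completed(struct mock_request *req)
{
    if (req->callback)
        req->callback(req);
    else
        req->done++;
}
```

A request with the callback set takes the first branch and one with a NULL
callback takes the second; a request whose memory was freed and reused would
take whichever branch the leftover bytes happen to select.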

Looking at the trap frame, it looks like the "eax" in the disassembly is
really rax:

(kgdb) p/x frame
$2 = {tf_rdi = 0xffffff007abafe18, tf_rsi = 0x1, tf_rdx = 0x50, tf_rcx = 0x20,
   tf_r8 = 0xffffff007b7518b8, tf_r9 = 0xffffff007b75c2c0,
   tf_rax = 0x50070802106a0, tf_rbx = 0xffffff007abafe18,
   tf_rbp = 0xffffff0000c3b000, tf_r10 = 0xffffffff806c8c38, tf_r11 = 0x0,
   tf_r12 = 0x4, tf_r13 = 0x1, tf_r14 = 0xffffff0000b54608, tf_r15 = 0x1,
   tf_trapno = 0x9, tf_addr = 0x0, tf_flags = 0xffffffff8032355a, tf_err = 0x0,
   tf_rip = 0xffffffff801f4d99, tf_cs = 0x8, tf_rflags = 0x10206,
   tf_rsp = 0xffffffffb1d7ab90, tf_ss = 0x10}

which would make some sense, as tf_rax looks bogus.
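
One way to check that tf_rax is bogus as a call target, sketched under the
assumption of 48-bit virtual addresses (4-level paging): on amd64, bits 63..47
of an address must be all zeros or all ones (canonical form), and using a
non-canonical address raises a general protection fault rather than a page
fault.  The helper names below are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, so bits 63..47 must all match. */
static bool addr_is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the upper 17 bits */
    return top == 0 || top == 0x1ffff;  /* all clear or all set */
}

/* eax is the low 32 bits of rax, not a separate register. */
static uint32_t low32(uint64_t reg)
{
    return (uint32_t)reg;
}
```

With the values from the trap frame, addr_is_canonical(0x50070802106a0) is
false, while tf_rip (0xffffffff801f4d99) passes the check.  That would be
consistent with trap 9 being raised by the indirect call through rax.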

What else can I do?

Thanks,
Sean