help with GPF on 5.4-STABLE

Sun May 22 13:27:01 PDT 2005

Answer is inline, but a ways down.

On Fri, 20 May 2005, Sean McNeil wrote:

> Doug,
>
> Thanks for helping me look into this.
>
> On May 20, 2005, at 1:59 PM, Doug White wrote:
>
> > Let's prune this down:
> >
> > On Thu, 19 May 2005, Sean McNeil wrote:
> >
> >
> >> I'm not sure what information to provide from my crash dump.  I
> >> tried to burn a CD with my
> >>
> >> 'TOSHIBA ' 'CD/DVDW SD-R5372' 'TU31' Removable CD-ROM
> >>
> >> via the nautilus CD burner and I get a kernel panic:
> >>
> >> May 19 19:41:23 server kernel: Fatal trap 9: general protection
> >> fault while in kernel mode
> >> May 19 19:41:23 server kernel: instruction pointer      =
> >> 0x8:0xffffffff801f4d99May 19 19:41:23 server kernel: stack
> >> pointer            = 0x10:0xffffffffb1d7ab80
> >> May 19 19:41:23 server kernel: frame pointer            =
> >> 0x10:0xffffff0000c3b000
> >> May 19 19:41:23 server kernel: code segment             = base
> >> 0x0, limit 0xfffff, type 0x1b
> >> May 19 19:41:23 server kernel: = DPL 0, pres 1, long 1, def32 0,
> >> gran 1
> >> May 19 19:41:23 server kernel: processor eflags = interrupt
> >> enabled, resume, IOPL = 0
> >> May 19 19:41:23 server kernel: current process          = 5
> >> (thread taskq)
> >> May 19 19:41:23 server kernel: trap number              = 9
> >> May 19 19:41:23 server kernel: panic: general protection fault
> >>
> >> What can I do to get the proper info to the developers? using kgdb, I
> >> checked the threads (pids) and stack.
> >>
>
> There appears to be a missing return on the lines above.  I think it
> caused you to read the SP for the IP.
>
> > kern_timeout.c line 530 is
> >
> > 530         mtx_unlock_spin(&callout_lock);
>
> I don't think this is the problem.  I think it is happening inside an
> interrupt handler while the thread was at this point.
>
> > I'm not sure what in there would generate a GPF.  Load up a debugging
> > version of the kernel that generated this error into gdb (add
> > "makeoptions DEBUG=-g" to your kernel config & rebuild if you don't
> > have one, and you don't need to load in the crashdump), and enter
> >
> > disass 0xffffffffb1d7ab80
>
> Looking at 0xffffffff801f4d99 (as that is the IP and above is the
> SP), I see:
>
> (gdb) l *0xffffffff801f4d99
> 0xffffffff801f4d99 is in ata_completed
> (/usr/src/sys/dev/ata/ata-queue.c:401).
> 396
> 397         ATA_DEBUG_RQ(request, "completed callback/wakeup");
> 398
> 399         /* get results back to the initiator */
> 400         if (request->callback)
> 401             (request->callback)(request);
> 402         else
> 403             sema_post(&request->done);
> 404
> 405         ata_start(ch);
>
> 0xffffffff801f4d87 <ata_completed+103>: mov    0x58(%rbx),%rax
> 0xffffffff801f4d8b <ata_completed+107>: test   %rax,%rax
> 0xffffffff801f4d8e <ata_completed+110>: data16
> 0xffffffff801f4d8f <ata_completed+111>: nop
> 0xffffffff801f4d90 <ata_completed+112>: je     0xffffffff801f4eb5 <ata_completed+405>
> 0xffffffff801f4d96 <ata_completed+118>: mov    %rbx,%rdi
> 0xffffffff801f4d99 <ata_completed+121>: callq  *%eax
>
> There is an eax register in 64-bit mode?  When I do an info reg in
> kgdb I don't see one.

%r?x are the 64 bit versions of %e?x which are the 32 bit versions of %?x
:)
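
To make the aliasing concrete, here is a small C sketch (mine, not
anything from the kernel; low32 and low16 are made-up helper names) that
mimics what the narrower register names would see for a given 64-bit
value. With the tf_rax value from the trap frame quoted further down,
%eax would hold only the low half:

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* What %eax holds when %rax holds v: the low 32 bits. */
static uint32_t low32(uint64_t v)
{
    return (uint32_t)v;
}

/* What %ax holds when %rax holds v: the low 16 bits. */
static uint16_t low16(uint64_t v)
{
    return (uint16_t)v;
}
```

For example, low32(0x50070802106a0ULL) is 0x802106a0, so a call through
a 32-bit register would not even see the upper part of the pointer.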

I think this is a peculiarity of long vs. normal mode and that the CALL
opcode is the same for 32 and 64 bit operands. It's entirely possible
gdb is just decoding the instruction wrong, assuming it's 32 bit and
not 64.

Can you print the value of callback in the request there? I wonder if the
pointer is somehow mangled.

> > It'll disassemble whatever function it is in. Search the addresses
> > on the left for the matching line and paste it and a handful to both
> > sides into your reply.  That will help us narrow things down by
> > seeing what instruction faulted and searching for conditions that
> > cause that fault.
>
> It would appear that the atapicam layer is somehow setting (or not
> clearing) the request callback field of a structure.  Or, perhaps,
> there is a reference to the request structure that is happening after
> the atapicam layer thinks that it is finished and has freed the memory.
>
> Does that sound reasonable?
>
> Looking at the frame structure, it looks like rax == eax:
>
> (kgdb) p/x frame
> $2 = {tf_rdi = 0xffffff007abafe18, tf_rsi = 0x1, tf_rdx = 0x50,
> tf_rcx = 0x20,
>    tf_r8 = 0xffffff007b7518b8, tf_r9 = 0xffffff007b75c2c0,
>    tf_rax = 0x50070802106a0, tf_rbx = 0xffffff007abafe18,
>    tf_rbp = 0xffffff0000c3b000, tf_r10 = 0xffffffff806c8c38, tf_r11 =
> 0x0,
>    tf_r12 = 0x4, tf_r13 = 0x1, tf_r14 = 0xffffff0000b54608, tf_r15 =
> 0x1,
>    tf_trapno = 0x9, tf_addr = 0x0, tf_flags = 0xffffffff8032355a,
> tf_err = 0x0,
>    tf_rip = 0xffffffff801f4d99, tf_cs = 0x8, tf_rflags = 0x10206,
>    tf_rsp = 0xffffffffb1d7ab90, tf_ss = 0x10}
>
> which would make some sense, as tf_rax looks bogus.
>
> What else can I do?
>
> Thanks,
> Sean
>
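
One way to see why tf_rax looks bogus, and why this shows up as a GPF
rather than a page fault: on amd64 a jump or call to a non-canonical
address raises a general protection fault. Below is a rough C sketch of
that check. It assumes 48-bit virtual addresses (bits 63..47 must all be
equal), and is_canonical48 is a name I made up for illustration, not a
kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if addr is canonical under the
 * assumption of 48-bit virtual addresses, i.e. bits 63..47 are either
 * all zero or all one.
 */
static bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}
```

By this test 0x50070802106a0 (tf_rax) is not canonical, while
0xffffffff801f4d99 (tf_rip) and 0xffffff007abafe18 (tf_rbx) are, which
is consistent with the callback pointer itself being garbage.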

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite at gumbysoft.com          |  www.FreeBSD.org