T41 CDRW page fault saga

Bruce M Simpson bms at spc.org
Sat Jul 10 11:22:40 PDT 2004


On Fri, Jul 09, 2004 at 02:31:52AM +0100, Bruce M Simpson wrote:
> If we can establish that the problem is isolated to a specific ATA
> controller revision, we may be getting somewhere....

I've got more data from the local user's affected machine. We had to
transcribe the messages by hand, as I don't have enough FireWire kit
around to use dcons.

This is the kernel I'm using:
FreeBSD empiric.dek.spc.org 5.2-CURRENT FreeBSD 5.2-CURRENT #1: Tue Jul  6 23:17:47 BST 2004     bms at kimchi.dek.spc.org:/usr/src/sys/i386/compile/EMPIRIC  i386

There isn't a panic per se. The page fault only manifests itself on the
affected T41 when the CDRW module is inserted; if it's removed during boot,
all is well.

We managed to pull a backtrace. It's clear this happens only during
mountroot, and it could be a trashed stack. The addresses, of course,
are specific to my production -CURRENT kernel (I usually build kernel.debug).

I couldn't get a panic dump (it kept complaining of not having enough room
on my dumpdev, although I know for a fact I have enough blocks to cover
physical memory, which is 512MB on this box).

This message occurs immediately after mountroot is attempted (it finds
the root filesystem correctly) and after the ATAPI_IDENTIFY messages which
others have reported (inspection of the ata driver suggests these messages
are benign, but green@ has since posted patches which address the
'device atapicam' case):

---8<---8<---
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x1ff01ff
fault code            = supervisor read, page not present
instruction pointer   = 0x08:0x1ff01ff
stack pointer         = 0x10:0xd3e9cb30
frame pointer         = 0x10:0xd3e9cb54
code segment          = base 0x0, limit 0xfffff, type 0x1b
                      = DPL 0, pres 1, def32 1, gran 1
processor eflags      = interrupt enabled, resume, IOPL = 0
current process       = 1 (swapper)
kernel: type 12 trap, code = 0
Stopped at   0x1ff01ff
---8<---8<---

(On entry into DDB: eip = 0xc05d4808, esp = 0xd3e9c97c, fp = 0xd3e9c980)

We managed to get a backtrace using "show thr" as follows (we didn't
transcribe the stack parameters, just the backtrace):-

---8<---8<---
kernload at 0x1ff01ff
devfs_allocv at devfs_allocv+0x13c
devfs_root at devfs_root+0x23
devfs_nmount at devfs_nmount+0xaf
getdiskbyname at getdiskbyname+0xb1
setrootbyname at setrootbyname+0xb
vfs_mountroot_try at vfs_mountroot_try+0xcf
vfs_mountroot at vfs_mountroot+0x6b
start_init at start_init+0x53
fork_exit
fork_trampoline
---8<---8<---

I'll try to pin down the exact opcode/line in devfs_allocv() where the
call stack appears to be getting so screwed up.

Hopefully this helps continuing efforts to debug this problem.

Regards,
BMS


More information about the freebsd-mobile mailing list