10.0 BETA 3 with redports kernel panic
Sean Bruno
sean_bruno at yahoo.com
Fri Dec 20 15:47:47 UTC 2013
On Fri, 2013-12-20 at 11:48 +0200, Konstantin Belousov wrote:
> On Thu, Dec 19, 2013 at 02:35:41PM -0800, Sean Bruno wrote:
> > On Thu, 2013-12-19 at 11:20 -0800, Peter Wemm wrote:
> > > On Thu, Dec 19, 2013 at 10:59 AM, Peter Wemm <peter at wemm.org> wrote:
> > > > On Thu, Dec 19, 2013 at 10:51 AM, Sean Bruno <sean_bruno at yahoo.com> wrote:
> > > >> On Thu, 2013-12-19 at 20:08 +0200, Konstantin Belousov wrote:
> > > >>> On Thu, Dec 19, 2013 at 09:25:15AM -0800, Sean Bruno wrote:
> > > >>> > On Tue, 2013-12-17 at 05:04 -0800, Sean Bruno wrote:
> > > >>> > > On Tue, 2013-12-17 at 14:00 +0200, Konstantin Belousov wrote:
> > > >>> > > > On Mon, Dec 16, 2013 at 10:45:58AM -0800, Sean Bruno wrote:
> > > >>> > > > > On Mon, 2013-12-16 at 10:04 -0800, Sean Bruno wrote:
> > > >>> > > > > > > What is the source line for memrw+0x195 ?
> > > >>> > > > > >
> > > >>> > > > > > My apologies for the delay on this. It's been frustrating getting a
> > > >>> > > > > > crashdump on these machines due to their very large tmpfs usage.
> > > >>> > > > > > Currently, I am dumping a crash of 13+GB to a third HD that we had
> > > >>> > > > > > installed for this purpose.
> > > >>> > > > > >
> > > >>> > > > > > The machines are still running RC3 of 10.0r.
> > > >>> > > > > >
> > > >>> > > > > > I will attempt to get the requested information shortly.
> > > >>> > > > > >
> > > >>> > > > > > sean
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > > > I've updated http://people.freebsd.org/~sbruno/redbuild_panic.txt
> > > >>> > > > >
> > > >>> > > > > It looks like it's dying in uiomove()?
> > > >>> > > >
> > > >>> > > > I believe I already posted the following patch, with no feedback.
> > > >>> > > >
> > > >>> > > > diff --git a/sys/amd64/amd64/mem.c b/sys/amd64/amd64/mem.c
> > > >>> > > > index abbbb21..e371499 100644
> > > >>> > > > --- a/sys/amd64/amd64/mem.c
> > > >>> > > > +++ b/sys/amd64/amd64/mem.c
> > > >>> > > > @@ -98,7 +98,11 @@ memrw(struct cdev *dev, struct uio *uio, int flags)
> > > >>> > > > kmemphys:
> > > >>> > > > o = v & PAGE_MASK;
> > > >>> > > > c = min(uio->uio_resid, (u_int)(PAGE_SIZE - o));
> > > >>> > > > - error = uiomove((void *)PHYS_TO_DMAP(v), (int)c, uio);
> > > >>> > > > + v = PHYS_TO_DMAP(v);
> > > >>> > > > + if (v < DMAP_MIN_ADDRESS || v >= DMAP_MAX_ADDRESS ||
> > > >>> > > > + pmap_kextract(v) == 0)
> > > >>> > > > + return (EFAULT);
> > > >>> > > > + error = uiomove((void *)v, (int)c, uio);
> > > >>> > > > continue;
> > > >>> > > > }
> > > >>> > > > else if (dev2unit(dev) == CDEV_MINOR_KMEM) {
> > > >>> > >
> > > >>> > > Will begin testing immediately
> > > >>> > >
> > > >>> > > sean
> > > >>> >
> > > >>> >
> > > >>> > Huh ... both machines panic'd this morning. It'll take 30 minutes or so
> > > >>> > to get a crash dump, but it looks like it's still in the same place.
> > > >>> >
> > > >>> > db> whe
> > > >>> > Tracing pid 489 tid 101801 td 0xfffff80322946490
> > > >>> > kdb_enter() at kdb_enter+0x3e/frame 0xfffffe1839d26220
> > > >>> > panic() at panic+0x175/frame 0xfffffe1839d262a0
> > > >>> > vm_fault_hold() at vm_fault_hold+0x14ed/frame 0xfffffe1839d26500
> > > >>> > vm_fault() at vm_fault+0x77/frame 0xfffffe1839d26540
> > > >>> > trap_pfault() at trap_pfault+0x19b/frame 0xfffffe1839d265f0
> > > >>> > trap() at trap+0x5e6/frame 0xfffffe1839d26810
> > > >>> > calltrap() at calltrap+0x8/frame 0xfffffe1839d26810
> > > >>> > --- trap 0xc, rip = 0xffffffff80cae47b, rsp = 0xfffffe1839d268d0, rbp =
> > > >>> > 0xfffffe1839d26920 ---
> > > >>> > copyout() at copyout+0x3b/frame 0xfffffe1839d26920
> > > >>> > memrw() at memrw+0x1b6/frame 0xfffffe1839d26960
> > > >>> > giant_read() at giant_read+0x7a/frame 0xfffffe1839d269a0
> > > >>> > devfs_read_f() at devfs_read_f+0xea/frame 0xfffffe1839d26a00
> > > >>> > dofileread() at dofileread+0x7b/frame 0xfffffe1839d26a40
> > > >>> > kern_readv() at kern_readv+0x65/frame 0xfffffe1839d26a90
> > > >>> > sys_read() at sys_read+0x63/frame 0xfffffe1839d26ae0
> > > >>> > amd64_syscall() at amd64_syscall+0x357/frame 0xfffffe1839d26bf0
> > > >>> > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe1839d26bf0
> > > >>> > --- syscall (3, FreeBSD ELF64, sys_read), rip = 0x800b750aa, rsp =
> > > >>> > 0x7fffffffd068, rbp = 0x7fffffffd0b0 ---
> > > >>> > db> call doadump
> > > >>> >
> > > >>>
> > > >>> I need to see the exact panic and trap messages, and I need to know
> > > >>> the source line for memrw+0x1b6 in the patched kernel.
> > > >>
> > > >> Here is the panic/trap and the requested display. Peter suspects that
> > > >> part of the failure is the use of DMAP_MAX_ADDRESS and not dmaplimit in
> > > >> this and other comparisons. Patch attached that contains your
> > > >> modifications and his.
> > > >>
> > > >> bcc peter@
> > > >>
> > > >>
> > > >> panic: vm_fault: fault on nofault entry, addr: fffffe0327240000
> > > >> cpuid = 16
> > > >> KDB: stack backtrace:
> > > >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> > > >> 0xfffffe1839d26170
> > > >> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe1839d26220
> > > >> panic() at panic+0x155/frame 0xfffffe1839d262a0
> > > >> vm_fault_hold() at vm_fault_hold+0x14ed/frame 0xfffffe1839d26500
> > > >> vm_fault() at vm_fault+0x77/frame 0xfffffe1839d26540
> > > >> trap_pfault() at trap_pfault+0x19b/frame 0xfffffe1839d265f0
> > > >> trap() at trap+0x5e6/frame 0xfffffe1839d26810
> > > >> calltrap() at calltrap+0x8/frame 0xfffffe1839d26810
> > > >> --- trap 0xc, rip = 0xffffffff80cae47b, rsp = 0xfffffe1839d268d0, rbp =
> > > >> 0xfffffe1839d26920 ---
> > > >> copyout() at copyout+0x3b/frame 0xfffffe1839d26920
> > > >> memrw() at memrw+0x1b6/frame 0xfffffe1839d26960
> > > >> giant_read() at giant_read+0x7a/frame 0xfffffe1839d269a0
> > > >> devfs_read_f() at devfs_read_f+0xea/frame 0xfffffe1839d26a00
> > > >> dofileread() at dofileread+0x7b/frame 0xfffffe1839d26a40
> > > >> kern_readv() at kern_readv+0x65/frame 0xfffffe1839d26a90
> > > >> sys_read() at sys_read+0x63/frame 0xfffffe1839d26ae0
> > > >> amd64_syscall() at amd64_syscall+0x357/frame 0xfffffe1839d26bf0
> > > >> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe1839d26bf0
> > > >> --- syscall (3, FreeBSD ELF64, sys_read), rip = 0x800b750aa, rsp =
> > > >> 0x7fffffffd068, rbp = 0x7fffffffd0b0 ---
> > > >> KDB: enter: panic
> > > >>
> > > >>
> > > >> (kgdb) whe
> > > >> #0 doadump (textdump=-2127435168) at pcpu.h:219
> > > >> #1 0xffffffff80342e25 in db_fncall (dummy1=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>)
> > > >> at /usr/src/sys/ddb/db_command.c:578
> > > >> #2 0xffffffff80342b0d in db_command (cmd_table=<value optimized out>) at /usr/src/sys/ddb/db_command.c:449
> > > >> #3 0xffffffff80342884 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502
> > > >> #4 0xffffffff803451f0 in db_trap (type=<value optimized out>, code=0) at /usr/src/sys/ddb/db_main.c:231
> > > >> #5 0xffffffff808fad33 in kdb_trap (type=3, code=0, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:656
> > > >> #6 0xffffffff80cb0277 in trap (frame=0xfffffe1839d26150) at /usr/src/sys/amd64/amd64/trap.c:579
> > > >> #7 0xffffffff80c96ef2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
> > > >> #8 0xffffffff808fa4ee in kdb_enter (why=0xffffffff80f07ff2 "panic", msg=<value optimized out>) at cpufunc.h:63
> > > >> #9 0xffffffff808c1eb5 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:747
> > > >> #10 0xffffffff80b299ed in vm_fault_hold (map=0xfffff80002000000, vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0, m_hold=0x0) at /usr/src/sys/vm/vm_fault.c:279
> > > >> #11 0xffffffff80b284b7 in vm_fault (map=0xfffff80002000000, vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0) at /usr/src/sys/vm/vm_fault.c:224
> > > >> #12 0xffffffff80cb08cb in trap_pfault (frame=0xfffffe1839d26820, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:775
> > > >> #17 0xffffffff80c9e746 in memrw (dev=<value optimized out>, uio=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/amd64/amd64/mem.c:105
> > > >> #18 0xffffffff8087323a in giant_read (dev=0xfffff80011302e00, uio=0xfffffe1839d26ab0, ioflag=0) at /usr/src/sys/kern/kern_conf.c:444
> > > >> #19 0xffffffff807b670a in devfs_read_f (fp=0xfffff80033711a50, uio=0xfffffe1839d26ab0, cred=<value optimized out>, flags=0, td=0xfffff80322946490)
> > > >> at /usr/src/sys/fs/devfs/devfs_vnops.c:1193
> > > >> #20 0xffffffff809117eb in dofileread (td=0xfffff80322946490, fd=4, fp=0xfffff80033711a50, auio=0xfffffe1839d26ab0, offset=<value optimized out>, flags=0) at file.h:295
> > > >> #21 0xffffffff80911525 in kern_readv (td=0xfffff80322946490, fd=4, auio=0xfffffe1839d26ab0) at /usr/src/sys/kern/sys_generic.c:256
> > > >> #22 0xffffffff809114b3 in sys_read (td=<value optimized out>, uap=<value optimized out>) at /usr/src/sys/kern/sys_generic.c:171
> > > >> #23 0xffffffff80cb1017 in amd64_syscall (td=0xfffff80322946490, traced=0) at subr_syscall.c:134
> > > >> #24 0xffffffff80c971db in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:391
> > > >> #25 0x0000000800b750aa in ?? ()
> > > >> Previous frame inner to this frame (corrupt stack?)
> > > >> Current language: auto; currently minimal
> > > >> (kgdb) p memrw+0x1b6
> > > >> $1 = (int (*)(struct cdev *, struct uio *, int)) 0xffffffff80c9e746 <memrw+438>
> > > >> (kgdb) f 17
> > > >> #17 0xffffffff80c9e746 in memrw (dev=<value optimized out>, uio=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/amd64/amd64/mem.c:105
> > > >> 105 error = uiomove((void *)v, (int)c, uio);
> > > >> (kgdb) list
> > > >> 100 c = min(uio->uio_resid, (u_int)(PAGE_SIZE - o));
> > > >> 101 v = PHYS_TO_DMAP(v);
> > > >> 102 if (v < DMAP_MIN_ADDRESS || v >= DMAP_MAX_ADDRESS ||
> > > >> 103 pmap_kextract(v) == 0)
> > > >> 104 return (EFAULT);
> > > >> 105 error = uiomove((void *)v, (int)c, uio);
> > > >> 106 continue;
> > > >> 107 }
> > > >> 108 else if (dev2unit(dev) == CDEV_MINOR_KMEM) {
> > > >> 109 v = uio->uio_offset;
> > > >>
> > > >>
> > > >>
> > > >> Index: sys/amd64/amd64/mem.c
> > > >> ===================================================================
> > > >> --- sys/amd64/amd64/mem.c (revision 258554)
> > > >> +++ sys/amd64/amd64/mem.c (working copy)
> > > >> @@ -98,7 +98,11 @@
> > > >> kmemphys:
> > > >> o = v & PAGE_MASK;
> > > >> c = min(uio->uio_resid, (u_int)(PAGE_SIZE - o));
> > > >> - error = uiomove((void *)PHYS_TO_DMAP(v), (int)c, uio);
> > > >> + v = PHYS_TO_DMAP(v);
> > > >> + if (v < DMAP_MIN_ADDRESS || v >= DMAP_MAX_ADDRESS ||
> > > >> + pmap_kextract(v) == 0)
> > > >> + return (EFAULT);
> > > >> + error = uiomove((void *)v, (int)c, uio);
> > > >> continue;
> > > >> }
> > > >> else if (dev2unit(dev) == CDEV_MINOR_KMEM) {
> > > >> Index: sys/amd64/amd64/pmap.c
> > > >> ===================================================================
> > > >> --- sys/amd64/amd64/pmap.c (revision 258554)
> > > >> +++ sys/amd64/amd64/pmap.c (working copy)
> > > >> @@ -1870,7 +1870,7 @@
> > > >> pd_entry_t pde;
> > > >> vm_paddr_t pa;
> > > >>
> > > >> - if (va >= DMAP_MIN_ADDRESS && va < DMAP_MAX_ADDRESS) {
> > > >> + if (va >= DMAP_MIN_ADDRESS && va < dmaplimit) {
> > > >> pa = DMAP_TO_PHYS(va);
> > > >> } else {
> > > >> pde = *vtopde(va);
> > > >> @@ -3308,7 +3308,7 @@
> > > >> */
> > > >> if ((oldpde & PG_A) == 0 || (mpte = vm_page_alloc(NULL,
> > > >> pmap_pde_pindex(va), (va >= DMAP_MIN_ADDRESS && va <
> > > >> - DMAP_MAX_ADDRESS ? VM_ALLOC_INTERRUPT : VM_ALLOC_NORMAL) |
> > > >> + dmaplimit ? VM_ALLOC_INTERRUPT : VM_ALLOC_NORMAL) |
> > > >> VM_ALLOC_NOOBJ | VM_ALLOC_WIRED)) == NULL) {
> > > >> SLIST_INIT(&free);
> > > >> pmap_remove_pde(pmap, pde, trunc_2mpage(va), &free,
> > > >> @@ -6117,7 +6117,7 @@
> > > >> vm_offset_t base, offset;
> > > >>
> > > >> /* If we gave a direct map region in pmap_mapdev, do nothing */
> > > >> - if (va >= DMAP_MIN_ADDRESS && va < DMAP_MAX_ADDRESS)
> > > >> + if (va >= DMAP_MIN_ADDRESS && va < dmaplimit)
> > > >> return;
> > > >> base = trunc_page(va);
> > > >> offset = va & PAGE_MASK;
> > > >>
> > > >>
> > > >
> > > > Specifically, pmap_kextract(v) is nothing more than a repeat of the
> > > > if() that kib added to mem.c. pmap_kextract() doesn't check to see if
> > > > it is attempting to access beyond the end of the instantiated part of
> > > > the direct map region. pmap_kextract(invalid_address) returns a value
> > > > even between dmaplimit and DMAP_MAX_ADDRESS - and that'll lead to a
> > > > fault.
> > >
> > > The patch is wrong, as you found out. :)
> > >
> > > --
> > > Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com; KI6FJV
> > > Yes, I know, gmail sucks now. If you see this then I forgot. Habits
> > > are hard to break.
> >
> >
> >
> > Yah, ACPI does not like this in the slightest.
> What does ACPI not like ? The panic below does not look related.
> The patch only changed the code path for usermode access to /dev/mem
> and /dev/kmem.
>
> Anyway, at the end is the patch which should be better after Peter's
> diagnosis of the cause. Please give it a try. For me, kgdb /dev/mem
> still worked with the patch applied.
>
> >
> > KDB: debugger backends: ddb
> > KDB: current backend: ddb
> > kernel trap 12 with interrupts disabled
> >
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address = 0x378
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff808a9b31
> > stack pointer = 0x28:0xffffffff81a90b50
> > frame pointer = 0x28:0xffffffff81a90bd0
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = resume, IOPL = 0
> > current process = 0 ()
> > [ thread pid 0 tid 0 ]
> > Stopped at __mtx_lock_sleep+0x1b1: movl 0x378(%rax),%ecx
> > db> bt
> > Tracing pid 0 tid 0 td 0xffffffff81527500
> > __mtx_lock_sleep() at __mtx_lock_sleep+0x1b1/frame 0xffffffff81a90bd0
> > vmem_xfree() at vmem_xfree+0x42/frame 0xffffffff81a90c10
> > acpi_find_table() at acpi_find_table+0x274/frame 0xffffffff81a90c60
> > madt_probe() at madt_probe+0x10/frame 0xffffffff81a90c70
> > apic_init() at apic_init+0x53/frame 0xffffffff81a90c90
> > mi_startup() at mi_startup+0x118/frame 0xffffffff81a90cb0
> > btext() at btext+0x2c
> >
>
> diff --git a/sys/amd64/amd64/mem.c b/sys/amd64/amd64/mem.c
> index abbbb21..e371499 100644
> --- a/sys/amd64/amd64/mem.c
> +++ b/sys/amd64/amd64/mem.c
> @@ -98,7 +98,11 @@ memrw(struct cdev *dev, struct uio *uio, int flags)
> kmemphys:
> o = v & PAGE_MASK;
> c = min(uio->uio_resid, (u_int)(PAGE_SIZE - o));
> - error = uiomove((void *)PHYS_TO_DMAP(v), (int)c, uio);
> + v = PHYS_TO_DMAP(v);
> + if (v < DMAP_MIN_ADDRESS || v >= DMAP_MAX_ADDRESS ||
> + pmap_kextract(v) == 0)
> + return (EFAULT);
> + error = uiomove((void *)v, (int)c, uio);
> continue;
> }
> else if (dev2unit(dev) == CDEV_MINOR_KMEM) {
> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> index 014020b..2569699 100644
> --- a/sys/amd64/amd64/pmap.c
> +++ b/sys/amd64/amd64/pmap.c
> @@ -1869,7 +1872,7 @@ pmap_kextract(vm_offset_t va)
> pd_entry_t pde;
> vm_paddr_t pa;
>
> - if (va >= DMAP_MIN_ADDRESS && va < DMAP_MAX_ADDRESS) {
> + if (va >= DMAP_MIN_ADDRESS && va < dmaplimit) {
> pa = DMAP_TO_PHYS(va);
> } else {
> pde = *vtopde(va);
With this change to pmap.c we blow up in keg_alloc_slab() now:
FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x8
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80b2602a
stack pointer = 0x28:0xffffffff81a90a50
frame pointer = 0x28:0xffffffff81a90ac0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 0 ()
[ thread pid 0 tid 0 ]
Stopped at keg_alloc_slab+0x13a: movq %r13,0x8(%rax)
db> whe
Tracing pid 0 tid 0 td 0xffffffff81527500
keg_alloc_slab() at keg_alloc_slab+0x13a/frame 0xffffffff81a90ac0
keg_fetch_slab() at keg_fetch_slab+0x152/frame 0xffffffff81a90b10
zone_fetch_slab() at zone_fetch_slab+0x7e/frame 0xffffffff81a90b50
zone_import() at zone_import+0x3c/frame 0xffffffff81a90b90
uma_zalloc_arg() at uma_zalloc_arg+0x33e/frame 0xffffffff81a90c10
malloc() at malloc+0x6a/frame 0xffffffff81a90c60
init_dynamic_kenv() at init_dynamic_kenv+0x8d/frame 0xffffffff81a90c90
mi_startup() at mi_startup+0x118/frame 0xffffffff81a90cb0
btext() at btext+0x2c
db> bt
Tracing pid 0 tid 0 td 0xffffffff81527500
keg_alloc_slab() at keg_alloc_slab+0x13a/frame 0xffffffff81a90ac0
keg_fetch_slab() at keg_fetch_slab+0x152/frame 0xffffffff81a90b10
zone_fetch_slab() at zone_fetch_slab+0x7e/frame 0xffffffff81a90b50
zone_import() at zone_import+0x3c/frame 0xffffffff81a90b90
uma_zalloc_arg() at uma_zalloc_arg+0x33e/frame 0xffffffff81a90c10
malloc() at malloc+0x6a/frame 0xffffffff81a90c60
init_dynamic_kenv() at init_dynamic_kenv+0x8d/frame 0xffffffff81a90c90
mi_startup() at mi_startup+0x118/frame 0xffffffff81a90cb0
btext() at btext+0x2c