Fatal trap 12: page fault on Acer Chromebook 720 (peppy)
Michael Gmelin
freebsd at grem.de
Tue Jun 5 23:06:29 UTC 2018
On Tue, 5 Jun 2018 16:11:35 +0300
Konstantin Belousov <kostikbel at gmail.com> wrote:
> On Mon, Jun 04, 2018 at 11:17:56PM +0200, Michael Gmelin wrote:
> >
> >
> > On Mon, 4 Jun 2018 14:06:55 +0300
> > Konstantin Belousov <kostikbel at gmail.com> wrote:
> >
> > > On Mon, Jun 04, 2018 at 12:46:32AM +0200, Michael Gmelin wrote:
> > > >
> > > >
> > > > On Sun, 3 Jun 2018 23:53:40 +0300
> > > > Konstantin Belousov <kostikbel at gmail.com> wrote:
> > > >
> > > > > On Sun, Jun 03, 2018 at 09:50:20PM +0200, Michael Gmelin
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > On Sun, 3 Jun 2018 18:04:23 +0300
> > > > > > Konstantin Belousov <kostikbel at gmail.com> wrote:
> > > > > >
> > > > > > > On Sun, Jun 03, 2018 at 04:55:00PM +0200, Michael Gmelin
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, 3 Jun 2018 16:21:10 +0300
> > > > > > > > Konstantin Belousov <kostikbel at gmail.com> wrote:
> > > > > > > >
> > > > > > > > > On Sun, Jun 03, 2018 at 02:48:40PM +0200, Michael
> > > > > > > > > Gmelin wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > After upgrading CURRENT to r333992 (from something
> > > > > > > > > > at least a year old, quite some changes in
> > > > > > > > > > mp_machdep.c since), this machine crashes on boot:
> > > > > > > > > >
> > > > > > > > > > Copyright (c) 1992-2018 The FreeBSD Project.
> > > > > > > > > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> > > > > > > > > > The Regents of the University of California. All rights reserved.
> > > > > > > > > > FreeBSD is a registered trademark of The FreeBSD Foundation.
> > > > > > > > > > FreeBSD 12.0-CURRENT #1 r333992: Tue May 22 00:31:04 CEST 2018
> > > > > > > > > > root at flimsy:/usr/obj/usr/src/amd64.amd64/sys/flimsy amd64
> > > > > > > > > > FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
> > > > > > > > > > WARNING: WITNESS option enabled, expect reduced performance.
> > > > > > > > > > VT(vga): resolution 640x480
> > > > > > > > > > CPU: Intel(R) Celeron(R) 2955U @ 1.40GHz (1396.80-MHz K8-class CPU)
> > > > > > > > > > Origin="GenuineIntel" Id=0x40651 Family=0x6 Model=0x45 Stepping=1
> > > > > > > > > > Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> > > > > > > > > > Features2=0x4ddaebbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,RDRAND>
> > > > > > > > > > AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
> > > > > > > > > > AMD Features2=0x21<LAHF,ABM>
> > > > > > > > > > Structured Extended Features=0x2603<FSGSBASE,TSCADJ,ERMS,INVPCID,NFPUSG>
> > > > > > > > > > XSAVE Features=0x1<XSAVEOPT>
> > > > > > > > > > VT-x: (disabled in BIOS) PAT,HLT,MTF,PAUSE,EPT,UG,VPID
> > > > > > > > > > TSC: P-state invariant, performance statistics
> > > > > > > > > > real memory = 4301258752 (4102 MB)
> > > > > > > > > > avail memory = 1907572736 (1819 MB)
> > > > > > > > > > Event timer "LAPIC" quality 600
> > > > > > > > > > ACPI APIC Table: <CORE COREBOOT>
> > > > > > > > > What does this mean? Did you flash coreboot?
> > > > > > > >
> > > > > > > > This machine comes with it by default (my model was
> > > > > > > > delivered with SeaBIOS 20131018_145217-build121-m2). So
> > > > > > > > I didn't flash anything (didn't feel like bricking it).
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > kernel trap 12 with interrupts disabled
> > > > > > > > > >
> > > > > > > > > > Fatal trap 12: page fault while in kernel mode
> > > > > > > > > > cpuid = 0; apic id = 00
> > > > > > > > > > fault virtual address = 0xfffff80001000000
> > > > > > > > > > fault code = supervisor write data, protection violation
> > > > > > > > > > instruction pointer = 0x20:0xffffffff8102955f
> > > > > > > > > > stack pointer = 0x28:0xffffffff82a79be0
> > > > > > > > > > frame pointer = 0x28:0xffffffff82a79c10
> > > > > > > > > > code segment = base 0x0, limit 0xfffff, type 0x1b
> > > > > > > > > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > > > > > > > processor eflags = resume, IOPL = 0
> > > > > > > > > > current process = 0 ()
> > > > > > > > > > [ thread pid 0 tid 0 ]
> > > > > > > > > > Stopped at native_start_all_aps+0x08f: movq %rax,(%rsi)
> > > > > > > > > Look up the source line number for this address.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I guess that's sys/amd64/amd64/support.S line 854 (in
> > > > > > > > rdmsr), called by native_start_all_aps. Any additional
> > > > > > > > hints how I can track it down?
> > > > > > > Why did you decide that this is rdmsr_safe()? First,
> > > > > > > native_start_all_aps() does not call rdmsr; second, the ddb
> > > > > > > report clearly indicates that the fault occurred accessing
> > > > > > > the DMAP in native_start_all_aps().
> > > > > > >
> > > > > > > Just look up the source line by the address
> > > > > > > native_start_all_aps+0x08f.
> > > > > >
> > > > > > Okay, according to kgdb this should be here:
> > > > > >
> > > > > > https://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=333368&view=markup#l369
> > > > > >
> > > > > > 364
> > > > > > 365     /* Create the initial 1GB replicated page tables */
> > > > > > 366     for (i = 0; i < 512; i++) {
> > > > > > 367         /* Each slot of the level 4 pages points to the same level 3 page */
> > > > > > 368         pt4[i] = (u_int64_t)(uintptr_t)(mptramp_pagetables + PAGE_SIZE);
> > > > > > 369         pt4[i] |= PG_V | PG_RW | PG_U;
> > > > > > 370
> > > > > > 371         /* Each slot of the level 3 pages points to the same level 2 page */
> > > > > > 372         pt3[i] = (u_int64_t)(uintptr_t)(mptramp_pagetables + (2 * PAGE_SIZE));
> > > > > > 373         pt3[i] |= PG_V | PG_RW | PG_U;
> > > > > > 374
> > > > > > 375         /* The level 2 page slots are mapped with 2MB pages for 1GB. */
> > > > > > 376         pt2[i] = i * (2 * 1024 * 1024);
> > > > > > 377         pt2[i] |= PG_V | PG_RW | PG_PS | PG_U;
> > > > > > 378     }
> > > > > >
> > > > > > -m
> > > > > You have a fault on write due to the read-only mapping of the
> > > > > portion of the direct map that maps the kernel text. This is
> > > > > consistent with the faulting address. It is not clear whether
> > > > > this is something new on your machine, or whether the kernel
> > > > > text was silently corrupted before, since the ro protection is
> > > > > somewhat recent.
> > > > >
> > > > > It seems that mp_bootaddress() selected a bad place for the
> > > > > bootstrap page tables. Even more, we do not include the kernel
> > > > > text in the physmem[] array, so it is not clear how this
> > > > > happened. This code was also changed recently.
> > > > >
> > > > > Can you print the physmap[] array somewhere before the panic,
> > > > > to see what the kernel's idea of the available memory is? It
> > > > > should already be printed if you have a serial console and set
> > > > > the debug.late_console tunable to 0.
> > > >
> > > > This is a sad little machine without any kind of serial console.
> > > >
> > > > Physmap looks like this after calling getmemsize():
> > > >
> > > > [0]: 0x10000
> > > > [1]: 0x30000
> > > > [2]: 0x40000
> > > > [3]: 0x9e000
> > > > [4]: 0x100000
> > > > [5]: 0xf00000
> > > > [6]: 0x1003000
> > > > [7]: 0x7bf7a000
> > > >
> > > > Physical memory chunks logged in cpu_startup are:
> > > >
> > > > 0x0000000000010000 - 0x000000000002ffff, 131072 bytes (32 pages)
> > > > 0x0000000000040000 - 0x000000000009dfff, 385024 bytes (94 pages)
> > > These two chunk reports are consistent with physmap[0-1, 2-3].
> > > > 0x0000000000100000 - 0x00000000001fffff, 1048576 bytes (256 pages)
> > > > 0x0000000002c00000 - 0x0000000075467fff, 1921417216 bytes (469096 pages)
> > > > 0x0000000100000000 - 0x00000001005e7fff, 6193152 bytes (1512 pages)
> > > But these three look completely unrelated to the rest of the
> > > physmap, except perhaps physmap[4]. We allocate boot pages
> > > from the top of the last physmap chunk, but I am certain that we
> > > do not consume enough memory during boot to account for the gap
> > > between physmap[7] and the last reported address.
> > >
> > > Are you sure that there are no typos in the values above ?
> >
> > Double checked the numbers. I changed it a bit more,
> > so that debug output appears all on one page. Please see here for
> > the results:
> >
> > https://gist.github.com/grembo/cebb9f7e2a98c37a51bee1e508f7d890
> Ok, I have a guess about what is going on. Does the result of the quirks
> end up as the hw.physmem tunable passed to the kernel? It seems that there
> is a physmap[] element pointing outside the DMAP-mapped region.
>
> Perhaps print the dmap limit too, to see whether I am on the right
> track.
I didn't print the dmap limit yet, but I tested your patch:
>
> Try the following change. It lacks i386 bits.
>
> diff --git a/sys/amd64/amd64/machdep.c b/sys/amd64/amd64/machdep.c
> index e5c69ed91fa..bd6bbf04006 100644
> --- a/sys/amd64/amd64/machdep.c
> +++ b/sys/amd64/amd64/machdep.c
> @@ -1254,7 +1254,7 @@ getmemsize(caddr_t kmdp, u_int64_t first)
> * in real mode mode (e.g. SMP bare metal).
> */
> if (init_ops.mp_bootaddress)
> - init_ops.mp_bootaddress(physmap, &physmap_idx);
> + init_ops.mp_bootaddress(physmap, &physmap_idx, first);
> /*
> * Maxmem isn't the "maximum memory", it's one larger than the
> diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c
> index 30146142087..292a6cefa91 100644
> --- a/sys/amd64/amd64/mp_machdep.c
> +++ b/sys/amd64/amd64/mp_machdep.c
> @@ -103,7 +103,8 @@ static int start_ap(int apic_id);
> * Calculate usable address in base memory for AP trampoline code.
> */
> void
> -mp_bootaddress(vm_paddr_t *physmap, unsigned int *physmap_idx)
> +mp_bootaddress(vm_paddr_t *physmap, unsigned int *physmap_idx,
> + vm_paddr_t dmap_limit)
> {
> unsigned int i;
> bool allocated;
> @@ -117,8 +118,9 @@ mp_bootaddress(vm_paddr_t *physmap, unsigned int *physmap_idx)
> * store the initial page tables. Note that it needs to be
> * aligned to a page boundary.
> */
> - if (physmap[i] >= GiB(4) ||
> - (physmap[i + 1] - round_page(physmap[i])) < (PAGE_SIZE * 3))
> + if (physmap[i] >= GiB(4) || physmap[i + 1] -
> + round_page(physmap[i]) < PAGE_SIZE * 3 ||
> + physmap[i + 1] - PAGE_SIZE * 3 > dmap_limit)
> continue;
>
> allocated = true;
> diff --git a/sys/amd64/include/smp.h b/sys/amd64/include/smp.h
> index 2ecfe62cf9f..24f0580fe51 100644
> --- a/sys/amd64/include/smp.h
> +++ b/sys/amd64/include/smp.h
> @@ -58,7 +58,7 @@ void invlpg_pcid_handler(void);
> void invlrng_invpcid_handler(void);
> void invlrng_pcid_handler(void);
> int native_start_all_aps(void);
> -void mp_bootaddress(vm_paddr_t *, unsigned int *);
> +void mp_bootaddress(vm_paddr_t *, unsigned int *, vm_paddr_t);
>
> #endif /* !LOCORE */
> #endif /* SMP */
> diff --git a/sys/x86/include/init.h b/sys/x86/include/init.h
> index 880cabaa949..58bbe0a5fd6 100644
> --- a/sys/x86/include/init.h
> +++ b/sys/x86/include/init.h
> @@ -41,7 +41,7 @@ struct init_ops {
> void (*early_clock_source_init)(void);
> void (*early_delay)(int);
> void (*parse_memmap)(caddr_t, vm_paddr_t *, int *);
> - void (*mp_bootaddress)(vm_paddr_t *, unsigned int *);
> + void (*mp_bootaddress)(vm_paddr_t *, unsigned int *, vm_paddr_t);
> int (*start_all_aps)(void);
> void (*msi_init)(void);
> };
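Read in isolation, the patched test in mp_bootaddress() is a three-part
skip condition. The following is a minimal userland sketch of it, not the
kernel code: chunk_is_skipped() is a made-up name, and PAGE_SIZE, GIB()
and round_page() are local stand-ins assumed here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel macros (assumptions for this sketch). */
#define PAGE_SIZE 4096ULL
#define GIB(v) ((uint64_t)(v) << 30)

/* Round an address up to the next page boundary. */
uint64_t
round_page(uint64_t addr)
{
	return ((addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1));
}

/*
 * Hypothetical helper mirroring the patched condition: a chunk
 * [start, end) is skipped if it starts at or above 4 GiB, if it cannot
 * hold three page-aligned pages, or if its last three pages lie above
 * dmap_limit.
 */
bool
chunk_is_skipped(uint64_t start, uint64_t end, uint64_t dmap_limit)
{
	assert(start <= end);
	return (start >= GIB(4) ||
	    end - round_page(start) < PAGE_SIZE * 3 ||
	    end - PAGE_SIZE * 3 > dmap_limit);
}
```

For example, with a purely hypothetical limit of 1 GiB, the chunk
physmap[6-7] (0x1003000 - 0x7bf7a000) would be skipped, while
physmap[0-1] (0x10000 - 0x30000) would not. The real limit on this
machine was not printed.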
With the patch I could boot without problems, and the machine appears to
be stable (I ran some high-load and memory-intensive tests). By the way,
the machine only has 2 GB of RAM, even though 4 GB is reported at boot;
usable memory appears to be reported correctly.
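
As a cross-check of the numbers earlier in this thread, here is a small
userland sketch, not kernel code. It assumes that the amd64 direct map
starts at 0xfffff80000000000 and that the three bootstrap page-table
pages were taken from the front of the physmap[6] chunk. Neither is
stated above, and both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: base of the amd64 direct map; not stated in the thread. */
#define DMAP_BASE 0xfffff80000000000ULL
#define PAGE_SIZE 4096ULL

/* Translate a direct-map virtual address to the physical address it maps. */
uint64_t
dmap_to_phys(uint64_t va)
{
	assert(va >= DMAP_BASE);
	return (va - DMAP_BASE);
}

/* Start of a chunk before npages pages were carved off its front. */
uint64_t
chunk_start_before_carve(uint64_t start_now, uint64_t npages)
{
	assert(start_now >= npages * PAGE_SIZE);
	return (start_now - npages * PAGE_SIZE);
}
```

Under those assumptions, the fault address 0xfffff80001000000 maps to
physical 0x1000000, and physmap[6] = 0x1003000 minus three pages gives
the same value, which would fit the page tables having been placed at
16 MB.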
Thanks,
Michael
--
Michael Gmelin