i386/121148: Repeatable sysctl crash (Fatal Trap 12) with ACPI enabled
Jim Pingle
lists at pingle.org
Wed Feb 27 16:50:01 UTC 2008
>Number: 121148
>Category: i386
>Synopsis: Repeatable sysctl crash (Fatal Trap 12) with ACPI enabled
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-i386
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Wed Feb 27 16:50:00 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Jim Pingle
>Release: 7.0-PRERELEASE (RELENG_7)
>Organization:
HPC Internet Services
>Environment:
FreeBSD test1.hpcisp.com 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #1: Thu Feb 14 14:08:02 EST 2008 root at test1.hpcisp.com:/usr/obj/usr/src/sys/TEST i386
>Description:
SuperMicro SuperServer 6022L-6 will not fully boot RELENG_7 unless I boot with ACPI disabled. RELENG_7_0 does not crash on the same hardware with the same config.
Crash is as follows:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x2043455c
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0742c86
stack pointer = 0x28:0xe8cada0c
frame pointer = 0x28:0xe8cada38
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 68 (sysctl)
trap number = 12
panic: page fault
cpuid = 3
Uptime: 6s
Physical memory: 2035 MB
Dumping 65 MB: 50 34 18 2
The crash happens just after the "Entropy harvesting..." line, before swap is started. As you can see in the crash output, the offending process is sysctl.
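A side note on the fault address, offered as an observation rather than a diagnosis: every kernel address in the trap output and backtrace starts with 0xc0 or 0xe8, while the faulting address 0x2043455c does not, and its four bytes are all printable ASCII. The sketch below only decodes the value; it assumes nothing beyond the numbers printed in this report and the fact that i386 is little-endian. One possible reading is that a pointer in the sysctl tree was overwritten with string data, but that is speculation.

```python
import struct

# Fault virtual address exactly as printed in the trap message.
fault_addr = 0x2043455C

# kgdb prints the same address in decimal (eva=541279580); check they agree.
assert fault_addr == 541279580

# i386 is little-endian, so pack the value the way it would sit in memory.
raw = struct.pack("<I", fault_addr)

# The bytes are 0x5c 0x45 0x43 0x20: backslash, 'E', 'C', space.
text = raw.decode("ascii")
print(repr(text))
```

Whether this fragment comes from an ACPI object path or from somewhere else cannot be determined from the trap output alone.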
I can boot to single user mode, but if I issue sysctl -a while there, it also crashes. When sysctl -a is run in single user mode, the last three lines before the crash are (transcribed by hand, no serial console available):
dev.pcib.3.%location: handle=\_SB_.PCI3
dev.pcib.3.%pnpinfo: _HID=PNP0A03 UID=3
dev.pcib.3.%parent: acpi0
With a working RELENG_7_0 the lines immediately following this are:
dev.pcib.4.%desc: ACPI Host-PCI bridge
dev.pcib.4.%driver: pcib
dev.pcib.4.%location: handle=\_SB_.PCI4
dev.pcib.4.%pnpinfo: _HID=PNP0A03 _UID=4
dev.pcib.4.%parent: acpi0
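The comparison above can be sketched as a small lookup: given the output of a working system, find the line that should have followed the last line printed before the crash. This is only an illustration; `lines_after` is a hypothetical helper, and the list holds nothing but lines quoted in this report.

```python
def lines_after(reference, last_seen, count=1):
    """Return up to `count` lines that follow `last_seen` in `reference`."""
    idx = reference.index(last_seen)  # raises ValueError if the line is absent
    return reference[idx + 1 : idx + 1 + count]


# Excerpt of "sysctl -a" output from the working RELENG_7_0 system.
working = [
    "dev.pcib.3.%parent: acpi0",
    "dev.pcib.4.%desc: ACPI Host-PCI bridge",
    "dev.pcib.4.%driver: pcib",
]

# Last line seen on the crashing system (transcribed by hand).
last_before_crash = "dev.pcib.3.%parent: acpi0"

next_expected = lines_after(working, last_before_crash)
print(next_expected)
```

Because the crashing system's lines were transcribed by hand, the lookup matches on a line that is identical in both outputs. The result suggests the walk stopped at or just before the dev.pcib.4 node, which is consistent with, but does not prove, a problem in that part of the tree.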
I tried a binary search of the source tree to narrow down the crash. I found that one possible vector for the crash was introduced between 2007/12/19 20:00:00 (booted OK) and 2007/12/19 23:59:00 (crashed), which left me with only a handful of files to test.
By process of elimination, I found that if I backed some changes out in src/sys/i386/i386/machdep.c, the crash stopped.
src/sys/i386/i386/machdep.c v1.658 2007/08/09 njl - Boots OK
src/sys/i386/i386/machdep.c v1.658.2.1 2007/12/19 rpaulo - Crashes
The confusing part (to me) is that my next step was to update all the way to RELENG_7 as of yesterday, then back out those same changes, but the crash still happened. So either I misidentified the cause of the crash -- which is quite possible -- or it was reintroduced in some other change (or both!).
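For anyone repeating the search, the date-based bisection described above can be sketched as follows. This is a minimal illustration under stated assumptions: `first_bad` and `simulated_is_bad` are hypothetical names, the intermediate timestamps are made up, and in practice the predicate would be "check out the tree as of that date, build, boot, and see whether sysctl crashes". Only the two endpoints come from this report.

```python
def first_bad(candidates, is_bad):
    """Binary search for the first candidate for which is_bad() is True.

    Assumes candidates are ordered oldest to newest, the first one is
    known good, the last one is known bad, and the result flips from
    good to bad exactly once in between.
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi]


# Endpoints are from the report; the three middle entries are hypothetical.
checkpoints = [
    "2007/12/19 20:00:00",  # booted OK
    "2007/12/19 21:00:00",
    "2007/12/19 22:00:00",
    "2007/12/19 23:00:00",
    "2007/12/19 23:59:00",  # crashed
]


def simulated_is_bad(timestamp):
    # Stand-in for a real build-and-boot test; the threshold is invented.
    return timestamp >= "2007/12/19 22:00:00"


result = first_bad(checkpoints, simulated_is_bad)
print(result)
```

The search relies on the good-to-bad flip happening exactly once. If backing out the identified change no longer helps on newer sources, as described above, that assumption may not hold for the later range, and a second search starting from a newer known-good point may be needed.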
kgdb output from vmcore.0:
Unread portion of the kernel message buffer:
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-PRERELEASE #0: Mon Feb 25 15:22:54 EST 2008
root at test1.hpcisp.com:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) XEON(TM) CPU 2.00GHz (1999.94-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf24 Stepping = 4
Features=0x3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
Logical CPUs per core: 2
real memory = 2147418112 (2047 MB)
avail memory = 2091872256 (1994 MB)
ACPI APIC Table: <RCC GCHE >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 2
cpu3 (AP): APIC ID: 3
ACPI Warning (tbfadt-0505): Optional field "Gpe1Block" has zero address or length: 0 0/8 [20070320]
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-15 on motherboard
ioapic1 <Version 1.1> irqs 16-31 on motherboard
ioapic2 <Version 1.1> irqs 32-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Feb 25 2008 15:20:56)
acpi0: <RCC GCHE> on motherboard
ACPI Warning (dswload-0794): Type override - [DEB_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [MLIB] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [IO__] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [DATA] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [ICNT] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [ACPI] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [IORG] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [BIOS] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [CMOS] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [KBC_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [OEM_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 7ff00000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x508-0x50b on acpi0
cpu0: <ACPI CPU> on acpi0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
p4tcc3: <CPU Frequency Thermal Control> on cpu3
acpi_button0: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0xa800-0xa8ff mem 0xfd000000-0xfdffffff,0xfe5ff000-0xfe5fffff irq 18 at device 2.0 on pci0
fxp0: <Intel 82550 Pro/100 Ethernet> port 0xae80-0xaebf mem 0xfe5fc000-0xfe5fcfff,0xfe580000-0xfe59ffff irq 17 at device 4.0 on pci0
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> PHY 1 on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:30:48:20:a3:9e
fxp0: [ITHREAD]
fxp1: <Intel 82550 Pro/100 Ethernet> port 0xaf00-0xaf3f mem 0xfe5fd000-0xfe5fdfff,0xfe5a0000-0xfe5bffff irq 19 at device 5.0 on pci0
miibus1: <MII bus> on fxp1
inphy1: <i82555 10/100 media interface> PHY 1 on miibus1
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:30:48:20:a3:9f
fxp1: [ITHREAD]
isab0: <PCI-ISA bridge> at device 15.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <ServerWorks CSB5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 15.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
ohci0: <OHCI (generic) USB controller> mem 0xfe5fe000-0xfe5fefff irq 10 at device 15.2 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: <(0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 4 ports with 4 removable, self powered
pcib1: <ACPI Host-PCI bridge> on acpi0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI Host-PCI bridge> on acpi0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI Host-PCI bridge> on acpi0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI Host-PCI bridge> on acpi0
pci4: <ACPI PCI bus> on pcib4
asr0: <Adaptec Caching SCSI RAID> mem 0xfeb00000-0xfebfffff,0xfb000000-0xfbffffff,0xf8000000-0xf9ffffff irq 29 at device 3.0 on pci4
asr0: [GIANT-LOCKED]
asr0: [ITHREAD]
asr0: ADAPTEC 2005S FW Rev. 380E, 2 channel, 2000 CCBs, Protocol I2O
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model NetMouse/NetScroll Optical, device ID 0
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcdfff,0xce000-0xcefff,0xcf000-0xcffff pnpid ORM0000 on isa0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
hptrr: no controller detected.
acd0: CDROM <MATSHITA CR-177/7T0D> at ata1-master UDMA33
da0 at asr0 bus 0 target 0 lun 0
da0: <ADAPTEC RAID-5 380E> Fixed Direct Access SCSI-2 device
ses0 at asr0 bus 0 target 6 lun 0
ses0: <SUPER GEM318 0> Fixed Processor SCSI-2 device
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/da0s1a
<118>Loading configuration files.
<118>kernel dumps on /dev/da0s1b
<118>Entropy harvesting:
<118> interrupts
<118> ethernet
<118> point_to_point
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x2043455c
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0742c86
stack pointer = 0x28:0xe8cada0c
frame pointer = 0x28:0xe8cada38
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 68 (sysctl)
trap number = 12
panic: page fault
cpuid = 3
Uptime: 6s
Physical memory: 2035 MB
Dumping 65 MB: 50 34 18 2
#0 doadump () at pcpu.h:195
195 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0xc073a688 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0xc073a941 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3 0xc0a19dc0 in trap_fatal (frame=0xe8cad9cc, eva=541279580) at /usr/src/sys/i386/i386/trap.c:899
#4 0xc0a1a030 in trap_pfault (frame=0xe8cad9cc, usermode=0, eva=541279580) at /usr/src/sys/i386/i386/trap.c:812
#5 0xc0a1a9ad in trap (frame=0xe8cad9cc) at /usr/src/sys/i386/i386/trap.c:490
#6 0xc0a01cab in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7 0xc0742c86 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#8 0xc0742d46 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:618
#9 0xc0742d83 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#10 0xc0742d83 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#11 0xc0742de6 in sysctl_sysctl_next (oidp=0xc0b4c940, arg1=0xe8cadc1c, arg2=4, req=0xe8cadba4)
at /usr/src/sys/kern/kern_sysctl.c:651
#12 0xc07436f2 in sysctl_root (oidp=Variable "oidp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1306
#13 0xc074382e in userland_sysctl (td=0xc5574210, name=0xe8cadc14, namelen=6, old=0xbfbfe4e8, oldlenp=0xbfbfe598,
inkernel=0, new=0x0, newlen=0, retval=0xe8cadc10, flags=0) at /usr/src/sys/kern/kern_sysctl.c:1401
#14 0xc0744462 in __sysctl (td=0xc5574210, uap=0xe8cadcfc) at /usr/src/sys/kern/kern_sysctl.c:1336
#15 0xc0a1a378 in syscall (frame=0xe8cadd38) at /usr/src/sys/i386/i386/trap.c:1035
#16 0xc0a01d10 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:196
#17 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
This is a testing machine that is only being used to evaluate 7.0 for use on similar hardware. I can take whatever debugging steps are needed; just let me know what information is necessary to help resolve the issue.
I tried posting this information to the -STABLE list, but received no replies.
System is running with the most current BIOS available from the OEM. RAM tested OK with memtest86+ left running for a day or so.
>How-To-Repeat:
Attempt to boot with a RELENG_7 world/kernel on a SuperMicro SuperServer 6022L-6 with ACPI enabled.
Alternately, boot to single user mode and issue "sysctl -a". Crashes every time in the exact same place.
>Fix:
Workaround is to run with ACPI disabled, but that is not desired.
One part of the crash was possibly introduced with rev v1.658.2.1 of src/sys/i386/i386/machdep.c, but I am unable to repeat that fix on recent RELENG_7 sources.
>Release-Note:
>Audit-Trail:
>Unformatted: