[Bug 208957] Kernel panic (page fault) on 10.3-STABLE with VIMAGE & Infiniband modules
bugzilla-noreply at freebsd.org
Thu Apr 21 14:47:22 UTC 2016
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208957
Bug ID: 208957
Summary: Kernel panic (page fault) on 10.3-STABLE with VIMAGE & Infiniband modules
Product: Base System
Version: 10.3-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: justin at postgresql.org
CC: freebsd-amd64 at FreeBSD.org
The VIMAGE option is causing a kernel panic (page fault) when compiled along
with the Infiniband options on 10.3-STABLE. It's 100% reproducible, and easily
triggered. ;)
Note - I compiled this multiple times over the last few days, across several
systems, just to ensure it's not due to bad hardware in one system. It panics
reliably every time, on all of them. Definitely a software bug of some sort.
Note - Anecdotal evidence suggests the repeated problems with VIMAGE + Infiniband
are a large part of the reason Infiniband isn't supported on FreeNAS. The
NAS4Free project also has difficulties with Infiniband, very likely also due to
this. :(
https://bugs.freenas.org/issues/2014#note-18
Anyway, backtrace info below in case it helps:
(commands taken from
https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html)
***********************************************************************************
root@cluster1:/usr/obj/usr/src/sys/CONNECTX # kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Unread portion of the kernel message buffer:
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq271: mlx4_core0)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff807263d0 at kdb_backtrace+0x60
#1 0xffffffff806e8c76 at vpanic+0x126
#2 0xffffffff806e8b43 at panic+0x43
#3 0xffffffff80b8bf3b at trap_fatal+0x36b
#4 0xffffffff80b8c23d at trap_pfault+0x2ed
#5 0xffffffff80b8b8ba at trap+0x47a
#6 0xffffffff80b71892 at calltrap+0x8
#7 0xffffffff807be1a2 at netisr_dispatch_src+0x62
#8 0xffffffff808f89fa at ipoib_cm_handle_rx_wc+0x22a
#9 0xffffffff808fcc98 at ipoib_ib_completion+0x78
#10 0xffffffff80930c43 at mlx4_cq_completion+0x63
#11 0xffffffff80933d43 at mlx4_eq_int+0x2c3
#12 0xffffffff80932fac at mlx4_msi_x_interrupt+0xc
#13 0xffffffff806b35cb at intr_event_execute_handlers+0xab
#14 0xffffffff806b3a16 at ithread_loop+0x96
#15 0xffffffff806b104a at fork_exit+0x9a
#16 0xffffffff80b71dce at fork_trampoline+0xe
Uptime: 3m47s
Dumping 485 out of 7857 MB:..4%..14%..24%..33%..43%..53%..63%..73%..83%..93%
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
#0 doadump (textdump=<value optimized out>) at pcpu.h:219
219 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) list *0xffffffff808f89fa
0xffffffff808f89fa is in ipoib_cm_handle_rx_wc (/usr/src/sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_cm.c:565).
560 mb->m_pkthdr.rcvif = dev;
561 proto = *mtod(mb, uint16_t *);
562 m_adj(mb, IPOIB_ENCAP_LEN);
563
564 IPOIB_MTAP_PROTO(dev, mb, proto);
565 ipoib_demux(dev, mb, ntohs(proto));
566
567 repost:
568 if (has_srq) {
569 if (unlikely(ipoib_cm_post_receive_srq(priv, wr_id)))
Current language: auto; currently minimal
(kgdb) list *0xffffffff807be1a2
0xffffffff807be1a2 is in netisr_dispatch_src (/usr/src/sys/net/netisr.c:976).
971 if (dispatch_policy == NETISR_DISPATCH_DIRECT) {
972 nwsp = DPCPU_PTR(nws);
973 npwp = &nwsp->nws_work[proto];
974 npwp->nw_dispatched++;
975 npwp->nw_handled++;
976 netisr_proto[proto].np_handler(m);
977 error = 0;
978 goto out_unlock;
979 }
980
(kgdb) list *0xffffffff80b71892
0xffffffff80b71892 is at /usr/src/sys/amd64/amd64/exception.S:238.
233 .type calltrap,@function
234 calltrap:
235 movq %rsp,%rdi
236 call trap
237 MEXITCOUNT
238 jmp doreti /* Handle any pending ASTs */
239
240 /*
241 * alltraps_noen entry point. Unlike alltraps above, we want to
242 * leave the interrupts disabled. This corresponds to
(kgdb) list *0xffffffff80b8b8ba
0xffffffff80b8b8ba is in trap (/usr/src/sys/amd64/amd64/trap.c:447).
442
443 KASSERT(cold || td->td_ucred != NULL,
444 ("kernel trap doesn't have ucred"));
445 switch (type) {
446 case T_PAGEFLT: /* page fault */
447 (void) trap_pfault(frame, FALSE);
448 goto out;
449
450 case T_DNA:
451 KASSERT(!PCB_USER_FPU(td->td_pcb),
(kgdb)
***********************************************************************************
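The `list *ADDRESS` lookups above were typed by hand for a few frames. As a minimal sketch, the same kgdb commands could be generated for every frame of a KDB backtrace. It assumes frame lines have the form `#N 0xADDR at symbol+0xOFF`, as in the trace above; the helper names `parse_backtrace` and `kgdb_list_commands` are hypothetical and not part of any FreeBSD tool.

```python
import re

# Matches KDB frame lines such as
# "#7 0xffffffff807be1a2 at netisr_dispatch_src+0x62".
# Assumption: this layout is inferred from the trace in this report only.
FRAME_RE = re.compile(
    r"^#(\d+)\s+(0x[0-9a-fA-F]+)\s+at\s+(\w+)\+(0x[0-9a-fA-F]+)\s*$"
)


def parse_backtrace(text):
    """Return a list of (frame_no, address, symbol, offset) tuples."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(
                (int(m.group(1)), m.group(2), m.group(3), int(m.group(4), 16))
            )
    return frames


def kgdb_list_commands(frames):
    """Build one 'list *ADDR' command per frame, in backtrace order."""
    return ["list *%s" % addr for _, addr, _, _ in frames]


# Three frames copied from the backtrace in this report.
SAMPLE = """\
#6 0xffffffff80b71892 at calltrap+0x8
#7 0xffffffff807be1a2 at netisr_dispatch_src+0x62
#8 0xffffffff808f89fa at ipoib_cm_handle_rx_wc+0x22a
"""

if __name__ == "__main__":
    for cmd in kgdb_list_commands(parse_backtrace(SAMPLE)):
        print(cmd)
```

The printed lines can be pasted at the (kgdb) prompt. Lines that do not follow the assumed layout, such as the `#0 doadump (...)` frame printed by kgdb itself, are skipped.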
Kernel configuration used:
---
#
# GENERIC -- Generic kernel configuration file for FreeBSD/amd64
#
# For more information on this file, please read the config(5) manual page,
# and/or the handbook section on Kernel Configuration Files:
#
# http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines is also present in the ../../conf/NOTES and NOTES files.
# If you are in doubt as to the purpose or necessity of a line, check first
# in NOTES.
#
# $FreeBSD: stable/10/sys/amd64/conf/GENERIC 286132 2015-07-31 15:25:07Z gjb $
cpu HAMMER
ident CONNECTX2
makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
makeoptions WITH_CTF=1 # Run ctfconvert(1) for DTrace support
#####################################################################
# NETWORKING OPTIONS
#
# DEVICE_POLLING adds support for mixed interrupt-polling handling
# of network device drivers, which has significant benefits in terms
# of robustness to overloads and responsivity, as well as permitting
# accurate scheduling of the CPU time between kernel network processing
# and other activities. The drawback is a moderate (up to 1/HZ seconds)
# potential increase in response times.
# It is strongly recommended to use HZ=1000 or 2000 with DEVICE_POLLING
# to achieve smoother behaviour.
# Additionally, you can enable/disable polling at runtime with help of
# the ifconfig(8) utility, and select the CPU fraction reserved to
# userland with the sysctl variable kern.polling.user_frac
# (default 50, range 0..100).
#
# Not all device drivers support this mode of operation at the time of
# this writing. See polling(4) for more details.
options DEVICE_POLLING
# BPF_JITTER adds support for BPF just-in-time compiler.
options BPF_JITTER
# OpenFabrics Enterprise Distribution (Infiniband).
options OFED
options OFED_DEBUG_INIT
# Sockets Direct Protocol
options SDP
options SDP_DEBUG
# IP over Infiniband
options IPOIB
options IPOIB_DEBUG
options IPOIB_CM
#####################################################################
options SCHED_ULE # ULE scheduler
options PREEMPTION # Enable kernel thread preemption
options INET # InterNETworking
options INET6 # IPv6 communications protocols
options TCP_OFFLOAD # TCP offload
options SCTP # Stream Control Transmission Protocol
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options UFS_GJOURNAL # Enable gjournal-based UFS journaling
options QUOTA # Enable disk quotas for UFS
options MD_ROOT # MD is a potential root device
options NFSCL # New Network Filesystem Client
options NFSD # New Network Filesystem Server
options NFSLOCKD # Network Lock Manager
options NFS_ROOT # NFS usable as /, requires NFSCL
options MSDOSFS # MSDOS Filesystem
options CD9660 # ISO 9660 Filesystem
options PROCFS # Process filesystem (requires PSEUDOFS)
options PSEUDOFS # Pseudo-filesystem framework
options GEOM_PART_GPT # GUID Partition Tables.
options GEOM_RAID # Soft RAID functionality.
options GEOM_LABEL # Provides labelization
options COMPAT_FREEBSD32 # Compatible with i386 binaries
options COMPAT_FREEBSD4 # Compatible with FreeBSD4
options COMPAT_FREEBSD5 # Compatible with FreeBSD5
options COMPAT_FREEBSD6 # Compatible with FreeBSD6
options COMPAT_FREEBSD7 # Compatible with FreeBSD7
options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
options KTRACE # ktrace(1) support
options STACK # stack(9) support
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options PRINTF_BUFR_SIZE=128 # Prevent printf output being interspersed.
options KBD_INSTALL_CDEV # install a CDEV entry in /dev
options HWPMC_HOOKS # Necessary kernel hooks for hwpmc(4)
options AUDIT # Security event auditing
options CAPABILITY_MODE # Capsicum capability mode
options CAPABILITIES # Capsicum capabilities
options PROCDESC # Support for process descriptors
options MAC # TrustedBSD MAC Framework
options KDTRACE_FRAME # Ensure frames are compiled in
options KDTRACE_HOOKS # Kernel DTrace hooks
options DDB_CTF # Kernel ELF linker loads CTF data
options INCLUDE_CONFIG_FILE # Include this file in kernel
options RACCT # Resource accounting framework
options RACCT_DEFAULT_TO_DISABLED # Set kern.racct.enable=0 by default
options RCTL # Resource limits
# Debugging support. Always need this:
options KDB # Enable kernel debugger support.
options KDB_TRACE # Print a stack trace for a panic.
# Make an SMP-capable kernel by default
options SMP # Symmetric MultiProcessor Kernel
# CPU frequency control
device cpufreq
# Bus support.
device acpi
options ACPI_DMAR
device pci
# Floppy drives
#device fdc
# ATA controllers
device ahci # AHCI-compatible SATA controllers
device ata # Legacy ATA/SATA controllers
options ATA_STATIC_ID # Static device numbering
device mvs # Marvell 88SX50XX/88SX60XX/88SX70XX/SoC SATA
device siis # SiliconImage SiI3124/SiI3132/SiI3531 SATA
# SCSI Controllers
device ahc # AHA2940 and onboard AIC7xxx devices
options AHC_REG_PRETTY_PRINT # Print register bitfields in debug
# output. Adds ~128k to driver.
device ahd # AHA39320/29320 and onboard AIC79xx devices
options AHD_REG_PRETTY_PRINT # Print register bitfields in debug
# output. Adds ~215k to driver.
device esp # AMD Am53C974 (Tekram DC-390(T))
device hptiop # Highpoint RocketRaid 3xxx series
device isp # Qlogic family
device ispfw # Firmware for QLogic HBAs- normally a module
device mpt # LSI-Logic MPT-Fusion
device mps # LSI-Logic MPT-Fusion 2
device mpr # LSI-Logic MPT-Fusion 3
device ncr # NCR/Symbios Logic
device sym # NCR/Symbios Logic (newer chipsets + those of `ncr')
device trm # Tekram DC395U/UW/F DC315U adapters
device adv # Advansys SCSI adapters
device adw # Advansys wide SCSI adapters
device aic # Adaptec 15[012]x SCSI adapters, AIC-6[23]60.
device bt # Buslogic/Mylex MultiMaster SCSI adapters
device isci # Intel C600 SAS controller
# ATA/SCSI peripherals
device scbus # SCSI bus (required for ATA/SCSI)
device ch # SCSI media changers
device da # Direct Access (disks)
device sa # Sequential Access (tape etc)
device cd # CD
device pass # Passthrough device (direct ATA/SCSI access)
device ses # Enclosure Services (SES and SAF-TE)
#device ctl # CAM Target Layer
# RAID controllers interfaced to the SCSI subsystem
device amr # AMI MegaRAID
device arcmsr # Areca SATA II RAID
#XXX it is not 64-bit clean, -scottl
#device asr # DPT SmartRAID V, VI and Adaptec SCSI RAID
device ciss # Compaq Smart RAID 5*
device dpt # DPT Smartcache III, IV - See NOTES for options
device hptmv # Highpoint RocketRAID 182x
device hptnr # Highpoint DC7280, R750
device hptrr # Highpoint RocketRAID 17xx, 22xx, 23xx, 25xx
device hpt27xx # Highpoint RocketRAID 27xx
device iir # Intel Integrated RAID
device ips # IBM (Adaptec) ServeRAID
device mly # Mylex AcceleRAID/eXtremeRAID
device twa # 3ware 9000 series PATA/SATA RAID
device tws # LSI 3ware 9750 SATA+SAS 6Gb/s RAID controller
# RAID controllers
#device aac # Adaptec FSA RAID
#device aacp # SCSI passthrough for aac (requires CAM)
#device aacraid # Adaptec by PMC RAID
#device ida # Compaq Smart RAID
#device mfi # LSI MegaRAID SAS
#device mlx # Mylex DAC960 family
#device mrsas # LSI/Avago MegaRAID SAS/SATA, 6Gb/s and 12Gb/s
#XXX PCI ID conflicts with ahd(4) and mvs(4)
#device pmspcv # PMC-Sierra SAS/SATA Controller driver
#XXX pointer/int warnings
#device pst # Promise Supertrak SX6000
#device twe # 3ware ATA RAID
# NVM Express (NVMe) support
device nvme # base NVMe driver
device nvd # expose NVMe namespaces as disks, depends on nvme
# atkbdc0 controls both the keyboard and the PS/2 mouse
device atkbdc # AT keyboard controller
device atkbd # AT keyboard
device psm # PS/2 mouse
device kbdmux # keyboard multiplexer
device vga # VGA video card driver
options VESA # Add support for VESA BIOS Extensions (VBE)
device splash # Splash screen and screen saver support
# syscons is the default console driver, resembling an SCO console
device sc
options SC_PIXEL_MODE # add support for the raster text mode
# vt is the new video console driver
device vt
device vt_vga
device vt_efifb
device agp # support several AGP chipsets
# PCCARD (PCMCIA) support
# PCMCIA and cardbus bridge support
device cbb # cardbus (yenta) bridge
device pccard # PC Card (16-bit) bus
device cardbus # CardBus (32-bit) bus
# Serial (COM) ports
device uart # Generic UART driver
# Parallel port
device ppc
device ppbus # Parallel port bus (required)
device lpt # Printer
device ppi # Parallel port interface device
device vpo # Requires scbus and da
device puc # Multi I/O cards and multi-channel UARTs
# PCI Ethernet NICs.
#device bxe # Broadcom NetXtreme II BCM5771X/BCM578XX 10GbE
#device de # DEC/Intel DC21x4x (``Tulip'')
device em # Intel PRO/1000 Gigabit Ethernet Family
#device igb # Intel PRO/1000 PCIE Server Gigabit Family
#device ix # Intel PRO/10GbE PCIE PF Ethernet
#device ixv # Intel PRO/10GbE PCIE VF Ethernet
#device ixl # Intel XL710 40Gbe PCIE Ethernet
#device ixlv # Intel XL710 40Gbe VF PCIE Ethernet
device mlx4ib # Mellanox ConnectX HCA InfiniBand
device mlxen # Mellanox ConnectX HCA Ethernet
device mthca # Mellanox HCA InfiniBand
#device le # AMD Am7900 LANCE and Am79C9xx PCnet
#device ti # Alteon Networks Tigon I/II gigabit Ethernet
#device txp # 3Com 3cR990 (``Typhoon'')
#device vx # 3Com 3c590, 3c595 (``Vortex'')
# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device miibus # MII bus support
#device ae # Attansic/Atheros L2 FastEthernet
#device age # Attansic/Atheros L1 Gigabit Ethernet
#device alc # Atheros AR8131/AR8132 Ethernet
#device ale # Atheros AR8121/AR8113/AR8114 Ethernet
#device bce # Broadcom BCM5706/BCM5708 Gigabit Ethernet
#device bfe # Broadcom BCM440x 10/100 Ethernet
#device bge # Broadcom BCM570xx Gigabit Ethernet
#device cas # Sun Cassini/Cassini+ and NS DP83065 Saturn
#device dc # DEC/Intel 21143 and various workalikes
#device et # Agere ET1310 10/100/Gigabit Ethernet
#device fxp # Intel EtherExpress PRO/100B (82557, 82558)
#device gem # Sun GEM/Sun ERI/Apple GMAC
#device hme # Sun HME (Happy Meal Ethernet)
#device jme # JMicron JMC250 Gigabit/JMC260 Fast Ethernet
#device lge # Level 1 LXT1001 gigabit Ethernet
#device msk # Marvell/SysKonnect Yukon II Gigabit Ethernet
#device nfe # nVidia nForce MCP on-board Ethernet
#device nge # NatSemi DP83820 gigabit Ethernet
#device nve # nVidia nForce MCP on-board Ethernet Networking
#device pcn # AMD Am79C97x PCI 10/100 (precedence over 'le')
device re # RealTek 8139C+/8169/8169S/8110S
#device rl # RealTek 8129/8139
#device sf # Adaptec AIC-6915 (``Starfire'')
#device sge # Silicon Integrated Systems SiS190/191
#device sis # Silicon Integrated Systems SiS 900/SiS 7016
#device sk # SysKonnect SK-984x & SK-982x gigabit Ethernet
#device ste # Sundance ST201 (D-Link DFE-550TX)
#device stge # Sundance/Tamarack TC9021 gigabit Ethernet
#device tl # Texas Instruments ThunderLAN
#device tx # SMC EtherPower II (83c170 ``EPIC'')
#device vge # VIA VT612x gigabit Ethernet
#device vr # VIA Rhine, Rhine II
#device wb # Winbond W89C840F
#device xl # 3Com 3c90x (``Boomerang'', ``Cyclone'')
# ISA Ethernet NICs. pccard NICs included.
#device cs # Crystal Semiconductor CS89x0 NIC
# 'device ed' requires 'device miibus'
#device ed # NE[12]000, SMC Ultra, 3c503, DS8390 cards
#device ex # Intel EtherExpress Pro/10 and Pro/10+
#device ep # Etherlink III based cards
#device fe # Fujitsu MB8696x based cards
#device sn # SMC's 9000 series of Ethernet chips
#device xe # Xircom pccard Ethernet
# Wireless NIC cards
#device wlan # 802.11 support
#options IEEE80211_DEBUG # enable debug msgs
#options IEEE80211_AMPDU_AGE # age frames in AMPDU reorder q's
#options IEEE80211_SUPPORT_MESH # enable 802.11s draft support
#device wlan_wep # 802.11 WEP support
#device wlan_ccmp # 802.11 CCMP support
#device wlan_tkip # 802.11 TKIP support
#device wlan_amrr # AMRR transmit rate control algorithm
#device an # Aironet 4500/4800 802.11 wireless NICs.
#device ath # Atheros NICs
#device ath_pci # Atheros pci/cardbus glue
#device ath_hal # pci/cardbus chip support
#options AH_SUPPORT_AR5416 # enable AR5416 tx/rx descriptors
#options AH_AR5416_INTERRUPT_MITIGATION # AR5416 interrupt mitigation
#options ATH_ENABLE_11N # Enable 802.11n support for AR5416 and later
#device ath_rate_sample # SampleRate tx rate control for ath
#device bwi # Broadcom BCM430x/BCM431x wireless NICs.
#device bwn # Broadcom BCM43xx wireless NICs.
#device ipw # Intel 2100 wireless NICs.
#device iwi # Intel 2200BG/2225BG/2915ABG wireless NICs.
#device iwn # Intel 4965/1000/5000/6000 wireless NICs.
#device malo # Marvell Libertas wireless NICs.
#device mwl # Marvell 88W8363 802.11n wireless NICs.
#device ral # Ralink Technology RT2500 wireless NICs.
#device wi # WaveLAN/Intersil/Symbol 802.11 wireless NICs.
#device wpi # Intel 3945ABG wireless NICs.
# Pseudo devices.
device loop # Network loopback
device random # Entropy device
device padlock_rng # VIA Padlock RNG
device rdrand_rng # Intel Bull Mountain RNG
device ether # Ethernet support
device vlan # 802.1Q VLAN support
device tun # Packet tunnel.
device md # Memory "disks"
device gif # IPv6 and IPv4 tunneling
device faith # IPv6-to-IPv4 relaying (translation)
device firmware # firmware assist module
# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device bpf # Berkeley packet filter
# USB support
options USB_DEBUG # enable debug msgs
device uhci # UHCI PCI->USB interface
device ohci # OHCI PCI->USB interface
device ehci # EHCI PCI->USB interface (USB 2.0)
device xhci # XHCI PCI->USB interface (USB 3.0)
device usb # USB Bus (required)
device ukbd # Keyboard
device umass # Disks/Mass storage - Requires scbus and da
# Sound support
device sound # Generic sound driver (required)
#device snd_cmi # CMedia CMI8338/CMI8738
#device snd_csa # Crystal Semiconductor CS461x/428x
#device snd_emu10kx # Creative SoundBlaster Live! and Audigy
#device snd_es137x # Ensoniq AudioPCI ES137x
device snd_hda # Intel High Definition Audio
device snd_ich # Intel, NVidia and other ICH AC'97 Audio
device snd_via8233 # VIA VT8233x Audio
# MMC/SD
device mmc # MMC/SD bus
device mmcsd # MMC/SD memory card
device sdhci # Generic PCI SD Host Controller
# VirtIO support
device virtio # Generic VirtIO bus (required)
device virtio_pci # VirtIO PCI device
device vtnet # VirtIO Ethernet device
device virtio_blk # VirtIO Block device
device virtio_scsi # VirtIO SCSI device
device virtio_balloon # VirtIO Memory Balloon device
# HyperV drivers and enhancement support
# NOTE: HYPERV depends on hyperv. They must be added or removed together.
options HYPERV # Hyper-V kernel infrastructure
device hyperv # HyperV drivers
# Xen HVM Guest Optimizations
# NOTE: XENHVM depends on xenpci. They must be added or removed together.
options XENHVM # Xen HVM kernel infrastructure
device xenpci # Xen HVM Hypervisor services driver
# VMware support
device vmx # VMware VMXNET3 Ethernet
# 2016-04-21 JC Added VIMAGE just to verify it's the crash causer
options VIMAGE
---
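Since the report ties the panic to VIMAGE being enabled together with the Infiniband options, a quick check of whether a given kernel configuration has that combination may be handy when comparing systems. This is a minimal sketch that only scans `options` lines in the config text. The function names are hypothetical, and treating "VIMAGE plus IPOIB" as the trigger is an assumption taken from this report, not a confirmed diagnosis.

```python
def enabled_options(config_text):
    """Return the names enabled by 'options NAME[=VALUE]' lines."""
    names = set()
    for raw in config_text.splitlines():
        # Drop comments; a line starting with '#' becomes empty and is skipped.
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "options":
            names.add(parts[1].split("=", 1)[0])
    return names


def is_suspect_combo(names):
    """Assumption from this report: VIMAGE together with IPOIB panics."""
    return "VIMAGE" in names and "IPOIB" in names


# A few lines copied from the configuration above.
SAMPLE = """\
options OFED
options IPOIB
options IPOIB_CM
#options IEEE80211_DEBUG # enable debug msgs
options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
options VIMAGE
"""

if __name__ == "__main__":
    names = enabled_options(SAMPLE)
    print(sorted(names))
    print("VIMAGE + IPOIB enabled:", is_suspect_combo(names))
```

The scan ignores `device` and `makeoptions` lines and does not follow `include` directives, so it is only a rough filter for configs laid out like the one above.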