Troubleshooting panic on 10.2-RELEASE
Paul Thornton
prt at prt.org
Mon Feb 8 09:09:20 UTC 2016
Hi
My Kernel debugging isn't up to troubleshooting what's going on here.
We have two out of 10 identical VMs that regularly (between 5 and 20
hours) panic. On one, I have recompiled GENERIC to get a kernel.debug
and have crash dumps being recorded. They are running GENERIC kernels
with ZFS-on-root (dmesg at the bottom of this mail). All of the other
machines work fine, although the ones that are crashing have the highest
use at the moment.
The panic is lock-related, and going by pid 12 ([intr]) is interrupt
related to network tx/rx.
Unread portion of the kernel message buffer:
Sleeping thread (tid 100024, pid 12) owns a non-sleepable lock
KDB: stack backtrace of thread 100024:
#0 0xffffffff80952061 at mi_switch+0xe1
#1 0xffffffff8099043a at sleepq_wait+0x3a
#2 0xffffffff80951a87 at _sleep+0x287
#3 0xffffffff80a040c4 at filt_bpfread+0x94
#4 0xffffffff809079eb at knote+0xdb
#5 0xffffffff80a00acb at catchpacket+0x67b
#6 0xffffffff80a00d20 at bpf_mtap+0x1e0
#7 0xffffffff80a0f634 at ether_nh_input+0x174
#8 0xffffffff80a177d2 at netisr_dispatch_src+0x62
#9 0xffffffff80df8fcf at vmxnet3_rxq_eof+0x4ff
#10 0xffffffff80df86f1 at vmxnet3_legacy_intr+0xe1
#11 0xffffffff8091482b at intr_event_execute_handlers+0xab
#12 0xffffffff80914c76 at ithread_loop+0x96
#13 0xffffffff8091244a at fork_exit+0x9a
#14 0xffffffff80d30cde at fork_trampoline+0xe
panic: sleeping thread
cpuid = 0
So far, out of 5 dumps, I see two patterns emerging. One, like the
above, involves vmxnet3, the other (to my untrained eye) looks like a
write on a Unix socket. They both share knote.
(kgdb) backtrace
#0 doadump (textdump=<value optimized out>) at pcpu.h:219
#1 0xffffffff80948642 in kern_reboot (howto=260) at
../../../kern/kern_shutdown.c:451
#2 0xffffffff80948a25 in vpanic (fmt=<value optimized out>, ap=<value
optimized out>) at ../../../kern/kern_shutdown.c:758
#3 0xffffffff809488b3 in panic (fmt=0x0) at
../../../kern/kern_shutdown.c:687
#4 0xffffffff80995f89 in propagate_priority (td=<value optimized out>)
at ../../../kern/subr_turnstile.c:227
#5 0xffffffff80996a0e in turnstile_wait (ts=<value optimized out>,
owner=<value optimized out>, queue=<value optimized out>) at
../../../kern/subr_turnstile.c:743
#6 0xffffffff8092ea3b in __mtx_lock_sleep (c=0xfffff80027b63718,
tid=18446735277657056416, opts=<value optimized out>, file=<value
optimized out>, line=0) at ../../../kern/kern_mutex.c:522
#7 0xffffffff809079a5 in knote (list=0xfffff80027ca50a8, hint=0,
lockflags=<value optimized out>) at ../../../kern/kern_event.c:1873
#8 0xffffffff80a00acb in catchpacket (d=<value optimized out>,
pkt=<value optimized out>, pktlen=<value optimized out>, snaplen=<value
optimized out>, cpfn=<value optimized out>, bt=<value optimized out>) at
../../../net/bpf.c:980
#9 0xffffffff80a00d20 in bpf_mtap (bp=0xfffff800027e3680,
m=0xfffff800274d2b00) at ../../../net/bpf.c:2142
#10 0xffffffff80a0f634 in ether_nh_input (m=<value optimized out>) at
../../../net/if_ethersubr.c:520
#11 0xffffffff80a177d2 in netisr_dispatch_src (proto=<value optimized
out>, source=<value optimized out>, m=0x0) at ../../../net/netisr.c:976
#12 0xffffffff80df8fcf in vmxnet3_rxq_eof (rxq=0xfffff80002797400) at
../../../dev/vmware/vmxnet3/if_vmx.c:2076
#13 0xffffffff80df86f1 in vmxnet3_legacy_intr (xsc=0xfffff800027c3800)
at ../../../dev/vmware/vmxnet3/if_vmx.c:2245
#14 0xffffffff8091482b in intr_event_execute_handlers (p=<value
optimized out>, ie=0xfffff800027ea100) at ../../../kern/kern_intr.c:1264
#15 0xffffffff80914c76 in ithread_loop (arg=0xfffff800027ca7a0) at
../../../kern/kern_intr.c:1277
#16 0xffffffff8091244a in fork_exit (callout=0xffffffff80914be0
<ithread_loop>, arg=0xfffff800027ca7a0, frame=0xfffffe00934ccac0) at
../../../kern/kern_fork.c:1018
#17 0xffffffff80d30cde in fork_trampoline () at
../../../amd64/amd64/exception.S:611
#18 0x0000000000000000 in ?? ()
[From a different panic and reboot]
(kgdb) backtrace
#0 doadump (textdump=<value optimized out>) at pcpu.h:219
#1 0xffffffff80948642 in kern_reboot (howto=260) at
../../../kern/kern_shutdown.c:451
#2 0xffffffff80948a25 in vpanic (fmt=<value optimized out>, ap=<value
optimized out>) at ../../../kern/kern_shutdown.c:758
#3 0xffffffff809488b3 in panic (fmt=0x0) at
../../../kern/kern_shutdown.c:687
#4 0xffffffff80995f89 in propagate_priority (td=<value optimized out>)
at ../../../kern/subr_turnstile.c:227
#5 0xffffffff80996a0e in turnstile_wait (ts=<value optimized out>,
owner=<value optimized out>, queue=<value optimized out>) at
../../../kern/subr_turnstile.c:743
#6 0xffffffff8092ea3b in __mtx_lock_sleep (c=0xfffff8002770de18,
tid=18446735278278059168, opts=<value optimized out>, file=<value
optimized out>, line=0) at ../../../kern/kern_mutex.c:522
#7 0xffffffff809079a5 in knote (list=0xfffff80027866b80, hint=0,
lockflags=<value optimized out>) at ../../../kern/kern_event.c:1873
#8 0xffffffff809bcc35 in sowakeup (so=0xfffff80027866ae0,
sb=0xfffff80027866b70) at ../../../kern/uipc_sockbuf.c:189
#9 0xffffffff809cce46 in uipc_send (so=0xfffff8002786fae0, flags=<value
optimized out>, m=<value optimized out>, nam=0x0, control=0x0, td=<value
optimized out>) at ../../../kern/uipc_usrreq.c:902
#10 0xffffffff809c0526 in sosend_generic (so=0xfffff8002786fae0,
addr=0x0, uio=0xfffffe00935fa960, top=<value optimized out>,
control=<value optimized out>, flags=<value optimized out>, td=0x0) at
../../../kern/uipc_socket.c:1283
#11 0xffffffff809a4869 in soo_write (fp=<value optimized out>,
uio=0xfffffe00935fa960, active_cred=<value optimized out>, flags=<value
optimized out>, td=<value optimized out>) at ../../../kern/sys_socket.c:103
#12 0xffffffff8099c647 in dofilewrite (td=0xfffff800276e24a0, fd=7,
fp=0xfffff80027871d20, auio=0xfffffe00935fa960, offset=<value optimized
out>, flags=0) at file.h:304
#13 0xffffffff8099c378 in kern_writev (td=0xfffff800276e24a0, fd=7,
auio=0xfffffe00935fa960) at ../../../kern/sys_generic.c:481
#14 0xffffffff8099c303 in sys_write (td=<value optimized out>,
uap=<value optimized out>) at ../../../kern/sys_generic.c:396
#15 0xffffffff80d4b3a7 in amd64_syscall (td=0xfffff800276e24a0,
traced=0) at subr_syscall.c:134
#16 0xffffffff80d30a8b in Xfast_syscall () at
../../../amd64/amd64/exception.S:396
#17 0x0000000801a1cc0a in ?? ()
We haven't seen this on any other 10.2-RELEASE install - either on a VM
or physical machine. What's the best way forward to troubleshoot this?
Thanks
Paul.
boot dmesg follows:
Copyright (c) 1992-2015 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.2-RELEASE-p8 #1: Fri Feb 5 16:22:56 UTC 2016
root at portal.dev:/usr/src/sys/amd64/compile/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz (3500.00-MHz K8-class CPU)
Origin="GenuineIntel" Id=0x306f2 Family=0x6 Model=0x3f Stepping=2
Features=0xfa3fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,DTS,MMX,FXSR,SSE,SSE2,SS>
Features2=0xfdfa3203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV>
AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
AMD Features2=0x21<LAHF,ABM>
Structured Extended
Features=0x272a<TSCADJ,BMI1,AVX2,BMI2,ERMS,INVPCID,NFPUSG>
XSAVE Features=0x1<XSAVEOPT>
TSC: P-state invariant
Hypervisor: Origin = "VMwareVMware"
real memory = 2147483648 (2048 MB)
avail memory = 2050248704 (1955 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <PTLTD APIC >
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
random: <Software, Yarrow> initialized
kbd1 at kbdmux0
acpi0: <INTEL 440BX> on motherboard
acpi0: Power Button (fixed)
Timecounter "HPET" frequency 14318180 Hz quality 950
cpu0: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 UDMA33 controller> port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1060-0x106f at device 7.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
pci0: <bridge> at device 7.3 (no driver attached)
vgapci0: <VGA-compatible display> port 0x1070-0x107f mem
0xe8000000-0xefffffff,0xfe000000-0xfe7fffff irq 16 at device 15.0 on pci0
vgapci0: Boot video device
pcib2: <ACPI PCI-PCI bridge> at device 17.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 21.0 on pci0
pci3: <ACPI PCI bus> on pcib3
vmx0: <VMware VMXNET3 Ethernet Adapter> port 0x4000-0x400f mem
0xfd5fc000-0xfd5fcfff,0xfd5fd000-0xfd5fdfff,0xfd5fe000-0xfd5fffff irq 18
at device 0.0 on pci3
vmx0: Ethernet address: 00:0c:29:e5:ef:c4
pcib4: <ACPI PCI-PCI bridge> at device 21.1 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 21.2 on pci0
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 21.3 on pci0
pci6: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> at device 21.4 on pci0
pci7: <ACPI PCI bus> on pcib7
pcib8: <ACPI PCI-PCI bridge> at device 21.5 on pci0
pci8: <ACPI PCI bus> on pcib8
pcib9: <ACPI PCI-PCI bridge> at device 21.6 on pci0
pci9: <ACPI PCI bus> on pcib9
pcib10: <ACPI PCI-PCI bridge> at device 21.7 on pci0
pci10: <ACPI PCI bus> on pcib10
pcib11: <ACPI PCI-PCI bridge> at device 22.0 on pci0
pci11: <ACPI PCI bus> on pcib11
vmx1: <VMware VMXNET3 Ethernet Adapter> port 0x5000-0x500f mem
0xfd4fc000-0xfd4fcfff,0xfd4fd000-0xfd4fdfff,0xfd4fe000-0xfd4fffff irq 19
at device 0.0 on pci11
vmx1: Ethernet address: 0a:d1:c0:01:03:c0
pcib12: <ACPI PCI-PCI bridge> at device 22.1 on pci0
pci12: <ACPI PCI bus> on pcib12
pcib13: <ACPI PCI-PCI bridge> at device 22.2 on pci0
pci13: <ACPI PCI bus> on pcib13
pcib14: <ACPI PCI-PCI bridge> at device 22.3 on pci0
pci14: <ACPI PCI bus> on pcib14
pcib15: <ACPI PCI-PCI bridge> at device 22.4 on pci0
pci15: <ACPI PCI bus> on pcib15
pcib16: <ACPI PCI-PCI bridge> at device 22.5 on pci0
pci16: <ACPI PCI bus> on pcib16
pcib17: <ACPI PCI-PCI bridge> at device 22.6 on pci0
pci17: <ACPI PCI bus> on pcib17
pcib18: <ACPI PCI-PCI bridge> at device 22.7 on pci0
pci18: <ACPI PCI bus> on pcib18
pcib19: <ACPI PCI-PCI bridge> at device 23.0 on pci0
pci19: <ACPI PCI bus> on pcib19
mpt0: <LSILogic SAS/SATA Adapter> port 0x6000-0x60ff mem
0xfd3ec000-0xfd3effff,0xfd3f0000-0xfd3fffff irq 16 at device 0.0 on pci19
mpt0: MPI Version=1.5.0.0
pcib20: <ACPI PCI-PCI bridge> at device 23.1 on pci0
pci20: <ACPI PCI bus> on pcib20
pcib21: <ACPI PCI-PCI bridge> at device 23.2 on pci0
pci21: <ACPI PCI bus> on pcib21
pcib22: <ACPI PCI-PCI bridge> at device 23.3 on pci0
pci22: <ACPI PCI bus> on pcib22
pcib23: <ACPI PCI-PCI bridge> at device 23.4 on pci0
pci23: <ACPI PCI bus> on pcib23
pcib24: <ACPI PCI-PCI bridge> at device 23.5 on pci0
pci24: <ACPI PCI bus> on pcib24
pcib25: <ACPI PCI-PCI bridge> at device 23.6 on pci0
pci25: <ACPI PCI bus> on pcib25
pcib26: <ACPI PCI-PCI bridge> at device 23.7 on pci0
pci26: <ACPI PCI bus> on pcib26
pcib27: <ACPI PCI-PCI bridge> at device 24.0 on pci0
pci27: <ACPI PCI bus> on pcib27
vmx2: <VMware VMXNET3 Ethernet Adapter> port 0x7000-0x700f mem
0xfd2fc000-0xfd2fcfff,0xfd2fd000-0xfd2fdfff,0xfd2fe000-0xfd2fffff irq 17
at device 0.0 on pci27
vmx2: Ethernet address: 00:0c:29:e5:ef:d8
pcib28: <ACPI PCI-PCI bridge> at device 24.1 on pci0
pci28: <ACPI PCI bus> on pcib28
pcib29: <ACPI PCI-PCI bridge> at device 24.2 on pci0
pci29: <ACPI PCI bus> on pcib29
pcib30: <ACPI PCI-PCI bridge> at device 24.3 on pci0
pci30: <ACPI PCI bus> on pcib30
pcib31: <ACPI PCI-PCI bridge> at device 24.4 on pci0
pci31: <ACPI PCI bus> on pcib31
pcib32: <ACPI PCI-PCI bridge> at device 24.5 on pci0
pci32: <ACPI PCI bus> on pcib32
pcib33: <ACPI PCI-PCI bridge> at device 24.6 on pci0
pci33: <ACPI PCI bus> on pcib33
pcib34: <ACPI PCI-PCI bridge> at device 24.7 on pci0
pci34: <ACPI PCI bus> on pcib34
acpi_acad0: <AC Adapter> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
orm0: <ISA Option ROMs> at iomem
0xc0000-0xc7fff,0xc8000-0xc8fff,0xc9000-0xc9fff,0xca000-0xcbfff,0xcc000-0xccfff,0xdc000-0xdffff,0xe0000-0xe7fff
on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
acpi_throttle0: <ACPI CPU Throttling> on cpu0
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to
deny, logging disabled
random: unblocking device.
da0 at mpt0 bus 0 scbus2 target 0 lun 0
da0: <VMware Virtual disk 1.0> Fixed Direct Access SCSI-2 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 40960MB (83886080 512 byte sectors: 255H 63S/T 5221C)
da0: quirks=0x40<RETRY_BUSY>
Timecounter "TSC-low" frequency 1749998500 Hz quality 1000
Trying to mount root from ufs:/dev/da0p3 [rw]...
WARNING: / was not properly dismounted
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is
present;
to enable, add "vfs.zfs.prefetch_disable=0" to
/boot/loader.conf.
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
More information about the freebsd-hackers
mailing list