freebsd-5.4-stable panics

Rob Watt rob at hudson-trading.com
Tue Sep 27 12:12:44 PDT 2005


On Sun, 25 Sep 2005, Robert Watson wrote:

>
> On Fri, 23 Sep 2005, Jason Carroll wrote:
>
> > There seem to be 2 types of crashes we see with pretty different stack
> > traces.  What I'll call a type 1 crash, I believe, is often caused by
> > one of the triggers I mention above.  A type 2 crash appears to happen
> > spontaneously after the machine has been running for a while.
> >
> > I poked around using kgdb in a core file from a type 2 crash, and it
> > appeared the system hung closing sockets (specifically cleaning up
> > multicast state I think) while cleaning up one of our multicast
> > applications (note the trace through sys_exit).  There's no reason this
> > application should have been exiting unless it encountered some kind of
> > error.
>
> Sounds nasty.  It's possible the two panics are related, especially if
> they involve a race in the multicast code, which could result in treading
> on other kernel memory, potentially leading to the thread related panic.
> My leaning would be that they are unrelated, but since we may be able to
> eliminate the multicast one (see below), that would be a good starting
> point.
>
> There are some other known stability nits in 6.x which are being worked
> on, but in general the network stack stability is higher in 6.x than 5.x
> when it comes to multicast due to the work I reference above.  If you run
> into any stability problems relating to the file system, set
> debug.mpsafevfs=0 in loader.conf -- there are a few bug fixes relating to
> running out of disk space or hitting quota limits that are fixed in HEAD,
> but not yet backported to 6.x.
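
For reference, the tunable suggested above would be set with a line like
the following. This is a minimal sketch: it assumes the standard
/boot/loader.conf location, and the setting takes effect at the next boot.

```
# /boot/loader.conf
# Assumption: standard loader.conf path; the tunable name is taken from
# the message above. Disables MPSAFE VFS as a file system workaround.
debug.mpsafevfs=0
```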

Robert,

Thanks for your quick response and suggestions. We have now experienced
an additional type of crash. Type 3 occurred on 6.0-BETA5; it did not
enter the debugger at all and we could not generate a core.

Unfortunately the 6.0-BETA5 crash was completely different from everything
we've seen so far. The panic was a page fault in kernel mode (trap 12),
and 'top' was the active process. We are trying again to run our tests on
6.0, but if we keep hitting unrelated bugs, they may prevent us from
determining whether multicast is the problem.

We also ran our applications in 5-STABLE without reading from or writing
to disk (i.e., we ran the multicast data streams on a remote machine, and
we told our listener/rebroadcaster apps not to write to disk). In this
configuration we were able to run for 4 days before crashing. A few
hours before the crash we had introduced disk activity (bonnie
in a constant loop with a 1G test file size). This crash was a type 1,
and we were not able to save a core. Previously, the longest we had gone
without a crash was 6 hours, so it is possible that either load or disk
activity helps trigger the bugs we have seen.

files attached:
kernel-conf.txt (6.0 kernel)
type3-core.txt (copy of panic output to console)

We will update you with more info from our 6.0 tests when we have it.

We are in a bind right now. All modern hardware (i.e., EM64T/amd64) only
seems to work with versions of FreeBSD that aren't stable when running our
applications. Many vendors do not even sell server hardware that is purely
i386. We never encountered these types of problems on FreeBSD 4.x, and
many of our 120+ i386-class machines that are running 4.x are showing
their age and need to be replaced. Assuming that the problems we are
experiencing are purely related to the OS, we now don't have an OS to run
on the newer hardware we've been buying. We really need to find a way to
patch these problems or find a version of FreeBSD that supports our
platform and is stable. Obviously we appreciate the hard work that all of
you on the FreeBSD team do, and we are happy to do whatever we can to help
squash these bugs.

-
Rob Watt
-------------- next part --------------
#
# GENERIC -- Generic kernel configuration file for FreeBSD/amd64
#
# For more information on this file, please read the handbook section on
# Kernel Configuration Files:
#
#    http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/kernelconfig-config.html
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines is also present in the ../../conf/NOTES and NOTES files.
# If you are in doubt as to the purpose or necessity of a line, check first
# in NOTES.
#
# $FreeBSD: src/sys/amd64/conf/GENERIC,v 1.421.2.11.2.1 2005/04/09 17:28:37 kensmith Exp $

machine		amd64
cpu		HAMMER
ident		CUSTOM

# To statically compile in device wiring instead of /boot/device.hints
#hints		"GENERIC.hints"		# Default places to look for devices.
makeoptions     DEBUG=-g
options         KDB
options         DDB
options         BREAK_TO_DEBUGGER
options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS
options         WITNESS_SKIPSPIN
#makeoptions     COPTFLAGS="-O -frename-registers -pipe"

#options        SCHED_ULE               # ULE scheduler
options 	SCHED_4BSD		# 4BSD scheduler
options 	INET			# InterNETworking
options 	INET6			# IPv6 communications protocols
options 	FFS			# Berkeley Fast Filesystem
options 	SOFTUPDATES		# Enable FFS soft updates support
options 	UFS_ACL			# Support for access control lists
options 	UFS_DIRHASH		# Improve performance on big directories
options 	MD_ROOT			# MD is a potential root device
options 	NFSCLIENT		# Network Filesystem Client
options 	NFSSERVER		# Network Filesystem Server
options 	NFS_ROOT		# NFS usable as /, requires NFSCLIENT
options 	NTFS			# NT File System
options 	MSDOSFS			# MSDOS Filesystem
options 	CD9660			# ISO 9660 Filesystem
options 	PROCFS			# Process filesystem (requires PSEUDOFS)
options 	PSEUDOFS		# Pseudo-filesystem framework
options 	GEOM_GPT		# GUID Partition Tables.
options 	COMPAT_43		# Needed by COMPAT_LINUX32
options 	COMPAT_IA32		# Compatible with i386 binaries
options 	COMPAT_FREEBSD4		# Compatible with FreeBSD4
options         COMPAT_FREEBSD5         # Compatible with FreeBSD5
options 	COMPAT_LINUX32		# Compatible with i386 linux binaries 
options 	SCSI_DELAY=5000	        # Delay (in ms) before probing SCSI
options 	KTRACE			# ktrace(1) support
options 	SYSVSHM			# SYSV-style shared memory
options 	SYSVMSG			# SYSV-style message queues
options 	SYSVSEM			# SYSV-style semaphores
options 	_KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
options 	AHC_REG_PRETTY_PRINT	# Print register bitfields in debug
					# output.  Adds ~128k to driver.
options 	AHD_REG_PRETTY_PRINT	# Print register bitfields in debug
					# output.  Adds ~215k to driver.
options 	ADAPTIVE_GIANT		# Giant mutex is adaptive.
options         PREEMPTION              # Enable kernel thread preemption


options 	SMP

# Workarounds for some known-to-be-broken chipsets (nVidia nForce3-Pro150)
device		atpic		# 8259A compatability

# Enabling NO_MIXED_MODE gives a performance improvement on some motherboards
# but does not work with some boards (mostly nVidia chipset based).
#options 	NO_MIXED_MODE	# Don't penalize working chipsets

# Linux 32-bit ABI support
options 	LINPROCFS		# Cannot be a module yet.

# Bus support.  Do not remove isa, even if you have no isa slots
device		acpi
device		isa
device		pci

# Floppy drives
device		fdc

# ATA and ATAPI devices
device		ata
device		atadisk		# ATA disk drives
device		ataraid		# ATA RAID drives
device		atapicd		# ATAPI CDROM drives
device		atapifd		# ATAPI floppy drives
device		atapist		# ATAPI tape drives
options 	ATA_STATIC_ID	# Static device numbering

# SCSI Controllers
device		ahc		# AHA2940 and onboard AIC7xxx devices
device		ahd		# AHA39320/29320 and onboard AIC79xx devices
#device		amd		# AMD 53C974 (Tekram DC-390(T))
#device		isp		# Qlogic family
#device 	ispfw		# Firmware for QLogic HBAs- normally a module
#device		mpt		# LSI-Logic MPT-Fusion
#device		ncr		# NCR/Symbios Logic
#device		sym		# NCR/Symbios Logic (newer chipsets + those of `ncr')
#device		trm		# Tekram DC395U/UW/F DC315U adapters

#device		adv		# Advansys SCSI adapters
#device		adw		# Advansys wide SCSI adapters
device		aic		# Adaptec 15[012]x SCSI adapters, AIC-6[23]60.
#device		bt		# Buslogic/Mylex MultiMaster SCSI adapters


# SCSI peripherals
device		scbus		# SCSI bus (required for SCSI)
device		ch		# SCSI media changers
device		da		# Direct Access (disks)
device		sa		# Sequential Access (tape etc)
device		cd		# CD
device		pass		# Passthrough device (direct SCSI access)
device		ses		# SCSI Environmental Services (and SAF-TE)

# RAID controllers interfaced to the SCSI subsystem
#device		amr		# AMI MegaRAID
#device		arcmsr		# Areca SATA II RAID
#device		ciss		# Compaq Smart RAID 5*
#device		dpt		# DPT Smartcache III, IV - See NOTES for options
#device		iir		# Intel Integrated RAID
#device		ips		# IBM (Adaptec) ServeRAID
#device		mly		# Mylex AcceleRAID/eXtremeRAID
#device		twa		# 3ware 9000 series PATA/SATA RAID

# RAID controllers
device		aac		# Adaptec FSA RAID
device		aacp		# SCSI passthrough for aac (requires CAM)
#device		ida		# Compaq Smart RAID
#device		mlx		# Mylex DAC960 family
#XXX pointer/int warnings
#device		pst		# Promise Supertrak SX6000
#device		twe		# 3ware ATA RAID

# atkbdc0 controls both the keyboard and the PS/2 mouse
device		atkbdc		# AT keyboard controller
device		atkbd		# AT keyboard
device		psm		# PS/2 mouse

device		vga		# VGA video card driver

device		splash		# Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device		sc

# PCCARD (PCMCIA) support
# PCMCIA and cardbus bridge support
#device		cbb		# cardbus (yenta) bridge
#device		pccard		# PC Card (16-bit) bus
#device		cardbus		# CardBus (32-bit) bus

# Serial (COM) ports
device		sio		# 8250, 16[45]50 based serial ports

# Parallel port
device		ppc
device		ppbus		# Parallel port bus (required)
device		lpt		# Printer
#device		plip		# TCP/IP over parallel
device		ppi		# Parallel port interface device
#device		vpo		# Requires scbus and da

# If you've got a "dumb" serial or parallel PCI card that is
# supported by the puc(4) glue driver, uncomment the following
# line to enable it (connects to the sio and/or ppc drivers):
#device		puc

# PCI Ethernet NICs.
#device		de		# DEC/Intel DC21x4x (``Tulip'')
device		em		# Intel PRO/1000 adapter Gigabit Ethernet Card
#device		ixgb		# Intel PRO/10GbE Ethernet Card
#device		txp		# 3Com 3cR990 (``Typhoon'')
#device		vx		# 3Com 3c590, 3c595 (``Vortex'')

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device		miibus		# MII bus support
#device		bfe		# Broadcom BCM440x 10/100 Ethernet
device		bge		# Broadcom BCM570xx Gigabit Ethernet
#device		dc		# DEC/Intel 21143 and various workalikes
device		fxp		# Intel EtherExpress PRO/100B (82557, 82558)
#device		lge		# Level 1 LXT1001 gigabit Ethernet
#device		nge		# NatSemi DP83820 gigabit Ethernet
#device		pcn		# AMD Am79C97x PCI 10/100 (precedence over 'lnc')
#device		re		# RealTek 8139C+/8169/8169S/8110S
#device		rl		# RealTek 8129/8139
#device		sf		# Adaptec AIC-6915 (``Starfire'')
#device		sis		# Silicon Integrated Systems SiS 900/SiS 7016
#device		sk		# SysKonnect SK-984x & SK-982x gigabit Ethernet
#device		ste		# Sundance ST201 (D-Link DFE-550TX)
#device		ti		# Alteon Networks Tigon I/II gigabit Ethernet
#device		tl		# Texas Instruments ThunderLAN
#device		tx		# SMC EtherPower II (83c170 ``EPIC'')
#device		vge		# VIA VT612x gigabit Ethernet
#device		vr		# VIA Rhine, Rhine II
#device		wb		# Winbond W89C840F
#device		xl		# 3Com 3c90x (``Boomerang'', ``Cyclone'')

# ISA Ethernet NICs.  pccard NICs included.
#device		cs		# Crystal Semiconductor CS89x0 NIC
# 'device ed' requires 'device miibus'
# XXX kvtop brokenness, pointer/int warnings
#device		ed		# NE[12]000, SMC Ultra, 3c503, DS8390 cards
#device		ex		# Intel EtherExpress Pro/10 and Pro/10+
#device		ep		# Etherlink III based cards
#device		fe		# Fujitsu MB8696x based cards
# XXX kvtop brokenness, pointer/int warnings
#device		lnc		# NE2100, NE32-VL Lance Ethernet cards
#device		sn		# SMC's 9000 series of Ethernet chips
#device		xe		# Xircom pccard Ethernet

# Wireless NIC cards
#device		wlan		# 802.11 support
#device		an		# Aironet 4500/4800 802.11 wireless NICs.
#device		awi		# BayStack 660 and others
#device		wi		# WaveLAN/Intersil/Symbol 802.11 wireless NICs.

# Pseudo devices.
device		loop		# Network loopback
device		mem		# Memory and kernel memory devices
device		io		# I/O device
device		random		# Entropy device
device		ether		# Ethernet support
device		sl		# Kernel SLIP
device		ppp		# Kernel PPP
device		tun		# Packet tunnel.
device		pty		# Pseudo-ttys (telnet etc)
device		md		# Memory "disks"
device		gif		# IPv6 and IPv4 tunneling
device		faith		# IPv6-to-IPv4 relaying (translation)

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device		bpf		# Berkeley packet filter

# USB support
device		uhci		# UHCI PCI->USB interface
device		ohci		# OHCI PCI->USB interface
#device		ehci		# EHCI PCI->USB interface (USB 2.0)
device		usb		# USB Bus (required)
#device		udbp		# USB Double Bulk Pipe devices
device		ugen		# Generic
device		uhid		# "Human Interface Devices"
device		ukbd		# Keyboard
device		ulpt		# Printer
device		umass		# Disks/Mass storage - Requires scbus and da
device		ums		# Mouse
#device		urio		# Diamond Rio 500 MP3 player
#device		uscanner	# Scanners
# USB Ethernet, requires mii
#device		aue		# ADMtek USB Ethernet
#device		axe		# ASIX Electronics USB Ethernet
#device		cdce		# Generic USB over Ethernet
#device		cue		# CATC USB Ethernet
#device		kue		# Kawasaki LSI USB Ethernet
#device		rue		# RealTek RTL8150 USB Ethernet

# FireWire support
#device		firewire	# FireWire bus code
#device		sbp		# SCSI over FireWire (Requires scbus and da)
#device		fwe		# Ethernet over FireWire (non-standard!)

options         IPFIREWALL
options         IPFIREWALL_VERBOSE

-------------- next part --------------
kernel trap 12 with interrupts disabled

fatal trap 12: page fault while in kernel mode
cpuid=3; apicid=03
fault virtual address   = 03
fault code              = supervisor read, page not present
instruction pointer     = 0x8:ffffffff803b88ca
stack pointer           = 0x10:ffffffffb6639490
frame pointer           = 0x10:ffffffffb66394f0
code segment            = base 0x0; limit 0xfffff, type=0x1b
                        = DPL=0, pres 1, long 1, def32=0, gran 1
processor eflags        = resume, IOPL=0
current process         = 48628 (top)

did not enter DDB or generate core file

