Repeated msgs & kernel panic w/ r246437 (Revamp the CAM enclosure services driver)
John
jwd at FreeBSD.org
Mon Apr 22 03:00:53 UTC 2013
Hi Folks,
After updating one of our servers to the latest stable image,
it appears that commit r246437 is causing it to panic.
The commit:
http://svnweb.freebsd.org/base?view=revision&revision=246437
What one of our servers looks like:
http://people.freebsd.org/~jwd/zfsnfsserver.jpg
The last known working commit:
http://people.freebsd.org/~jwd/r246437/dmesg.r246431.clean.txt
With commit r246437:
http://people.freebsd.org/~jwd/r246437/dmesg.r246437.log.txt
Note that most of the dmesg output relates to the ses devices. It
repeats multiple times before the panic.
ses39: ses0,pass20: Element descriptor: ' '
ses39: ses0,pass20: SAS Expander: 24 Physses39: phy 0: connector 255 other 255
ses39: phy 1: connector 255 other 255
ses39: phy 2: connector 255 other 255
ses39: phy 3: connector 255 other 255
ses39: phy 4: connector 255 other 255
ses39: phy 5: connector 255 other 255
ses39: phy 6: connector 255 other 255
etc., etc.
After just a few minutes, the system panics. A pair of images
of the screen (sorry, no serial console at this time):
Panic: http://people.freebsd.org/~jwd/r246437/20130419_160143.jpg
bt: http://people.freebsd.org/~jwd/r246437/20130419_110158.jpg
We are currently running a test to see whether the panic is related
to the fact that all our shelves are dual-attached, which allows us
to use geom multipath. That is, we have disabled the 2nd HBA, cutting
the total number of da and ses devices in half and thus avoiding the
code in the commit that tracks duplicate ses devices.
Note that if we disable both HBAs and boot the system, it does
not panic or print the repeated messages, but of course we have
no disks :-)
I am unclear on the meaning of the "connector 255 other 255" messages
and have not yet taken the time to look into them.
I would appreciate any insights folks can provide.
Many Thanks,
John
PS: We've had to substantially increase the kernel message buffer
size to capture the complete dmesg output:
options MSGBUF_SIZE=(32768*32)
Can we delay starting the kernel daemon until after the system
is up and /var/log/messages is available? Just a thought...