Repeated msgs & kernel panic w/ r246437 (Revamp the CAM enclosure services driver)
John
jwd at freebsd.org
Sat Jun 1 03:25:08 UTC 2013
----- John's Original Message -----
> ----- Kenneth D. Merry's Original Message -----
> > > On 22.04.2013 06:00, John wrote:
> > > >Hi Folks,
> > > >
> > > > After updating one of our servers to the latest stable image,
> > > >it appears that commit r246437 appears to be causing it to panic.
>
> > I agree. I added the xpt_create_path_unlocked() call to fix a
> > panic with a stack trace just like the one above. It looks like a problem
> > due to running r246437 exactly.
>
> As noted above, the system panics with the latest stable image. We
> simply started backing off commits until we found where the system
> no longer panic'd.
>
> Side note: is there a way to disable the new daemon that I haven't
> seen? sysctl? loader config?
>
> Apologies for the delay - here is the stable information:
>
> 'working' tag - has 1 of the external LSI cards disabled (so only 1 path through the shelves).
> 'broken' tag - has both cards enabled and panics in short order.
>
> Both of these are with a stable build (r249895)
>
> http://www.freebsd.org/~jwd/r246437/dmesg.working.txt
> http://www.freebsd.org/~jwd/r246437/dmesg.broken.txt
Hi Folks,
I had a little time to look at this again.. I enabled WITNESS
on a 9-stable system from 5/30/2013. It ran for a short while before
panic'ing.
ses40: ses0,pass20: Element descriptor: ' '
ses40: ses0,pass20: SAS Expander: 24 Physses40: phy 0: connector 255 other 255
ses40: phy 1: connector 255 other 255
ses40: phy 2: connector 255 other 255
ses40: phy 3: connector 255 other 255
ses40: phy 4: connector 255 other 255
ses40: phy 5: connector 255 other 255
ses40: phy 6: connector 255 other 255
ses40: phy 7: connector 255 other 255
ses40: phy 8: connector 255 other 255
ses40: phy 9: connector 255 other 255
ses40: phy 10: connector 255 other 255
ses40: phy 11: connector 255 other 255
ses40: phy 12: connector 255 other 255
ses40: phy 13: connector 255 other 255
ses40: phy 14: connector 255 other 255
ses40: phy 15: connector 255 other 255
ses40: phy 16: connector 255 other 255
ses40: phy 17: connector 255 other 255
ses40: phy 18: connector 255 other 255
ses40: phy 19: connector 255 other 255
ses40: phy 20: connector 255 other 255
ses40: phy 21: connector 255 other 255
ses40: phy 22: connector 255 other 255
ses40: phy 23: connector 255 other 255
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex MPT2SAS lock (MPT2SAS lock) r = 0 (0xffffff8cce0161b8) locked @ /usr/src.2013-05-30_18.01.06/sys/cam/cam_periph.h:192
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffa42c96c480
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffa42c96c540
_witness_debugger() at _witness_debugger+0x2c/frame 0xffffffa42c96c560
witness_warn() at witness_warn+0x2d2/frame 0xffffffa42c96c6a0
trap_pfault() at trap_pfault+0x6a/frame 0xffffffa42c96c730
trap() at trap+0x42a/frame 0xffffffa42c96c930
calltrap() at calltrap+0x8/frame 0xffffffa42c96c930
--- trap 0xc, rip = 0xffffffff80cc69e8, rsp = 0xffffffa42c96c9f0, rbp = 0xffffffa42c96ca20 ---
memcpy() at memcpy+0x8/frame 0xffffffa42c96ca20
xpt_getattr() at xpt_getattr+0x10f/frame 0xffffffa42c96cb30
pass_add_physpath() at pass_add_physpath+0x6d/frame 0xffffffa42c96cb60
taskqueue_run_locked() at taskqueue_run_locked+0x93/frame 0xffffffa42c96cbc0
taskqueue_thread_loop() at taskqueue_thread_loop+0x3e/frame 0xffffffa42c96cbe0
fork_exit() at fork_exit+0x135/frame 0xffffffa42c96cc30
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffa42c96cc30
--- trap 0, rip = 0, rsp = 0xffffffa42c96ccf0, rbp = 0 ---
Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 12
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80cc69e8
stack pointer = 0x28:0xffffffa42c96c9f0
frame pointer = 0x28:0xffffffa42c96ca20
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (thread taskq)
lock order reversal: (Giant after non-sleepable)
lock order reversal: (Giant after non-sleepable)
1st 0xffffff8cce0161b8 MPT2SAS lock (MPT2SAS lock) @ /usr/src.2013-05-30_18.01.06/sys/cam/cam_periph.h:192
1st 0xffffff8cce0161b8 MPT2SAS lock (MPT2SAS lock) @ /usr/src.2013-05-30_18.01.06/sys/cam/cam_periph.h:192
2nd 0xffffffff813d1760 Giant (Giant) @ /usr/src.2013-05-30_18.01.06/sys/dev/usb/input/ukbd.c:1942
2nd 0xffffffff813d1760 Giant (Giant) @ /usr/src.2013-05-30_18.01.06/sys/dev/usb/input/ukbd.c:1942
KDB: stack backtrace:
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffa42c96c3b0
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffa42c96c470
_witness_debugger() at _witness_debugger+0x2c/frame 0xffffffa42c96c490
witness_checkorder() at witness_checkorder+0x875/frame 0xffffffa42c96c550
_mtx_lock_flags() at _mtx_lock_flags+0x80/frame 0xffffffa42c96c580
ukbd_poll() at ukbd_poll+0x27/frame 0xffffffa42c96c5a0
kbdmux_poll() at kbdmux_poll+0x3f/frame 0xffffffa42c96c5c0
cngrab() at cngrab+0x2f/frame 0xffffffa42c96c5e0
kdb_trap() at kdb_trap+0xc6/frame 0xffffffa42c96c640
trap_fatal() at trap_fatal+0x29d/frame 0xffffffa42c96c6a0
trap_pfault() at trap_pfault+0x7a/frame 0xffffffa42c96c730
trap() at trap+0x42a/frame 0xffffffa42c96c930
calltrap() at calltrap+0x8/frame 0xffffffa42c96c930
--- trap 0xc, rip = 0xffffffff80cc69e8, rsp = 0xffffffa42c96c9f0, rbp = 0xffffffa42c96ca20 ---
memcpy() at memcpy+0x8/frame 0xffffffa42c96ca20
xpt_getattr() at xpt_getattr+0x10f/frame 0xffffffa42c96cb30
pass_add_physpath() at pass_add_physpath+0x6d/frame 0xffffffa42c96cb60
taskqueue_run_locked() at taskqueue_run_locked+0x93/frame 0xffffffa42c96cbc0
taskqueue_thread_loop() at taskqueue_thread_loop+0x3e/frame 0xffffffa42c96cbe0
fork_exit() at fork_exit+0x135/frame 0xffffffa42c96cc30
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffa42c96cc30
--- trap 0, rip = 0, rsp = 0xffffffa42c96ccf0, rbp = 0 ---
[ thread pid 0 tid 100034 ]
Stopped at memcpy+0x8: repe movsq (%rsi),%es:(%rdi)
db> bt
Tracing pid 0 tid 100034 td 0xfffffe002c98c920
memcpy() at memcpy+0x8/frame 0xffffffa42c96ca20
xpt_getattr() at xpt_getattr+0x10f/frame 0xffffffa42c96cb30
pass_add_physpath() at pass_add_physpath+0x6d/frame 0xffffffa42c96cb60
taskqueue_run_locked() at taskqueue_run_locked+0x93/frame 0xffffffa42c96cbc0
taskqueue_thread_loop() at taskqueue_thread_loop+0x3e/frame 0xffffffa42c96cbe0
fork_exit() at fork_exit+0x135/frame 0xffffffa42c96cc30
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffa42c96cc30
--- trap 0, rip = 0, rsp = 0xffffffa42c96ccf0, rbp = 0 ---
db>
Bits and pieces... 48 enclosure daemons running:
Tracing command enc_daemon47 pid 64 tid 100160 td 0xfffffe009dd28490
sched_switch() at sched_switch+0x194/frame 0xffffffa70faa4a40
mi_switch() at mi_switch+0x208/frame 0xffffffa70faa4a90
sleepq_switch() at sleepq_switch+0xfc/frame 0xffffffa70faa4ac0
sleepq_wait() at sleepq_wait+0x4d/frame 0xffffffa70faa4af0
_sleep() at _sleep+0x3d4/frame 0xffffffa70faa4b80
enc_daemon() at enc_daemon+0xde/frame 0xffffffa70faa4be0
fork_exit() at fork_exit+0x135/frame 0xffffffa70faa4c30
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffa70faa4c30
--- trap 0, rip = 0, rsp = 0xffffffa70faa4cf0, rbp = 0 ---
And it looks like there are some reversals within the daemon:
acquiring duplicate lock of same type: "MPT2SAS lock"
1st MPT2SAS lock @ /usr/src.2013-05-30_18.01.06/sys/cam/cam_periph.h:192
2nd MPT2SAS lock @ /usr/src.2013-05-30_18.01.06/sys/cam/scsi/scsi_xpt.c:2529
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xffffffa70fa4a010
ses27: phy 0: SAS device type 1 id 1
ses27: phy 0: protocols: Initiator( None ) Target( SSP )
ses27: phy 0: parent 50014380019fa87d addr 5000c500345510be
ses26: phy 0: SAS device type 1 id 1
ses26: phy 0: protocols: Initiator( None ) Target( SSP )
ses26: phy 0: parent 50014380019f20bd addr 5000c50034642cb2
ses28: phy 0: parent 50014380019f787d addr 5000c50034635216
kdb_backtrace() at kdb_backtrace+0x37/frame 0xffffffa70fa4a0d0
_witness_debugger() at _witness_debugger+0x2c/frame 0xffffffa70fa4a0f0
witness_checkorder() at witness_checkorder+0x95e/frame 0xffffffa70fa4a1b0
_mtx_lock_flags() at _mtx_lock_flags+0x80/frame 0xffffffa70fa4a1e0
scsi_action() at scsi_action+0x1d1/frame 0xffffffa70fa4a200
ses_setphyspath_callback() at ses_setphyspath_callback+0x177/frame 0xffffffa70fa4a310
ses_path_iter_devid_callback() at ses_path_iter_devid_callback+0x1c6/frame 0xffffffa70fa4a8b0
ses_devids_iter() at ses_devids_iter+0xb1/frame 0xffffffa70fa4a930
ses_paths_iter() at ses_paths_iter+0x20/frame 0xffffffa70fa4a950
ses_publish_physpaths() at ses_publish_physpaths+0x263/frame 0xffffffa70fa4ab80
enc_daemon() at enc_daemon+0x2a4/frame 0xffffffa70fa4abe0
fork_exit() at fork_exit+0x135/frame 0xffffffa70fa4ac30
fork_trampoline() at fork_trampoline+0xe/frame 0xffffffa70fa4ac30
--- trap 0, rip = 0, rsp = 0xffffffa70fa4acf0, rbp = 0 ---
Cheers,
John
More information about the freebsd-scsi
mailing list