7.1 Panic on degraded disk w/mpt

Charles Sprickman spork at bway.net
Mon Feb 9 23:14:15 PST 2009


On Tue, 10 Feb 2009, Charles Sprickman wrote:

> On Mon, 9 Feb 2009, Scott Long wrote:
>
>> Charles Sprickman wrote:
>>> (posted on -stable already, no takers - added info: full dmesg, crash info 
>>> from panic when array finished rebuilding, some comments on dmesg)
>>> 
>>> Howdy,
>>> 
>>> I dug around and can't find a PR on this, and the only other report I saw 
>>> was in this mailing list post that has no replies:
>>> 
>>> http://www.nabble.com/7.1-BETA2-panic-on-mpt-degrade-td20183173.html
>>> 
>>> The hardware is a Dell PowerEdge 860 with the Dell/LSI SAS5 controller:
>>> 
>>> mpt0: <LSILogic SAS/SATA Adapter> port 0xec00-0xecff mem 
>>> 0xfe9fc000-0xfe9fffff,0xfe9e0000-0xfe9effff irq 16 at device 8.0 on pci2
>>> mpt0: MPI Version=1.5.13.0
>>> 
>>> The panic is repeatable by forcing the array into a degraded state.  When 
>>> the array finishes rebuilding, the box also panics.
>>> 
>>> Here's my best shot at getting info out of kgdb (panic on array going to 
>>> degraded state):
>> 
>> I wonder if the MPT card is temporarily detaching and then reattaching
>> the logical drive when the rebuild completes.
>
> IIRC, just before the panic there is a bunch of CAM debug splattered across 
> the monitor.  I can run down to the garage and snap a few pics of the monitor 
> after detaching a drive.

OK, some more info here.  I wanted to be safe, so I brought the machine 
down to single user and unmounted everything but /.  It did not panic on 
the drive being removed.  So perhaps a quiet filesystem = no panic.

Here's what gets spit out on the console:

mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
(mpt0:vol0:1): Physical Disk Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
(mpt0:vol0:1): Physical Disk Status Changed
mpt0:vol0(mpt0:0:0): Volume Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0:vol0(mpt0:0:0): RAID-1 - Degraded
mpt0:vol0(mpt0:0:0): Status ( Enabled )
(mpt0:vol0:1): No longer configured
(probe0:mpt0:1:0:0): error 22
(probe0:mpt0:1:0:0): Unretryable Error
(probe2:mpt0:1:2:0): error 22
(probe2:mpt0:1:2:0): Unretryable Error
(probe3:mpt0:1:3:0): error 22
(repeats with probe # increasing...)
(probe1:mpt0:1:1:0): CAM Status 0x19
(probe1:mpt0:1:1:0): Retrying Command
(probe0:mpt0:1:0:0): error 22
(probe0:mpt0:1:0:0): Unretryable Error
(pass1:mpt0:1:0:0): lost device
(pass1:mpt0:1:0:0): removing device entry

So it does appear that at the very least the mpt driver is removing the 
pass device for that drive, right?
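
For anyone comparing runs, a minimal sketch of tallying the console output above. It counts the mpt_cam_event codes and collects the per-peripheral messages (probeN, passN). The sample text is abbreviated from the log above. The helper name summarize() and both regexes are made up for illustration. Reading "error 22" as an errno value (EINVAL) is an assumption, since the log itself does not say what the number means.

```python
import errno
import re
from collections import Counter

# Abbreviated sample copied from the console output above.
SAMPLE = """\
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
(mpt0:vol0:1): Physical Disk Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: mpt_cam_event: 0x21
(probe0:mpt0:1:0:0): error 22
(pass1:mpt0:1:0:0): lost device
"""

# Hypothetical patterns, written only to match the lines shown in this post.
EVENT_RE = re.compile(r"^mpt\d+: mpt_cam_event: (0x[0-9a-fA-F]+)$")
PERIPH_RE = re.compile(r"^\((\w+):mpt\d+:\d+:\d+:\d+\): (.+)$")
ERROR_RE = re.compile(r"^error (\d+)$")


def summarize(text):
    """Return (event code counts, list of (peripheral, message))."""
    events = Counter()
    periph = []
    for line in text.splitlines():
        m = EVENT_RE.match(line)
        if m:
            events[m.group(1).lower()] += 1
            continue
        m = PERIPH_RE.match(line)
        if m:
            name, msg = m.group(1), m.group(2)
            e = ERROR_RE.match(msg)
            if e:
                # Assumption: the number is an errno value.
                code = int(e.group(1))
                msg = "error %d (%s)" % (code, errno.errorcode.get(code, "?"))
            periph.append((name, msg))
    return events, periph


if __name__ == "__main__":
    events, periph = summarize(SAMPLE)
    for code, count in sorted(events.items()):
        print(code, count)
    for name, msg in periph:
        print(name, msg)
```

Running the same summary over the detach and reattach captures side by side should make it easier to spot which events only show up before a panic.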

And on reattach:

mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: Volume(0:1:0): Physical Disk Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
(mpt0:vol0:1): Physical (mpt0:0:1:0), Pass-thru (mpt0:1:0:0)
(mpt0:vol0:1): Online
(mpt0:vol0:1): Status ( Out-Of-Sync )
(probe2:mpt0:1:2:0): error 22
(probe2:mpt0:1:2:0): Unretryable Error
(probe3:mpt0:1:3:0): error 22
(rinse, repeat)

pass1 at mpt0 bus 1 target 0 lun 0
pass1: <ATA ST3750640NS G> Fixed unknown SCSI-5 device
pass1: Serial Number             5QD56ZXC
pass1: 300.000MB/s transfers
pass1: Command Queueing Enabled
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0:vol0(mpt0:0:0): Volume Status Changed
mpt0:vol0(mpt0:0:0): RAID-1 - Degraded
mpt0:vol0(mpt0:0:0): Status ( Enabled Re-Syncing )
mpt0:vol0(mpt0:0:0): High Priority Re-Sync
mpt0:vol0(mpt0:0:0): 1464842240 of 1464842240 blocks remaining

I'm betting it will panic again in a few hours when the rebuild finishes.
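
As a rough way to watch how close the rebuild is to finishing, here is a small sketch that parses the "blocks remaining" line from the output above. The function name resync_progress() is made up for illustration, and the 512-byte block size is an assumption (the log does not state it, though it works out to roughly the 750 GB you would expect for the ST3750640NS).

```python
import re

RESYNC_RE = re.compile(r"(\d+) of (\d+) blocks remaining")

# Assumption: 512 bytes per block; the console output does not say.
ASSUMED_BLOCK_SIZE = 512


def resync_progress(line):
    """Return (remaining, total, percent_done), or None if no match."""
    m = RESYNC_RE.search(line)
    if m is None:
        return None
    remaining, total = int(m.group(1)), int(m.group(2))
    percent_done = 100.0 * (total - remaining) / total if total else 0.0
    return remaining, total, percent_done


if __name__ == "__main__":
    line = "mpt0:vol0(mpt0:0:0): 1464842240 of 1464842240 blocks remaining"
    remaining, total, pct = resync_progress(line)
    size_gb = total * ASSUMED_BLOCK_SIZE / 1e9
    print("%.1f%% done, volume is about %.0f GB" % (pct, size_gb))
```

Sampling that line every so often and noting the time gives a crude estimate of when the rebuild (and, if the guess above is right, the next panic) is due.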

I'll try the detach again tomorrow with all the filesystems mounted and 
I'll make sure there are some pending writes when I detach.  If I see 
anything interesting on screen before the panic message, I'll grab it.

Thanks,

Charles


More information about the freebsd-scsi mailing list