7.1 Panic on degraded disk w/mpt
Charles Sprickman
spork at bway.net
Mon Feb 9 23:14:15 PST 2009
On Tue, 10 Feb 2009, Charles Sprickman wrote:
> On Mon, 9 Feb 2009, Scott Long wrote:
>
>> Charles Sprickman wrote:
>>> (posted on -stable already, no takers - added info: full dmesg, crash info
>>> from panic when array finished rebuilding, some comments on dmesg)
>>>
>>> Howdy,
>>>
>>> I dug around and can't find a PR on this, and the only other report I saw
>>> was in this mailing list post that has no replies:
>>>
>>> http://www.nabble.com/7.1-BETA2-panic-on-mpt-degrade-td20183173.html
>>>
>>> The hardware is a Dell PowerEdge 860 with the Dell/LSI SAS5 controller:
>>>
>>> mpt0: <LSILogic SAS/SATA Adapter> port 0xec00-0xecff mem
>>> 0xfe9fc000-0xfe9fffff,0xfe9e0000-0xfe9effff irq 16 at device 8.0 on pci2
>>> mpt0: MPI Version=1.5.13.0
>>>
>>> The panic is repeatable by forcing the array into a degraded state. When
>>> the array finishes rebuilding, the box also panics.
>>>
>>> Here's my best shot at getting info out of kgdb (panic on array going to
>>> degraded state):
>>
>> I wonder if the MPT card is temporarily detaching and then reattaching
>> the logical drive when the rebuild completes.
>
> IIRC, just before the panic there is a bunch of CAM debug splattered across
> the monitor. I can run down to the garage and snap a few pics of the monitor
> after detaching a drive.

OK, some more info here. I wanted to be safe, so I brought the machine
down to single-user mode and unmounted everything but /. It did not panic
when the drive was removed, so perhaps a quiet filesystem = no panic.

Here's what gets spit out on the console:

mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
(mpt0:vol0:1): Physical Disk Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
(mpt0:vol0:1): Physical Disk Status Changed
mpt0:vol0(mpt0:0:0): Volume Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0:vol0(mpt0:0:0): RAID-1 - Degraded
mpt0:vol0(mpt0:0:0): Status ( Enabled )
(mpt0:vol0:1): No longer configured
(probe0:mpt0:1:0:0): error 22
(probe0:mpt0:1:0:0): Unretryable Error
(probe2:mpt0:1:2:0): error 22
(probe2:mpt0:1:2:0): Unretryable Error
(probe3:mpt0:1:3:0): error 22
(repeats with probe # increasing...)
(probe1:mpt0:1:1:0): CAM Status 0x19
(probe1:mpt0:1:1:0): Retrying Command
(probe0:mpt0:1:0:0): error 22
(probe0:mpt0:1:0:0): Unretryable Error
(pass1:mpt0:1:0:0): lost device
(pass1:mpt0:1:0:0): removing device entry
So it does appear that at the very least the mpt driver is removing the
pass device for that drive, right?
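
For anyone who wants to compare a saved console log from a panicking run against a quiet one, here is a rough sketch that tallies the mpt_cam_event codes. It assumes only the line format shown above; the function name count_mpt_events is made up for illustration and is not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches console lines such as "mpt0: mpt_cam_event: 0x16".
# The format is taken from the log above; nothing else is assumed.
EVENT_RE = re.compile(r"^mpt\d+: mpt_cam_event: (0x[0-9a-fA-F]+)$")


def count_mpt_events(lines):
    """Return a Counter mapping event code (e.g. '0x16') to occurrences."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    # First few lines of the detach log above, used as sample input.
    sample = [
        "mpt0: mpt_cam_event: 0x16",
        "mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).",
        "mpt0: mpt_cam_event: 0x12",
        "mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).",
        "mpt0: mpt_cam_event: 0x16",
    ]
    for code, n in sorted(count_mpt_events(sample).items()):
        print(code, n)
```

Running the same tally over the log from a panicking detach would show whether a different sequence of events arrives when the filesystems are busy.
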
And on reattach:
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: mpt_cam_event: 0x12
mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
mpt0: mpt_cam_event: 0x16
mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
mpt0: Volume(0:1:0): Physical Disk Status Changed
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
(mpt0:vol0:1): Physical (mpt0:0:1:0), Pass-thru (mpt0:1:0:0)
(mpt0:vol0:1): Online
(mpt0:vol0:1): Status ( Out-Of-Sync )
(probe2:mpt0:1:2:0): error 22
(probe2:mpt0:1:2:0): Unretryable Error
(probe3:mpt0:1:3:0): error 22
(rinse, repeat)
pass1 at mpt0 bus 1 target 0 lun 0
pass1: <ATA ST3750640NS G> Fixed unknown SCSI-5 device
pass1: Serial Number 5QD56ZXC
pass1: 300.000MB/s transfers
pass1: Command Queueing Enabled
mpt0: mpt_cam_event: 0x15
mpt0: Unhandled Event Notify Frame. Event 0x15 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0: mpt_cam_event: 0x21
mpt0: Unhandled Event Notify Frame. Event 0x21 (ACK not required).
mpt0:vol0(mpt0:0:0): Volume Status Changed
mpt0:vol0(mpt0:0:0): RAID-1 - Degraded
mpt0:vol0(mpt0:0:0): Status ( Enabled Re-Syncing )
mpt0:vol0(mpt0:0:0): High Priority Re-Sync
mpt0:vol0(mpt0:0:0): 1464842240 of 1464842240 blocks remaining
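
To keep an eye on how far along the rebuild is, the "blocks remaining" line can be turned into a percentage. This is only a sketch that assumes the "N of M blocks remaining" wording printed above; the helper name resync_progress is hypothetical.

```python
import re

# Matches the counters in a line such as
# "mpt0:vol0(mpt0:0:0): 1464842240 of 1464842240 blocks remaining".
RESYNC_RE = re.compile(r"(\d+) of (\d+) blocks remaining")


def resync_progress(line):
    """Return percent of the re-sync completed, or None if the line does not match."""
    match = RESYNC_RE.search(line)
    if match is None:
        return None
    remaining, total = int(match.group(1)), int(match.group(2))
    if total == 0:
        return None  # avoid dividing by zero on a malformed line
    return 100.0 * (total - remaining) / total


if __name__ == "__main__":
    line = "mpt0:vol0(mpt0:0:0): 1464842240 of 1464842240 blocks remaining"
    print(resync_progress(line))
```

With the line above, all blocks are still remaining, so the rebuild has only just started.
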

I'm betting it will panic again in a few hours when the rebuild finishes.
I'll try the detach again tomorrow with all the filesystems mounted, and
I'll make sure there are some pending writes when I detach. If I see
anything interesting on screen before the panic message, I'll grab it.

Thanks,

Charles