mfi troubles (Unexpected Sense)

Sun Jul 15 22:46:31 UTC 2018

On 2018-07-15 04:27 PM, Dmitry Morozovsky wrote:
> Colleagues,
> 
> one of my servers has started to show unexpected delays, possibly related to
> the disk subsystem.
> 
> It's a Supermicro with mfi, ZFS on a set of RAID0 volumes (yes, I know we picked
> the wrong controller, but changing it is out of the question at least for now)
> 
> Now kernel log is filled with messages like
> 
> mfi0: 60006 (585001405s/0x0002/info) - Unexpected sense: PD 0e(e0x08/s5) Path
> 500304800021bf31, CDB: 8f 00 00 00 00 00 14 77 5c 1e 00 00 10 00 00 00, Sense:
> 3/11/00
> 
> every few seconds

That's a SCSI VERIFY(16) command (opcode 0x8f) that failed with a sense key of
medium error (0x3) and an additional sense code of 'unrecovered read error'
(asc=0x11, ascq=0x00). [Note to FreeBSD SCSI maintainers: how about some
leading '0x' or trailing 'h' for hex numbers ??]

Translation: a disk is dying, probably associated with NAA 0x500304800021bf31.
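
For anyone who wants to reproduce the decoding above by hand, here is a
minimal Python sketch. It assumes the standard VERIFY(16) CDB layout (LBA in
bytes 2-9, verification length in bytes 10-13, both big-endian). The lookup
tables and function names are made up for illustration and only cover the
values seen in this one log line.

```python
# Decode the CDB and the sense triple copied from the mfi log line.
# The tables below are deliberately tiny (illustration only): they hold
# just the opcode, sense key and asc/ascq pair that appear in this log.

CDB_HEX = "8f 00 00 00 00 00 14 77 5c 1e 00 00 10 00 00 00"
SENSE = "3/11/00"

OPCODES = {0x8F: "VERIFY(16)"}
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ADDITIONAL_SENSE = {(0x11, 0x00): "UNRECOVERED READ ERROR"}


def decode_verify16(cdb_hex):
    """Return (command name, starting LBA, verification length in blocks)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x8F:
        raise ValueError("not a 16-byte VERIFY(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9
    length = int.from_bytes(cdb[10:14], "big")  # bytes 10..13
    return OPCODES[cdb[0]], lba, length


def decode_sense(triple):
    """Split 'key/asc/ascq' (all hex) and look up the two descriptions."""
    key, asc, ascq = (int(part, 16) for part in triple.split("/"))
    return (SENSE_KEYS.get(key, "unknown"),
            ADDITIONAL_SENSE.get((asc, ascq), "unknown"))


if __name__ == "__main__":
    name, lba, length = decode_verify16(CDB_HEX)
    print(f"{name}: lba=0x{lba:x}, blocks=0x{length:x}")
    print("sense: %s, %s" % decode_sense(SENSE))
```

For the log line in question this gives a starting LBA of 0x14775c1e and a
verification length of 0x1000 blocks, which at least tells you roughly where
on the disk the bad region is.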

Is there any enclosure management, i.e. a device like /dev/ses*?

If so, try 'sg_ses /dev/ses<n>' and look for that NAA, or a number close to
it (within 3). I'll assume you have an exact match.

Then try 'sg_ses -A 0x500304800021bf31 --set=ident /dev/ses<n>'

That should cause an LED to flash on the disk carrier of the damaged disk.
To stop it flashing substitute "clear" for "set" in the previous invocation.
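
If there is no exact match, the "within 3" comparison is just integer
arithmetic on the 64-bit addresses. A rough Python sketch follows; the helper
name and the candidate list are hypothetical, and the candidates are meant to
be copied by hand from whatever addresses the enclosure actually reports.

```python
# Pick out enclosure-reported addresses that lie within a small distance
# of the NAA from the kernel log. Nothing here talks to the hardware; the
# candidate values are placeholders to be replaced with real ones.

TARGET_NAA = 0x500304800021BF31  # from the mfi log line


def near_matches(target, candidates, tolerance=3):
    """Return the candidates whose value differs from target by <= tolerance."""
    return [c for c in candidates if abs(c - target) <= tolerance]


if __name__ == "__main__":
    # Placeholder addresses, for illustration only.
    candidates = [0x500304800021BF30, 0x500304800021BF3F]
    for addr in near_matches(TARGET_NAA, candidates):
        print(f"candidate: 0x{addr:016x}")
```

Whatever address this turns up is the one to pass to 'sg_ses -A'.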

> I tried to find the place in the source that produces these lines but failed :(
> 
> Hard reboot, including full power off, was tried, but did not help.
> 
> Any hints to diagnose this further?
> 
> Ah, and this is stable/10 from Nov 2017
> 
> 
> please keep me CC:d as I'm not subscribed to -scsi@

Good luck
Doug Gilbert

P.S. I'm currently trying to recover data from a disk whose heads got
stuck ... so I know the feeling.