bad disk discovery

Tue Dec 8 13:26:52 UTC 2015

On 2015-12-08 08:00, prateek sethi wrote:
> Hi Scott,
> Thanks for your quick response.
> 
> I have a different set of hardware, which is why I want to know how I can
> debug it myself. Is there any way or procedure I can use to find out the
> reason for the CDB errors or disk command failures?
> 
> Here are the details of the setup where I am seeing this issue.
> 
> I am using an LSI SAS2008 controller connected to a Supermicro enclosure,
> running FreeBSD 9.3. There are 16 disks, but only one has the problem.
> That suggests the controller and cable are fine.
> 
> Faulty disk info:-
> 
> *smartctl output is:-*
> 
> smartctl -x /dev/da23
> 
> === START OF INFORMATION SECTION ===
> Vendor:               SEAGATE
> Product:              ST3600057SS
> Revision:             000B
> Rotation Rate:        15000 rpm
> Form Factor:          3.5 inches
> Logical Unit id:      0x5000c5007725173f
> Serial number:        6SL8YLPC0000N5030DY7
> Device type:          disk
> Transport protocol:   SAS
> Local Time is:        Tue Dec  8 18:20:45 2015 IST
> *device is NOT READY (e.g. spun down, busy)*
> 
> *Logs:-*
> 
> Dec  8 14:12:01 N1 kernel: da23 at mps0 bus 0 scbus0 target 148 lun 0
> Dec  8 14:12:01 N1 kernel: da23: <SEAGATE ST3600057SS 000B> Fixed Direct Access SCSI-5 device
> Dec  8 14:12:01 N1 kernel: da23: Serial Number 6SL8YLPC0000N5030DY7
> Dec  8 14:12:01 N1 kernel: da23: 600.000MB/s transfers
> Dec  8 14:12:01 N1 kernel: da23: Command Queueing enabled
> Dec  8 14:12:01 N1 kernel: da23: *Attempt to query device size failed: NOT READY, Logical unit not ready, cause n*
> Dec  8 14:12:01 N1 kernel: ses1: da23,pass26: Element descriptor: 'Slot 24'
> Dec  8 14:12:01 N1 kernel: ses1: da23,pass26: SAS Device Slot Element: 1 Phys at Slot 23
> 
> *driver versions:-*
> 
> dev.mps.0.firmware_version: 15.00.00.00
> dev.mps.0.driver_version: 16.00.00.00-fbsd
> 
> 
> 
> 
> 
> 
> On Tue, Dec 8, 2015 at 3:15 AM, Scott Long <scott4long at yahoo.com> wrote:
> 
>> Hi,
>> 
>> If your situation is accurate and the disk is not responding properly to
>> regular commands then it’s unlikely that it will respond to SMART commands
>> either. Sometimes these situations are caused by a bad cable, bad
>> controller, or buggy software/firmware, and only rarely will the standard
>> statistics in SMART pick up these kinds of errors. SMART is better at
>> tracking wear rates and error rates on the physical media, both HDD and
>> SSD, but even then it’s hard for it to be accurately predictive or even
>> accurately diagnostic. For your case, I recommend that you describe your
>> hardware and software configuration in more detail, and look for physical
>> abnormalities in the cabling and connections. Once that is ruled out and
>> the rest of us know what kind of hardware you’re dealing with, we might be
>> able to make better recommendations.
>> 
>> Scott
>> 
>> > On Dec 7, 2015, at 11:07 AM, prateek sethi <prateekrootkey at gmail.com> wrote:
>> >
>> > Hi ,
>> >
>> > Is there any way or tool to find out whether a disk that is not responding
>> > properly is really bad? Sometimes I have seen a lot of CDB errors for a
>> > drive, and a system reboot makes everything fine. What can be the reasons
>> > for such scenarios?
>> >
>> > I know smartctl is the tool that can help. I have a couple of questions
>> > regarding it.
>> >
>> > 1. What if the disk does not support smartctl?
>> > 2. How can I make the best use of the smartctl command, i.e. which
>> >    parameters can tell me that the disk is actually bad?
>> > 3. What other tests can I perform to make sure that the disk has
>> >    completely gone?
>> >
>> >
>> > Please tell me the correct place to ask this question if I am asking in
>> > the wrong place.
>> > _______________________________________________
>> > freebsd-scsi at freebsd.org mailing list
>> > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>> > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe at freebsd.org"
>> 
>> 

Have you tried simply moving the drive to another slot? Does the problem
follow the drive? It's unlikely, but it could be a backplane issue.
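
A rough sketch of that slot test in Python, comparing the kernel log from
before and after the move. It assumes each kernel message sits on a single
unwrapped line and is worded like the da/ses messages quoted above; other
driver versions may differ. The names summarize() and failure_follows() are
made up for this illustration and are not part of any FreeBSD tool.

```python
import re

# Patterns modelled on the da/ses kernel messages quoted in this thread.
# Assumption: one message per line, same wording as in those logs.
SERIAL_RE = re.compile(r"kernel: (da\d+): Serial Number (\S+)")
SLOT_RE = re.compile(
    r"kernel: ses\d+: (da\d+),pass\d+: Element descriptor: '([^']+)'")
FAIL_RE = re.compile(r"kernel: (da\d+): Attempt to query device size failed")


def _entry(devices, name):
    """Fetch or create the record for one daN device."""
    return devices.setdefault(
        name, {"serial": None, "slot": None, "failed": False})


def summarize(log_text):
    """Return {device: {"serial": str, "slot": str, "failed": bool}}."""
    devices = {}
    for line in log_text.splitlines():
        match = SERIAL_RE.search(line)
        if match:
            _entry(devices, match.group(1))["serial"] = match.group(2)
        match = SLOT_RE.search(line)
        if match:
            _entry(devices, match.group(1))["slot"] = match.group(2)
        match = FAIL_RE.search(line)
        if match:
            _entry(devices, match.group(1))["failed"] = True
    return devices


def _failed(summary, key):
    """Set of serials (or slots) that belong to failing devices."""
    return {d[key] for d in summary.values() if d["failed"] and d[key]}


def failure_follows(before, after):
    """Say whether the failure moved with the drive or stayed with the slot.

    Returns "drive", "slot", or "unclear" (for example when the drive was
    not actually moved, or when nothing fails after the move).
    """
    same_serial = _failed(before, "serial") & _failed(after, "serial")
    same_slot = _failed(before, "slot") & _failed(after, "slot")
    if same_serial and not same_slot:
        return "drive"
    if same_slot and not same_serial:
        return "slot"
    return "unclear"
```

To use it, save the relevant kernel messages (typically from
/var/log/messages) before and after moving the drive, run summarize() on
each, and pass both results to failure_follows(). A result of "slot" would
point at the backplane rather than the disk.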

I don't know about version 15 firmware; I have always used version 16
firmware with 9.x to match the driver version.
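
A small Python sketch of that version comparison, assuming the input is
plain `sysctl dev.mps` output in the same "name: value" form as the two
lines quoted earlier. Comparing only the leading (major) number is an
assumption based on the habit described above, not a documented rule of
the mps driver, and the function names are hypothetical.

```python
import re

# Matches lines such as "dev.mps.0.firmware_version: 15.00.00.00".
VERSION_RE = re.compile(
    r"^dev\.mps\.(\d+)\.(firmware|driver)_version:\s*(\S+)", re.MULTILINE)


def parse_mps_versions(sysctl_text):
    """Return {unit: {"firmware": str, "driver": str}} from sysctl output."""
    versions = {}
    for unit, kind, value in VERSION_RE.findall(sysctl_text):
        versions.setdefault(int(unit), {})[kind] = value
    return versions


def major(version):
    """Leading numeric field, e.g. "16.00.00.00-fbsd" gives 16."""
    return int(version.split(".")[0])


def mismatched_units(sysctl_text):
    """List (unit, firmware, driver) where the major versions differ."""
    result = []
    for unit, found in sorted(parse_mps_versions(sysctl_text).items()):
        if "firmware" in found and "driver" in found:
            if major(found["firmware"]) != major(found["driver"]):
                result.append((unit, found["firmware"], found["driver"]))
    return result
```

Fed the two dev.mps.0 lines from the original report, mismatched_units()
returns one entry for unit 0 (firmware 15, driver 16). A mismatch is only a
hint worth following up, not proof that the firmware is at fault.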