bad disk discovery
Michael Jung
mikej at mikej.com
Tue Dec 8 13:26:52 UTC 2015
On 2015-12-08 08:00, prateek sethi wrote:
> Hi Scott,
> Thanks for your quick response.
>
> I have a different set of hardware, which is why I want to know how I can
> debug it myself. Is there a procedure I can use to find out the reason
> for the CDB errors or disk command failures?
>
> Here are the details of the setup where I am seeing this issue.
>
> I am using an LSI SAS2008 controller connected to a Supermicro enclosure,
> running FreeBSD 9.3. There are 16 different disks, but only one disk has
> the problem. That means the controller and cable are fine.
>
> Details of the faulty disk:
>
> *smartctl output is:-*
>
> smartctl -x /dev/da23
>
> === START OF INFORMATION SECTION ===
> Vendor: SEAGATE
> Product: ST3600057SS
> Revision: 000B
> Rotation Rate: 15000 rpm
> Form Factor: 3.5 inches
> Logical Unit id: 0x5000c5007725173f
> Serial number: 6SL8YLPC0000N5030DY7
> Device type: disk
> Transport protocol: SAS
> Local Time is: Tue Dec 8 18:20:45 2015 IST
> *device is NOT READY (e.g. spun down, busy)*
>
> *Logs:-*
>
> Dec 8 14:12:01 N1 kernel: da23 at mps0 bus 0 scbus0 target 148 lun 0
> Dec 8 14:12:01 N1 kernel: da23: <SEAGATE ST3600057SS 000B> Fixed Direct Access SCSI-5 device
> Dec 8 14:12:01 N1 kernel: da23: Serial Number 6SL8YLPC0000N5030DY7
> Dec 8 14:12:01 N1 kernel: da23: 600.000MB/s transfers
> Dec 8 14:12:01 N1 kernel: da23: Command Queueing enabled
> Dec 8 14:12:01 N1 kernel: da23: *Attempt to query device size failed: NOT READY, Logical unit not ready, cause n*
> Dec 8 14:12:01 N1 kernel: ses1: da23,pass26: Element descriptor: 'Slot 24'
> Dec 8 14:12:01 N1 kernel: ses1: da23,pass26: SAS Device Slot Element: 1 Phys at Slot 23
>
> *driver versions:-*
>
> dev.mps.0.firmware_version: 15.00.00.00
> dev.mps.0.driver_version: 16.00.00.00-fbsd
>
> On Tue, Dec 8, 2015 at 3:15 AM, Scott Long <scott4long at yahoo.com> wrote:
>
>> Hi,
>>
>> If your situation is accurate and the disk is not responding properly to
>> regular commands, then it’s unlikely that it will respond to SMART
>> commands either. Sometimes these situations are caused by a bad cable,
>> bad controller, or buggy software/firmware, and only rarely will the
>> standard statistics in SMART pick up these kinds of errors. SMART is
>> better at tracking wear rates and error rates on the physical media, both
>> HDD and SSD, but even then it’s hard for it to be accurately predictive
>> or even accurately diagnostic. For your case, I recommend that you
>> describe your hardware and software configuration in more detail, and
>> look for physical abnormalities in the cabling and connections. Once that
>> is ruled out and the rest of us know what kind of hardware you’re dealing
>> with, we might be able to make better recommendations.
>>
>> Scott
>>
>> > On Dec 7, 2015, at 11:07 AM, prateek sethi <prateekrootkey at gmail.com> wrote:
>> >
>> > Hi ,
>> >
>> > Is there any way or tool to find out whether a disk that is not responding
>> > properly is really bad? Sometimes I see a lot of CDB errors for a drive,
>> > and a system reboot makes everything fine. What could cause this kind of
>> > scenario?
>> >
>> > I know smartctl can help. I have a couple of questions about it.
>> >
>> > 1. What if the disk does not support smartctl?
>> > 2. How can I make the best use of smartctl? Which parameters can tell me
>> > that the disk is actually bad?
>> > 3. What other tests can I perform to make sure that the disk has completely
>> > failed?
>> >
>> >
>> > Please point me to the correct place to ask this question if this is the
>> > wrong place.
>> > _______________________________________________
>> > freebsd-scsi at freebsd.org mailing list
>> > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>> > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe at freebsd.org"
Have you tried simply moving the drive to another slot? Does the problem
follow the drive?
It's unlikely, but it could be a backplane issue.
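
One way to check whether the problem follows the drive is to record which serial number sits in which enclosure slot before and after the move. The following is only a rough sketch: it assumes the kernel messages look like the ones quoted above, with each message on a single unwrapped line, and the function name serial_to_slot is made up for illustration.

```python
import re

# Patterns modelled on the kernel messages quoted in this thread (assumption:
# one message per line, not wrapped by the mail client).
SERIAL_RE = re.compile(r"\b(da\d+): Serial Number (\S+)")
SLOT_RE = re.compile(r"\bses\d+: (da\d+),pass\d+: Element descriptor: '([^']+)'")

def serial_to_slot(log_text):
    """Map each drive serial number to the slot descriptor seen in the log."""
    serial_by_dev = {}
    slot_by_dev = {}
    for line in log_text.splitlines():
        m = SERIAL_RE.search(line)
        if m:
            serial_by_dev[m.group(1)] = m.group(2)
        m = SLOT_RE.search(line)
        if m:
            slot_by_dev[m.group(1)] = m.group(2)
    # Drives with no matching ses line map to None.
    return {serial: slot_by_dev.get(dev) for dev, serial in serial_by_dev.items()}

sample = """\
Dec 8 14:12:01 N1 kernel: da23: Serial Number 6SL8YLPC0000N5030DY7
Dec 8 14:12:01 N1 kernel: ses1: da23,pass26: Element descriptor: 'Slot 24'
"""
print(serial_to_slot(sample))
```

If the same serial number reports NOT READY in its new slot, the drive itself is the likely culprit; if a different drive starts failing in the old slot, the backplane is more suspect.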
I don't know about version 15 firmware; I have always used version 16
firmware with 9.x to match the driver version.
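
For what it's worth, the comparison can be scripted against the two sysctl values quoted above. This is a minimal sketch, assuming the output keeps the "name: value" form shown in the thread; the helper names are hypothetical, and a mismatch is not by itself proof of a problem.

```python
def parse_sysctl(text):
    """Turn 'name: value' lines into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values

def major_version(value):
    """Return the leading numeric field of a string like '16.00.00.00-fbsd'."""
    return int(value.split(".")[0])

# The two values reported earlier in this thread.
sample = """\
dev.mps.0.firmware_version: 15.00.00.00
dev.mps.0.driver_version: 16.00.00.00-fbsd
"""

values = parse_sysctl(sample)
fw = major_version(values["dev.mps.0.firmware_version"])
drv = major_version(values["dev.mps.0.driver_version"])
print("match" if fw == drv else f"mismatch: firmware {fw}, driver {drv}")
```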