Re: CAM status: SCSI Status Error

From: David Christensen <dpchrist_at_holgerdanske.com>
Date: Fri, 22 Nov 2024 18:14:28 UTC

On 11/22/24 05:11, Dan Langille wrote:
> On FreeBSD 14.1, is this a server issue (e.g. cable/hardware) as opposed to a drive issue?
> 
> Nov 21 05:28:48 r730-03 kernel: (da7:mrsas0:1:7:0): READ(10). CDB: 28 00 aa d9 5b 5f 00 00 20 00
> Nov 21 05:28:48 r730-03 kernel: (da7:mrsas0:1:7:0): CAM status: SCSI Status Error
> Nov 21 05:28:48 r730-03 kernel: (da7:mrsas0:1:7:0): SCSI status: OK
> Nov 21 05:28:54 r730-03 kernel: (da7:mrsas0:1:7:0): READ(10). CDB: 28 00 aa d9 6b 08 00 00 10 00
> Nov 21 05:28:54 r730-03 kernel: (da7:mrsas0:1:7:0): CAM status: SCSI Status Error
> Nov 21 05:28:54 r730-03 kernel: (da7:mrsas0:1:7:0): SCSI status: OK
> Nov 21 05:55:34 r730-03 smartd[17215]: Device: /dev/da7 [SAT], ATA error count increased from 4 to 8


I believe those errors are related to the connection between the drive 
and the host -- e.g. cables, connectors, and/or interface chips.  I 
would start by swapping in a known-good cable.


A failing power supply can cause all sorts of problems.  I would check 
the PSU with a hardware tester.


> Followed by this from time to time:
> 
> Nov 21 16:55:33 r730-03 smartd[17215]: Device: /dev/da7 [SAT], Self-Test Log error count increased from 0 to 1
> Nov 22 11:25:35 r730-03 smartd[17215]: Device: /dev/da7 [SAT], 1 Currently unreadable (pending) sectors


Searching the web, I found a good explanation of pending sectors:

https://superuser.com/questions/384095/how-to-force-a-remap-of-sectors-reported-in-s-m-a-r-t-c5-current-pending-sector


If you can identify the address (LBA) of the bad sector, you could use 
dd(1) to overwrite it.  The write gives the drive a chance to either 
rewrite the sector in place or remap it to a spare.  If the drive is in 
an operating pool, this could be risky; shutting down and using live 
media would be safer.  In either case, you will want to scrub the pool 
afterwards.
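
As a starting point for finding the LBA, the CDB bytes in the kernel 
messages can be decoded.  In a READ(10) CDB, byte 0 is the opcode 
(0x28), bytes 2-5 are the starting LBA (big-endian), and bytes 7-8 are 
the transfer length in blocks.  The following is only a rough sketch; 
the function name decode_read10 is my own invention, and the two CDB 
strings are copied from the log above:

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (starting_lba, block_count).  decode_read10 is a
    hypothetical helper name, not part of any FreeBSD tool.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: starting LBA
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


# CDBs copied from the kernel messages quoted above.
for cdb_hex in ("28 00 aa d9 5b 5f 00 00 20 00",
                "28 00 aa d9 6b 08 00 00 10 00"):
    lba, blocks = decode_read10(cdb_hex)
    print(f"LBA {lba} through {lba + blocks - 1} ({blocks} blocks)")
```

Note that this only gives the range of blocks covered by each failed 
read, not the exact bad sector.  If the self-test log from 
"smartctl -l selftest" shows an LBA_of_first_error, I would trust that 
over the decoded range.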


David