Not all disks come back after power cycling a JBOD
Andriy Gapon
avg at FreeBSD.org
Thu Apr 2 08:48:12 UTC 2020
On 30/03/2020 19:05, Alan Somers wrote:
> If I remove a hot-swappable SCSI drive and reinsert it, FreeBSD always
> seems to handle that just fine. But if instead I unplug or power off an
> entire JBOD, then reattach it, frequently FreeBSD fails to
> recreate all of the device nodes. Using "mpsutil show devices" or "mprutil
> show devices" I can see all of the devices that I'm expecting. However,
> "camcontrol devlist" doesn't show them, and "camcontrol rescan" doesn't
> help.
>
> This has been the situation for as long as I can remember, several years at
> least. But now it's starting to cause problems for me. Before I try to
> debug this myself, does anybody know anything about the problem?
I have been trying to help a user with this problem with the mpr driver.
It seemed that the problem happened at the controller or expander level.
At least, I could not see any problem with the driver.
Some things we saw:
- the problem could be reproduced with Linux as well
- it was always the same slots / expander ports that were affected
We collected logs after doing these things:
- dev.mpr.0.debug_level=0x6ff
- camcontrol debug -I -P -c -p <bus>
From what I could see in the logs, the affected disks were in a permanent reset
state and that's what the controller kept reporting.
The driver kept getting SasTopologyChangeList events where the affected disks
kept oscillating between PHYLinkStatusChange and TargetMissing.
E.g., PHY 3 and 5 here:
EventDataLength: 6
AckRequired: 0
Event: SasTopologyChangeList (0x1c)
EventContext: 0x0
EnclosureHandle: 0x2
ExpanderDevHandle: 0x9
NumPhys: 39
NumEntries: 3
StartPhyNum: 3
ExpStatus: Responding (0x3)
PhysicalPort: 0
PHY[3].AttachedDevHandle: 0x000d
PHY[3].LinkRate: 12.0Gbps (0xb0)
PHY[3].PhyStatus: PHYLinkStatusChange
PHY[4].AttachedDevHandle: 0x000e
PHY[4].LinkRate: 12.0Gbps (0xbb)
PHY[4].PhyStatus: PHYLinkStatusUnchanged
PHY[5].AttachedDevHandle: 0x000f
PHY[5].LinkRate: 12.0Gbps (0xb0)
PHY[5].PhyStatus: PHYLinkStatusChange
EventDataLength: 6
AckRequired: 0
Event: SasTopologyChangeList (0x1c)
EventContext: 0x0
EnclosureHandle: 0x2
ExpanderDevHandle: 0x9
NumPhys: 39
NumEntries: 3
StartPhyNum: 3
ExpStatus: Responding (0x3)
PhysicalPort: 0
PHY[3].AttachedDevHandle: 0x000d
PHY[3].LinkRate: LinkRate Unknown (0xb)
PHY[3].PhyStatus: TargetMissing
PHY[4].AttachedDevHandle: 0x000e
PHY[4].LinkRate: 12.0Gbps (0xbb)
PHY[4].PhyStatus: PHYLinkStatusUnchanged
PHY[5].AttachedDevHandle: 0x000f
PHY[5].LinkRate: LinkRate Unknown (0xb)
PHY[5].PhyStatus: TargetMissing
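To spot such PHYs in a long debug log without reading every event by hand, something along these lines might help. This is only a rough sketch: the function names are made up for illustration, it assumes the log uses the same "PHY[n].PhyStatus: ..." line format as the dumps above, and it keys on the PHY number alone, so with more than one expander the ExpanderDevHandle would have to be tracked as well.

```python
import re
from collections import defaultdict

# Matches lines such as "PHY[3].PhyStatus: TargetMissing" (format taken from the dumps above).
PHY_STATUS = re.compile(r"PHY\[(\d+)\]\.PhyStatus:\s*(\S+)")


def phy_status_history(log_text):
    """Map PHY number -> list of PhyStatus values, in order of appearance."""
    history = defaultdict(list)
    for match in PHY_STATUS.finditer(log_text):
        history[int(match.group(1))].append(match.group(2))
    return dict(history)


def flapping_phys(log_text):
    """Return the PHY numbers that reported both PHYLinkStatusChange and TargetMissing."""
    wanted = {"PHYLinkStatusChange", "TargetMissing"}
    history = phy_status_history(log_text)
    return sorted(phy for phy, seen in history.items() if wanted <= set(seen))


# PhyStatus lines copied from the two events shown above.
SAMPLE = """\
PHY[3].PhyStatus: PHYLinkStatusChange
PHY[4].PhyStatus: PHYLinkStatusUnchanged
PHY[5].PhyStatus: PHYLinkStatusChange
PHY[3].PhyStatus: TargetMissing
PHY[4].PhyStatus: PHYLinkStatusUnchanged
PHY[5].PhyStatus: TargetMissing
"""

print(flapping_phys(SAMPLE))  # prints [3, 5]
```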
There were also SasDeviceStatusChange events like this:
mpr0: EventReply :
EventDataLength: 7
AckRequired: 0
Event: SasDeviceStatusChange (0xf)
EventContext: 0x20
TaskTag: 0xffff
ReasonCode: Internal Device Reset
ASC: 0x0
ASCQ: 0x0
DevHandle: 0x20
SASAddress: 0x5000cca2584a54cd
mpr0: EventReply :
EventDataLength: 7
AckRequired: 0
Event: SasDeviceStatusChange (0xf)
EventContext: 0x20
TaskTag: 0xffff
ReasonCode: Cmp Internal Device Reset
ASC: 0x0
ASCQ: 0x0
DevHandle: 0x20
SASAddress: 0x5000cca2584a54cd
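To see which disks keep getting reset, the reason codes can be tallied per SAS address. Again just a sketch under assumptions: reset_counts is a made-up name, and it relies on ReasonCode appearing before SASAddress within each event, as in the dumps above.

```python
from collections import Counter, defaultdict


def reset_counts(log_text):
    """Map SASAddress -> {ReasonCode: number of events seen for that address}."""
    counts = defaultdict(Counter)
    reason = None
    for line in log_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "ReasonCode":
            # Remember the reason until the SASAddress of the same event shows up.
            reason = value
        elif key == "SASAddress" and reason is not None:
            counts[value][reason] += 1
            reason = None
    return {addr: dict(per_reason) for addr, per_reason in counts.items()}


# Relevant fields copied from the two events shown above.
SAMPLE = """\
mpr0: EventReply :
Event: SasDeviceStatusChange (0xf)
ReasonCode: Internal Device Reset
DevHandle: 0x20
SASAddress: 0x5000cca2584a54cd
mpr0: EventReply :
Event: SasDeviceStatusChange (0xf)
ReasonCode: Cmp Internal Device Reset
DevHandle: 0x20
SASAddress: 0x5000cca2584a54cd
"""

print(reset_counts(SAMPLE))
```

A disk whose count keeps growing while the log is being collected would be a candidate for being stuck in the reset loop.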
Finally, the user discovered that after sas3flash -reset the controller (and
FreeBSD) is able to see all disks again.
If anyone has any thoughts / suggestions they are very welcome!
--
Andriy Gapon