issue with hot swap and mps driver
Jason Keltz
jas at cse.yorku.ca
Wed Apr 1 18:32:38 UTC 2015
I have an LSI 9205-8e card in a system running FreeBSD 10.1-RELEASE-p5.
> mps0: <LSI SAS2308> port 0x4000-0x40ff mem
> 0xc1440000-0xc144ffff,0xc1400000-0xc143ffff irq 16 at device 0.0 on pci1
> mps0: Firmware: 20.00.02.00, Driver: 19.00.00.00-fbsd
> mps0: IOCCapabilities:
> 5285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
The card is connected to a 24-disk AIC JBOD (AIC SSG-JBSA21-4243-A1 SAS
6G) with hot swap capability.
I'm having some interesting hot swap issues under mps with 2 TB Western
Digital SATA disks.
1) If I hot swap an older Western Digital disk, model WD2002FAEX-007BA0
with firmware 1D05, the disk hot swaps perfectly under FreeBSD. That
is, when I remove the disk, the device entry in /dev is removed, and
when I re-insert the disk, it returns. This is the behaviour I expect.
> (da21:mps0:0:31:0): Periph destroyed
> da21 at mps0 bus 0 scbus0 target 31 lun 0
> da21: <ATA WDC WD2002FAEX-0 1D05> Fixed Direct Access SCSI-6 device
> da21: Serial Number WD-XXXXXXXXXXXXX
> da21: 600.000MB/s transfers
> da21: Command Queueing enabled
> da21: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
2) If I hot swap a slightly newer Western Digital disk, model
WD2002FAEX-00MJRA0 with firmware 1L01, then when I re-insert the disk,
the device entry does not return, and I instead see this:
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x47
> _mapping_get_dev_info: failed to compute the hashed SAS Address for
> SATA device with handle 0x0019
> failure at /usr/src/sys/dev/mps/mps_sas_lsi.c:670/mpssas_add_device()!
> Could not get ID for device with handle 0x0019
> mpssas_fw_work: failed to add device with handle 0x19
3) If I hot swap a much newer RE disk, WD2000FYYZ-0 with 1K03 firmware,
the problem is the same as 2).
Worse, if I run a "sas2ircu 0 display" command to list the enclosure
after the above error occurs, the kernel dumps.
I needed to resolve this issue under both Red Hat Enterprise Linux and
FreeBSD because I am using one set of these disks in each system. I've
been in contact with AIC, Western Digital, LSI/Avago, and Red Hat, and
have spent countless hours sending debugging details. Through Red Hat,
I was able to get a patch indirectly from Avago which "solves" the
driver problem for RHEL. The patch description reads: "In this patch
driver won't block the device if the device state is "SDEV_CREATED"
(i.e. driver won't block the drive when drive is still in the device
add process at SCSI MID Layer). So that SCSI MID Layer can send the
Inquiry commands."
The patch is slated for internal review at LSI.
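To make the quoted patch description concrete, here is a minimal sketch of
the decision it describes. This is not the Avago patch or any real driver
code: the enum, its values, and the function name may_block_device are
hypothetical stand-ins chosen for illustration only.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the SCSI mid-layer device states.
 * The real Linux enum has more members; only the idea is modelled here.
 */
enum dev_state {
    DEV_CREATED,  /* device is still in the add process, inquiry pending */
    DEV_RUNNING,  /* device is fully attached and serving I/O */
    DEV_BLOCKED   /* I/O to the device is already being held back */
};

/*
 * Return true if the driver may block the device.
 * A device that is still being added is left unblocked so that the
 * mid layer can finish sending its inquiry commands.
 */
bool may_block_device(enum dev_state state)
{
    if (state == DEV_CREATED)
        return false;           /* do not block during device add */
    return state == DEV_RUNNING; /* only a running device gets blocked */
}
```

The point of the sketch is only that the "still being added" state is
treated as a special case before any blocking happens.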
It would be nice to see a similar patch for the FreeBSD version of the
mps driver, so that the driver does not hang when such a disk is
inserted. There's no question that these disks take a little extra
time to respond, but it's not clear why the driver can't allow them
that extra time. In discussing the issue with other people, I'm told
that this problem occurs with hard disks from other vendors as well.
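As an illustration of what "a little extra time" could mean, here is a
small simulation of retrying an identify request against a disk that is
slow to become ready. It does not reflect the actual mps code: fake_disk,
fake_identify, fake_delay and identify_with_retry are hypothetical names,
and the idea that the five identical error lines above correspond to a
fixed number of attempts is my assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated disk that only answers after `ready_after` identify attempts. */
struct fake_disk {
    int attempts_seen;
    int ready_after;
};

/* Stand-in for an identify request; fails until the disk is ready. */
bool fake_identify(struct fake_disk *d)
{
    d->attempts_seen++;
    return d->attempts_seen >= d->ready_after;
}

/* Stand-in for a sleep; a real driver would pause for delay_ms here. */
void fake_delay(unsigned delay_ms)
{
    (void)delay_ms;
}

/*
 * Try to identify the disk up to max_attempts times, waiting between
 * attempts. Returns the attempt number that succeeded, or -1 if every
 * attempt failed.
 */
int identify_with_retry(struct fake_disk *d, int max_attempts,
                        unsigned delay_ms)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (fake_identify(d))
            return attempt;
        if (attempt < max_attempts)
            fake_delay(delay_ms);
    }
    return -1;
}
```

With a simulated disk that needs seven attempts, a limit of five fails
while a limit of ten succeeds, which is the behaviour I would hope a
longer wait or a higher retry count would give on the real hardware.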
Jason.
PS: Although I'm running firmware 20.00.02.00 with driver
19.00.00.00-fbsd, I've tested with firmware 19 as well, and this doesn't
change anything.