problems with SAS JBODs 2
Oliver Sech
crimsonthunder at gmx.net
Thu Jul 12 10:00:46 UTC 2018
On 07/11/2018 10:35 PM, Ken Merry wrote:
> Oliver, what happens when you try to do I/O to the devices that don’t go away after you pull the cable? Does that cause the devices to go away?
I tried 'dd if=/dev/daX of=/dev/null bs=1k count=1', and at least the "da" device disappears.
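For probing several devices at once, the same 1 KiB read can be wrapped in a loop. This is only a minimal sketch: the function name probe_disks is hypothetical, and it assumes that a failed read is a good enough signal that a device is stale.

```shell
#!/bin/sh
# Read 1 KiB from each given device node and report whether the read succeeded.
# probe_disks is a hypothetical helper name, not part of any FreeBSD tool.
# Usage: probe_disks /dev/da0 /dev/da1 ...
probe_disks() {
    for dev in "$@"; do
        # Same read as the manual test: one 1 KiB block, data discarded.
        if dd if="$dev" of=/dev/null bs=1k count=1 2>/dev/null; then
            echo "$dev: ok"
        else
            echo "$dev: FAILED"
        fi
    done
}

# Example (commented out): probe all da device nodes currently present.
# Note that this glob also matches partition nodes such as da0p1.
# probe_disks /dev/da[0-9]*
```

Devices reported as FAILED are the ones whose "da" node would be expected to disappear after the read attempt.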
> Looking at the mprutil output, it also shows the devices sticking around from the adapter’s standpoint.
>
> You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’ (where N is the scbus number shown by ‘camcontrol devlist -v’). That will do some basic probes for each of the devices and should in theory cause them to go away if they aren’t accessible.
>
> It seems like the adapter may not be recognizing that the devices in question have gone.
I'm pretty sure that I tried 'camcontrol rescan all' a few times. While I am no longer sure whether that cleans up the non-working devices, I am sure that no new devices were added.
Unfortunately I haven't yet gotten to Steve's 'clear controller mapping' script, but I did a few other things:
* The last time I tried to upgrade the firmware I had all sorts of problems. "sas3flash" reported bad checksums while flashing some of the files.
So I reflashed both controllers with the DOS version of sas3flash. This was basically a challenge in itself because the DOS version of this utility does not seem to run on computers of this decade. (ERROR: Failed to initialize PAL. Exiting program.)
The equivalent sas3flash.EFI version seems to be out of date and caused the checksum problems described before.
(This time I wiped them before flashing with "sas3flash -o -e 6".)
* I tried changing the mpr tunable "use_phy_num" after that, but this has not improved the situation. I will retry and collect logs with Steve's script.
* I retried with the latest "mpr.ko" from the Broadcom download page. (Same problems, no "use_phy_num" tunable.)
* I retested this hardware with Linux (4.15 and 4.17)
** Some shelves could be replugged reliably (ie: 45 disks show up, 45 disks disappear, 45 disks reappear)
** On the newest shelf, 2 disks were missing after replugging (ie: 44 disks show up, 44 disks disappear, 42 disks reappear) (kernel log: "mpt3sas_cm0: device is not present handle")
* I tried a different controller
** So far I used a Broadcom LSI SAS 9305-16e (Controller: SAS3216) (Firmware 16.00.01.00 or 15.00.00.00)
** Yesterday I switched to a new fresh out-of-the-box Broadcom LSI 9305-24i (Controller: SAS3224) (Firmware 09.00.00.00 (or something similar with 09*))
With the new controller everything seems to work on Linux. It might be the old firmware?...
It is better with the new controller on FreeBSD in the sense that I at least get one out of two /dev/sesX devices back. But disks are still missing and are not getting completely cleaned up...
This whole thing is a bit frustrating, especially since up until now I thought that HBAs were "connect and forget" devices. The next step is to set up a separate test environment and try to get it to work there. I will keep you updated and try to provide logs for all FreeBSD-related problems.