problems with SAS JBODs 2
Ken Merry
ken at freebsd.org
Wed Jul 11 20:35:30 UTC 2018
Yes, I agree, Oliver’s problem looks different.
Oliver, for your second set of files (freebsd_sas2.zip) it looks like you may have devices that aren’t completely going away, even from a SAS standpoint.
Here are the 25 target IDs that show up in 2_shelf_connected_dmesg.txt:
mpr0: mprsas_add_device: Target ID for added device is 467.
mpr0: mprsas_add_device: Target ID for added device is 468.
mpr0: mprsas_add_device: Target ID for added device is 469.
mpr0: mprsas_add_device: Target ID for added device is 470.
mpr0: mprsas_add_device: Target ID for added device is 471.
mpr0: mprsas_add_device: Target ID for added device is 472.
mpr0: mprsas_add_device: Target ID for added device is 473.
mpr0: mprsas_add_device: Target ID for added device is 474.
mpr0: mprsas_add_device: Target ID for added device is 475.
mpr0: mprsas_add_device: Target ID for added device is 476.
mpr0: mprsas_add_device: Target ID for added device is 477.
mpr0: mprsas_add_device: Target ID for added device is 478.
mpr0: mprsas_add_device: Target ID for added device is 479.
mpr0: mprsas_add_device: Target ID for added device is 480.
mpr0: mprsas_add_device: Target ID for added device is 481.
mpr0: mprsas_add_device: Target ID for added device is 482.
mpr0: mprsas_add_device: Target ID for added device is 483.
mpr0: mprsas_add_device: Target ID for added device is 484.
mpr0: mprsas_add_device: Target ID for added device is 485.
mpr0: mprsas_add_device: Target ID for added device is 486.
mpr0: mprsas_add_device: Target ID for added device is 487.
mpr0: mprsas_add_device: Target ID for added device is 488.
mpr0: mprsas_add_device: Target ID for added device is 489.
mpr0: mprsas_add_device: Target ID for added device is 490.
mpr0: mprsas_add_device: Target ID for added device is 503.
Here are the 8 target IDs that disappear in 3_shelf_disconnected_dmesg.txt:
mpr0: mprsas_prepare_remove: Sending reset for target ID 467
mpr0: mprsas_prepare_remove: Sending reset for target ID 468
mpr0: mprsas_prepare_remove: Sending reset for target ID 469
mpr0: mprsas_prepare_remove: Sending reset for target ID 470
mpr0: mprsas_prepare_remove: Sending reset for target ID 471
mpr0: mprsas_prepare_remove: Sending reset for target ID 472
mpr0: mprsas_prepare_remove: Sending reset for target ID 473
mpr0: mprsas_prepare_remove: Sending reset for target ID 474
And here are the same 8 target IDs getting added in 4_shelf_reconnected_dmesg.txt:
mpr0: mprsas_add_device: Target ID for added device is 467.
mpr0: mprsas_add_device: Target ID for added device is 468.
mpr0: mprsas_add_device: Target ID for added device is 469.
mpr0: mprsas_add_device: Target ID for added device is 470.
mpr0: mprsas_add_device: Target ID for added device is 471.
mpr0: mprsas_add_device: Target ID for added device is 472.
mpr0: mprsas_add_device: Target ID for added device is 473.
mpr0: mprsas_add_device: Target ID for added device is 474.
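If it helps to repeat this comparison on other captures, here is a rough Python sketch. It assumes the dmesg text has already been read into strings; the helper names (target_ids, stuck_targets) are made up for illustration, and the only thing taken from the logs is the format of the two mpr(4) messages quoted above. With the IDs listed above it should leave 17 targets (475 through 490, plus 503) that were added but never got a reset on disconnect.

```python
import re

# Patterns match the two mpr(4) log messages quoted in this mail.
ADD_RE = re.compile(r"mprsas_add_device: Target ID for added device is (\d+)")
REMOVE_RE = re.compile(r"mprsas_prepare_remove: Sending reset for target ID (\d+)")


def target_ids(dmesg_text, pattern):
    """Return the set of target IDs that pattern finds in dmesg_text."""
    return {int(m.group(1)) for m in pattern.finditer(dmesg_text)}


def stuck_targets(connected_text, disconnected_text):
    """Target IDs added on connect that saw no removal on disconnect."""
    added = target_ids(connected_text, ADD_RE)
    removed = target_ids(disconnected_text, REMOVE_RE)
    return sorted(added - removed)


if __name__ == "__main__":
    # Synthetic input built from the IDs listed above, not the real files.
    connected = "\n".join(
        f"mpr0: mprsas_add_device: Target ID for added device is {t}."
        for t in list(range(467, 491)) + [503]
    )
    disconnected = "\n".join(
        f"mpr0: mprsas_prepare_remove: Sending reset for target ID {t}"
        for t in range(467, 475)
    )
    print(stuck_targets(connected, disconnected))
```

For the real files, read 2_shelf_connected_dmesg.txt and 3_shelf_disconnected_dmesg.txt into the two strings instead of building them by hand.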
Oliver, what happens when you try to do I/O to the devices that don’t go away after you pull the cable? Does that cause the devices to go away?
Looking at the mprutil output, it also shows the devices sticking around from the adapter’s standpoint.
You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’ (where N is the scbus number shown by ‘camcontrol devlist -v’). That will do some basic probes for each of the devices and should in theory cause them to go away if they aren’t accessible.
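As a rough sketch of picking N, the Python below scans ‘camcontrol devlist -v’ output for the bus header lines and builds the matching rescan command. It assumes those headers look like "scbus7 on mpr0 bus 0:"; the sample text, the bus number 7 and the function name rescan_commands are hypothetical. It only prints the command rather than running it.

```python
import re

# Assumed header format in `camcontrol devlist -v`: "scbus7 on mpr0 bus 0:"
BUS_RE = re.compile(r"^scbus(\d+) on (\w+?)(\d+) bus \d+", re.MULTILINE)


def rescan_commands(devlist_text, driver="mpr"):
    """Build one `camcontrol rescan N` argv list per bus owned by driver."""
    commands = []
    for bus, name, _unit in BUS_RE.findall(devlist_text):
        if name == driver:
            commands.append(["camcontrol", "rescan", bus])
    return commands


if __name__ == "__main__":
    # Hypothetical sample; real output comes from `camcontrol devlist -v`.
    sample = (
        "scbus0 on ahcich0 bus 0:\n"
        "<EXAMPLE DISK 0001>  at scbus0 target 0 lun 0 (ada0,pass0)\n"
        "scbus7 on mpr0 bus 0:\n"
        "scbus-1 on xpt0 bus 0:\n"
    )
    for argv in rescan_commands(sample):
        print(" ".join(argv))
```

The printed command would then be run by hand as root.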
It seems like the adapter may not be recognizing that the devices in question have gone.
Steve, do you have any ideas what could be going on?
Ken
--
Ken Merry
ken at FreeBSD.ORG
> On Jul 10, 2018, at 11:48 AM, Stephen Mcconnell via freebsd-scsi <freebsd-scsi at freebsd.org> wrote:
>
> Ken, I looked at the logs and I don't see anything in them that suggests
> that the driver is not adding any of the devices. In fact, I don't see
> anything that looks strange at all. This looks like a different problem than
> the other one you mentioned. What do you think?
>
> Steve
>
>> -----Original Message-----
>> From: Stephen Mcconnell [mailto:stephen.mcconnell at broadcom.com]
>> Sent: Tuesday, July 10, 2018 9:28 AM
>> To: 'Oliver Sech'; 'FreeBSD-scsi'
>> Subject: RE: problems with SAS JBODs 2
>>
>> Hi Oliver, I can't get to your links. Can you try to send the logs in
>> another way?
>>
>> Steve
>>
>>> -----Original Message-----
>>> From: owner-freebsd-scsi at freebsd.org [mailto:owner-freebsd-scsi at freebsd.org] On Behalf Of Oliver Sech
>>> Sent: Tuesday, July 10, 2018 9:14 AM
>>> To: FreeBSD-scsi
>>> Subject: Re: problems with SAS JBODs 2
>>>
>>> I tested a few additional things. I don't think this is a multipath,
>>> daisy chain, or SAS wide port problem.
>>> I can reproduce the problem with just a single connection to an
>>> Expander/JBOD.
>>>
>>> Test:
>>> * physically disconnect all shelves
>>> * reboot system
>>> * connect one shelf via SAS cable
>>> * check number of disks (after a reboot everything always shows up)
>>> * disconnect the shelf and wait (geom disk list still shows most disks.)
>>> * connect the shelf (missing disks)
>>>
>>> Tested Hardware:
>>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (internal daisy chain + wide links)
>>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection; no wide links, no daisy chain)
>>> * Supermicro SAS2 SC847E26-RJBOD1 + SAS3 LSI 9305-16e (internal daisy chain)
>>> * Promise SAS2 VTrak 830 + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER connection)
>>>
>>>
>>>
>>> On 07/04/2018 12:15 PM, Oliver Sech wrote:
>>>>> 1) Are the expanders daisy chained? Some SAS expanders don't work reliably
>>>>> when daisy chained. Best to direct connect each one to the server.
>>>> At the moment I have 1 JBOD connected to 1 HBA port with 1 cable (4 lanes?).
>>>> Unfortunately the JBOD has 24 slots in the front and 20 in the back, and
>>>> those are connected via internal SAS daisy chaining.
>>>> I could rewire and connect each backplane directly to the server, but
>>>> unfortunately I do not have enough ports.
>>>>
>>>> JBOD Model: Supermicro 847E2C-R1K28JBOD
>>>>
>>>>> 2) Are the expanders connected in multipath or single path? You need
>>>>> geom_multipath if you're going to do that.
>>>> See answer 1. There is a single path from the host to the first expander.
>>>>
>>>>> 3) Are you attempting to use wide ports (two SAS cables connecting each
>>>>> expander to the HBA)? If so, you'll need to make sure that each pair of
>>>>> SAS cables goes to the same HBA chip (not merely the same card, as some
>>>>> cards contain two HBA chips).
>>>> See 1. The last time I opened one of those JBODs there were 8 SAS cables
>>>> between the front and back expander. I assume that wide ports are being used.
>>>> (2 expanders per backplane as well)
>>>>
>>>>> 4) Are you trying to remove an expander while ZFS is active on that
>>>>> expander? That will suspend your pool, and ZFS doesn't always recover from
>>>>> a suspended state.
>>>> I'm testing with a new unused disk shelf that was never part of the ZFS
>>>> pool. There were
>>>> _______________________________________________
>>>> freebsd-scsi at freebsd.org mailing list
>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>>>> To unsubscribe, send any mail to
>>>> "freebsd-scsi-unsubscribe at freebsd.org"