problems with SAS JBODs 2

slm at freebsd.org slm at freebsd.org
Wed Jul 11 20:56:32 UTC 2018


I think this is a mapping table problem or the use_phy_num problem. I'm
having Oliver change the use_phy_num sysctl values to 0 and then use your
script to clear out the controller mapping entries to see what happens.

Steve

> -----Original Message-----
> From: Ken Merry [mailto:ken at freebsd.org]
> Sent: Wednesday, July 11, 2018 2:35 PM
> To: Stephen Mcconnell; Oliver Sech
> Cc: FreeBSD-scsi
> Subject: Re: problems with SAS JBODs 2
>
> Yes, I agree, Oliver’s problem looks different.
>
> Oliver, for your second set of files (freebsd_sas2.zip) it looks like you
> may have devices that aren’t completely going away, even from a SAS
> standpoint.
>
> Here are the 25 target IDs that show up in 2_shelf_connected_dmesg.txt:
>
> mpr0: mprsas_add_device: Target ID for added device is 467.
> mpr0: mprsas_add_device: Target ID for added device is 468.
> mpr0: mprsas_add_device: Target ID for added device is 469.
> mpr0: mprsas_add_device: Target ID for added device is 470.
> mpr0: mprsas_add_device: Target ID for added device is 471.
> mpr0: mprsas_add_device: Target ID for added device is 472.
> mpr0: mprsas_add_device: Target ID for added device is 473.
> mpr0: mprsas_add_device: Target ID for added device is 474.
> mpr0: mprsas_add_device: Target ID for added device is 475.
> mpr0: mprsas_add_device: Target ID for added device is 476.
> mpr0: mprsas_add_device: Target ID for added device is 477.
> mpr0: mprsas_add_device: Target ID for added device is 478.
> mpr0: mprsas_add_device: Target ID for added device is 479.
> mpr0: mprsas_add_device: Target ID for added device is 480.
> mpr0: mprsas_add_device: Target ID for added device is 481.
> mpr0: mprsas_add_device: Target ID for added device is 482.
> mpr0: mprsas_add_device: Target ID for added device is 483.
> mpr0: mprsas_add_device: Target ID for added device is 484.
> mpr0: mprsas_add_device: Target ID for added device is 485.
> mpr0: mprsas_add_device: Target ID for added device is 486.
> mpr0: mprsas_add_device: Target ID for added device is 487.
> mpr0: mprsas_add_device: Target ID for added device is 488.
> mpr0: mprsas_add_device: Target ID for added device is 489.
> mpr0: mprsas_add_device: Target ID for added device is 490.
> mpr0: mprsas_add_device: Target ID for added device is 503.
>
> Here are the 8 target IDs that disappear in
> 3_shelf_disconnected_dmesg.txt:
>
> mpr0: mprsas_prepare_remove: Sending reset for target ID 467
> mpr0: mprsas_prepare_remove: Sending reset for target ID 468
> mpr0: mprsas_prepare_remove: Sending reset for target ID 469
> mpr0: mprsas_prepare_remove: Sending reset for target ID 470
> mpr0: mprsas_prepare_remove: Sending reset for target ID 471
> mpr0: mprsas_prepare_remove: Sending reset for target ID 472
> mpr0: mprsas_prepare_remove: Sending reset for target ID 473
> mpr0: mprsas_prepare_remove: Sending reset for target ID 474
>
> And here are the same 8 target IDs getting added in
> 4_shelf_reconnected_dmesg.txt:
>
> mpr0: mprsas_add_device: Target ID for added device is 467.
> mpr0: mprsas_add_device: Target ID for added device is 468.
> mpr0: mprsas_add_device: Target ID for added device is 469.
> mpr0: mprsas_add_device: Target ID for added device is 470.
> mpr0: mprsas_add_device: Target ID for added device is 471.
> mpr0: mprsas_add_device: Target ID for added device is 472.
> mpr0: mprsas_add_device: Target ID for added device is 473.
> mpr0: mprsas_add_device: Target ID for added device is 474.
>
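
The comparison above can also be scripted so it is repeatable across test
runs. This is only a rough sketch: the two regular expressions are written
against the mpr(4) debug lines quoted in this thread, and the names
target_ids and summarize are made up for illustration, not part of any
FreeBSD tool. It takes the contents of the three dmesg captures as strings
and reports which target IDs were added but never got a removal.

```python
import re

# Patterns written against the log lines quoted in this thread (an assumption
# about the message format; adjust if your driver version prints differently).
ADD_RE = re.compile(r"mprsas_add_device: Target ID for added device is (\d+)")
REMOVE_RE = re.compile(r"mprsas_prepare_remove: Sending reset for target ID (\d+)")


def target_ids(pattern, text):
    """Return the set of target IDs captured by `pattern` in `text`."""
    return {int(m.group(1)) for m in pattern.finditer(text)}


def summarize(connected, disconnected, reconnected):
    """Compare three dmesg captures (strings) from one pull/replug cycle."""
    added = target_ids(ADD_RE, connected)
    removed = target_ids(REMOVE_RE, disconnected)
    readded = target_ids(ADD_RE, reconnected)
    return {
        # Added on connect, but no removal was logged after the cable pull.
        "never_removed": sorted(added - removed),
        # Removed on disconnect, but not added again on reconnect.
        "removed_not_readded": sorted(removed - readded),
    }
```

Fed the three lists above, never_removed would hold the 17 IDs that were
added but never reset (475 through 490, plus 503), and removed_not_readded
would be empty.
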
> Oliver, what happens when you try to do I/O to the devices that don’t go
> away after you pull the cable?  Does that cause the devices to go away?
>
> Looking at the mprutil output, it also shows the devices sticking around
> from the adapter’s standpoint.
>
> You can also try a ‘camcontrol rescan all’ or a ‘camcontrol rescan N’
> (where N is the scbus number shown by ‘camcontrol devlist -v’).  That will
> do some basic probes for each of the devices and should in theory cause
> them to go away if they aren’t accessible.
>
> It seems like the adapter may not be recognizing that the devices in
> question have gone.
>
> Steve, do you have any ideas what could be going on?
>
> Ken
> --
> Ken Merry
> ken at FreeBSD.ORG
>
>
>
> > On Jul 10, 2018, at 11:48 AM, Stephen Mcconnell via freebsd-scsi
> > <freebsd-scsi at freebsd.org> wrote:
> >
> > Ken, I looked at the logs and I don't see anything in them that suggests
> > that the driver is not adding any of the devices. In fact, I don't see
> > anything that looks strange at all. This looks like a different problem
> > than the other one you mentioned. What do you think?
> >
> > Steve
> >
> >> -----Original Message-----
> >> From: Stephen Mcconnell [mailto:stephen.mcconnell at broadcom.com]
> >> Sent: Tuesday, July 10, 2018 9:28 AM
> >> To: 'Oliver Sech'; 'FreeBSD-scsi'
> >> Subject: RE: problems with SAS JBODs 2
> >>
> >> Hi Oliver, I can't get to your links. Can you try to send the logs in
> >> another way?
> >>
> >> Steve
> >>
> >>> -----Original Message-----
> >>> From: owner-freebsd-scsi at freebsd.org
> >>> [mailto:owner-freebsd-scsi at freebsd.org] On Behalf Of Oliver Sech
> >>> Sent: Tuesday, July 10, 2018 9:14 AM
> >>> To: FreeBSD-scsi
> >>> Subject: Re: problems with SAS JBODs 2
> >>>
> >>> I tested a few additional things. I don't think this is a multipath,
> >>> daisy chain, or SAS wide port problem.
> >>> I can reproduce the problem with just a single connection to an
> >>> Expander/JBOD.
> >>>
> >>> Test:
> >>> * physically disconnect all shelves
> >>> * reboot system
> >>> * connect one shelf via SAS cable
> >>> * check number of disks (after a reboot everything always shows up)
> >>> * disconnect the shelf and wait (geom disk list still shows most disks.)
> >>> * connect the shelf (missing disks)
> >>>
> >>> Tested Hardware:
> >>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (internal daisy
> >>>   chain + wide links)
> >>> * Supermicro SAS3 847E2C-R1K28JBOD + SAS3 LSI 9305-16e (straight HBA
> >>>   <-> EXPANDER connection, no wide links, no daisy chain)
> >>> * Supermicro SAS2 SC847E26-RJBOD1 + SAS3 LSI 9305-16e (internal daisy
> >>>   chain)
> >>> * Promise SAS2 VTrak 830 + SAS3 LSI 9305-16e (straight HBA <-> EXPANDER
> >>>   connection)
> >>>
> >>>
> >>>
> >>> On 07/04/2018 12:15 PM, Oliver Sech wrote:
> >>>>> 1) Are the expanders daisy chained?  Some SAS expanders don't work
> >>>>> reliably when daisy chained.  Best to direct connect each one to the
> >>>>> server.
> >>>> At the moment I have 1 JBOD connected to 1 HBA Port with 1 cable (4
> >>>> lanes?).
> >>>> Unfortunately the JBOD has 24 slots in the front and 20 in the back,
> >>>> and those are connected via an internal SAS daisy chain.
> >>>> I could rewire and connect each backplane directly to the server, but
> >>>> unfortunately I do not have enough ports.
> >>>>
> >>>> JBOD Model: Supermicro 847E2C-R1K28JBOD
> >>>>
> >>>>> 2) Are the expanders connected in multipath or single path?  You need
> >>>>> geom_multipath if you're going to do that.
> >>>> See answer 1. There is a single path from the host to the first
> >>>> expander.
> >>>>
> >>>>> 3) Are you attempting to use wide ports (two SAS cables connecting
> >>>>> each expander to the HBA)?  If so, you'll need to make sure that each
> >>>>> pair of SAS cables goes to the same HBA chip (not merely the same
> >>>>> card, as some cards contain two HBA chips).
> >>>> See 1. The last time I opened one of those JBODs there were 8 SAS
> >>>> cables between the front and back expander. I assume that wide ports
> >>>> are being used.
> >>>> (2 expanders per backplane as well)
> >>>>
> >>>>> 4) Are you trying to remove an expander while ZFS is active on that
> >>>>> expander?  That will suspend your pool, and ZFS doesn't always recover
> >>>>> from a suspended state.
> >>>> I'm testing with a new unused disk shelf that was never part of the
> >>>> ZFS pool. There were
> >>>> _______________________________________________
> >>>> freebsd-scsi at freebsd.org mailing list
> >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> >>>> To unsubscribe, send any mail to
> >>>> "freebsd-scsi-unsubscribe at freebsd.org"

