Informal(?) sesX messages

Sat Dec 12 06:48:52 UTC 2015

On 2015/12/11 22:55, Alan Somers wrote:
> On Fri, Dec 11, 2015 at 8:02 PM,  <Mykel at mware.ca> wrote:
>> On 15-12-11 17:44, Alan Somers wrote:
>>> On Fri, Dec 11, 2015 at 3:34 PM,  <Mykel at mware.ca> wrote:
>>>> Hi all, please CC me on reply as I'm not subscribed to this list.
>>>>
>>>> I've got one of those Supermicro 72-drive monster machines, all ZFS'd up.
>>>> https://www.supermicro.com/products/system/4u/6048/SSG-6048R-E1CR72L.cfm
>>>>
>>>> And before & after replacing a faulty SAS Expander and a pair of cables
>>>> (gobs of WRITE/ABORT errors), I'm still occasionally seeing these kernel
>>>> messages (in groups), and I'm not sure if they're benign, or pointing to a
>>>> SAS expander event... or what. I admit, this is my first time dealing with a
>>>> machine with SAS expanders, so I'm a bit out of my depth in diagnosis
>>>> thereof.
>>>>
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: da7,pass7: Element descriptor: 'Slot00'
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5: da7,pass7: SAS Device Slot Element: 1 Phys at Slot 0
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5:  phy 0: SAS device type 1 id 0
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5:  phy 0: protocols: Initiator( None ) Target( SSP )
>>>> Dec 11 16:06:54 ZFS-AF kernel: ses5:  phy 0: parent 500304801ea2df3f addr 5000c500844bd449
>>>>
>>> These look like device arrival notifications.  If you scroll up, do
>>> you see any departure notifications?  They should look like this:
>>>
>>> mps0: mpssas_prepare_remove: Sending reset for target ID 10
>>> da0 at mps0 bus 0 scbus0 target 10 lun 0
>>> da0: <ATA Hitachi HUA72201 A39C> s/n       JPW930HQ15H26H detached
>>> mps0: Unfreezing devq for target ID 10
>>> xpt_release_devq(): requested 1 > present 0
>>> (da0:mps0:0:10:0): Periph destroyed
>>>
>>> Also, could you post your HBA and expander firmware versions?
>>>
>>> -Alan
>>
>> I can say, without doubt, that I do NOT have any preceding detachments...
>> which is why I'm so baffled by the messages. If the devices aren't
>> de/reattaching, what's the point of these informal/benign ones? I am
>> familiar with them from other hot-swap and disk failure scenarios in other
>> machines.
>>
>> Could this be a driver bug not logging the disconnection? But when I
>> hot-unplugged them, I do see that in dmesg.
>> Or does SAS do something where it might renegotiate or reconfigure the
>> lanes, and I'm just seeing it do that?
>>
>> Thanks,
>>
>> Myke
>>
>>
>> dev.mpr.0.driver_version: 09.255.01.00-fbsd
>> dev.mpr.0.firmware_version: 06.00.00.00
>> dev.mpr.1.driver_version: 09.255.01.00-fbsd
>> dev.mpr.1.firmware_version: 08.00.00.00
>> dev.mpr.2.driver_version: 09.255.01.00-fbsd
>> dev.mpr.2.firmware_version: 08.00.00.00
>>
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses0
>>   00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20 ....3.@.LSI
>>   10     53 41 53 33 78 34 38 20  20 20 20 20 20 20 20 20 SAS3x48
>>   20     30 37 30 31 78 34 38 2d  36 36 2e 37 2e 31 2e 31 0701x48-66.7.1.1
>>   30     37 00 20 20 20 20 20 20 7.
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses1
>>   00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20 ....3.@.LSI
>>   10     53 41 53 33 78 33 36 20  20 20 20 20 20 20 20 20 SAS3x36
>>   20     30 37 30 31 78 33 36 2d  36 36 2e 37 2e 31 2e 31 0701x36-66.7.1.1
>>   30     37 00 20 20 20 20 20 20 7.
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses2
>> SCSI INQUIRY failed on ses2, res=-1
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses3
>> SCSI INQUIRY failed on ses3, res=-1
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses4
>>   00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20 ....3.@.LSI
>>   10     53 41 53 33 78 32 38 20  20 20 20 20 20 20 20 20 SAS3x28
>>   20     30 37 30 31 78 32 38 2d  36 36 2e 37 2e 31 2e 31 0701x28-66.7.1.1
>>   30     37 00 20 20 20 20 20 20 7.
>> [root@ZFS-AF ~]# sg_inq --hex --len=64 ses5
>>   00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20 ....3.@.LSI
>>   10     53 41 53 33 78 34 38 20  20 20 20 20 20 20 20 20 SAS3x48
>>   20     30 37 30 31 78 34 38 2d  36 36 2e 37 2e 31 2e 31 0701x48-66.7.1.1
>>   30     37 00 20 20 20 20 20 20 7.
>> [root@ZFS-AF ~]#
>>
>>
>> And here's dmesg after fresh reboot:
> Well, that's weird.  Your firmware versions look OK, though you might
> want to upgrade mpr0 just to be consistent.  The next thing I would
> check, if I were you, would be devctl messages.  Edit /etc/syslog.conf
> and change devd's loglevel to INFO, then HUP syslogd.  Now every
> devctl message should get logged in /var/log/devd.log.  That will tell
> you more precisely than dmesg whether there are any arrival or
> departure events.
>
> -Alan
Huh, I never noticed the 6 vs. 8; curiously, mpr0 and mpr1 are the two
connected to the front expander... and that's where I've never seen an
issue. Though perhaps I scrambled which card is serving which expander in
my testing - I also moved mpr2 to sit on the other CPU's PCI bus.
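
For spotting that sort of mismatch at a glance, here is a minimal sketch that groups the controllers by firmware version. It assumes the input is the plain text of the dev.mpr sysctl lines as pasted earlier in this thread; the function name and the SAMPLE variable are made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as "dev.mpr.0.firmware_version: 06.00.00.00".
FW_LINE = re.compile(r"^dev\.mpr\.(\d+)\.firmware_version:\s*(\S+)\s*$")


def firmware_by_version(sysctl_text):
    """Map each firmware version string to the list of mpr unit numbers running it."""
    groups = defaultdict(list)
    for line in sysctl_text.splitlines():
        match = FW_LINE.match(line.strip())
        if match:
            unit, version = int(match.group(1)), match.group(2)
            groups[version].append(unit)
    return dict(groups)


# Sample input copied from the sysctl output quoted in this thread.
SAMPLE = """\
dev.mpr.0.driver_version: 09.255.01.00-fbsd
dev.mpr.0.firmware_version: 06.00.00.00
dev.mpr.1.driver_version: 09.255.01.00-fbsd
dev.mpr.1.firmware_version: 08.00.00.00
dev.mpr.2.driver_version: 09.255.01.00-fbsd
dev.mpr.2.firmware_version: 08.00.00.00
"""

if __name__ == "__main__":
    for version, units in sorted(firmware_by_version(SAMPLE).items()):
        print(version, "->", ", ".join(f"mpr{u}" for u in units))
```

More than one key in the result means the HBAs are not all on the same firmware.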

I've added the devd log, although I haven't been able to trigger the 
event yet anyway.
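
Once the event does recur, something like the following could be used to check a saved log for ses descriptor messages that have no earlier detach line for the same disk. This is only a sketch: the two regular expressions are guesses based on the sample lines quoted in this thread, the function name is hypothetical, and ses messages printed during boot would also be reported.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumptions, not a spec).
ARRIVAL = re.compile(r"ses\d+: (da\d+),pass\d+: Element descriptor")
DETACH = re.compile(r"\b(da\d+): <.*detached\s*$")


def arrivals_without_detach(lines):
    """Return disks that show a ses descriptor message with no earlier detach line."""
    detached = set()
    unexplained = []
    for line in lines:
        match = DETACH.search(line)
        if match:
            detached.add(match.group(1))
            continue
        match = ARRIVAL.search(line)
        if match:
            dev = match.group(1)
            if dev in detached:
                # The descriptor message follows a detach, so it is accounted for.
                detached.discard(dev)
            elif dev not in unexplained:
                unexplained.append(dev)
    return unexplained


if __name__ == "__main__":
    sample = [
        "Dec 11 16:06:54 ZFS-AF kernel: ses5: da7,pass7: Element descriptor: 'Slot00'",
        "da0: <ATA Hitachi HUA72201 A39C> s/n       JPW930HQ15H26H detached",
        "ses0: da0,pass0: Element descriptor: 'Slot01'",
    ]
    print(arrivals_without_detach(sample))
```

Only the first sample line is copied from the real log; the other two are illustrative.
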
I tried to set hw.mpr.2.debug_level, but it seems hw.mpr doesn't exist.

Finally, I haven't the slightest clue how to update the firmware; the 
Avago site only has a product brochure for the 3008 anyway :(
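
As an aside on reading the expander versions: the sg_inq hex dumps quoted earlier follow the standard SCSI INQUIRY layout (vendor in bytes 8-15, product in bytes 16-31, revision in bytes 32-35), so they can be decoded mechanically. A rough sketch, with hypothetical helper names and the ses0 dump pasted in as sample data:

```python
import re

HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")


def parse_hex_dump(text):
    """Collect bytes from 'offset, up to 16 hex bytes, ASCII column' dump lines."""
    data = bytearray()
    for line in text.splitlines():
        tokens = line.split()
        # tokens[0] is the offset column; at most 16 byte columns follow it.
        for tok in tokens[1:17]:
            if not HEX_BYTE.match(tok):
                break  # reached the ASCII column on a short line
            data.append(int(tok, 16))
    return bytes(data)


def decode_inquiry(data):
    """Extract the fixed-position fields of standard INQUIRY data."""
    def field(start, end):
        return data[start:end].decode("ascii", errors="replace").strip()

    return {
        "device_type": data[0] & 0x1F,  # 0x0d is an enclosure services device
        "vendor": field(8, 16),
        "product": field(16, 32),
        "revision": field(32, 36),
    }


# The ses0 dump from earlier in this thread.
SES0_DUMP = """\
  00     0d 00 05 02 33 00 40 02  4c 53 49 20 20 20 20 20 ....3.@.LSI
  10     53 41 53 33 78 34 38 20  20 20 20 20 20 20 20 20 SAS3x48
  20     30 37 30 31 78 34 38 2d  36 36 2e 37 2e 31 2e 31 0701x48-66.7.1.1
  30     37 00 20 20 20 20 20 20 7.
"""

if __name__ == "__main__":
    print(decode_inquiry(parse_hex_dump(SES0_DUMP)))
```

One caveat: on a line with fewer than 16 bytes, an ASCII column that happens to look like two hex digits would be misread as data.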