QLogic 2360 FC HBAs not playing well with others

Tue Apr 13 16:49:20 UTC 2010

Gary Palmer wrote:
> On Sun, Apr 11, 2010 at 08:09:50PM -0600, Brad Waite wrote:
>>>> Matthew Jacob wrote:
>>>>> On 04/09/2010 11:29 AM, Brad Waite wrote:
>>>>> I beseech you, oh great masters of SCSI and fibre channel, hear my pleas
>>>>> for help!
>>>>>
>>>>> My 2 QLE2360s don't appear to be waking up properly in a Dell R710
>>>>> running 7.2 AMD64.  At the very least, they're not recognizing any of
>>>>> the volumes on the Sun 2540 array in the fabric.  Everything works just
>>>>> fine under VMware ESXi 4.1, though.
>>>>>    
>>>> Get newer firmware either by upgrading with RELENG_7 or snagging
>>>> asm_2300.h from RELENG_7 and rebuilding.
>>>>
>>>> You don't have to load all of ispfw
>>>>
>>>> isp2300_LOAD=YES
>>>>
>>>> should get you just that one module
>>>>
>>>> the latest in the FreeBSD tree is 3.03.26
>> Woot.  That helped.
>>
>> Built & installed RELENG_7, but I've got some more weirdness.
>>
>> First off I've got da0 - da15 showing similar to this:
>>
>> da0 at isp0 bus 0 target 0 lun 0
>> da0: <SUN LCSM100_F 0670> Fixed Direct Access SCSI-5 device
>> da0: 200.000MB/s transfers
>> da0: Command Queueing Enabled
>> da0: 138989MB (284650656 512 byte sectors: 255H 63S/T 17718C)
>>
>> We've got a Sun Storagetek 2540 12-drive array with 4 volumes mapped to
>> this host.  It would appear that it's showing the 4 volumes AND each of
>> the 12 drives.  Is that normal?
>>
>> Next, I have about 20 of the following errors for each of da1, da2, da3,
>> da4, da9, da10, da11 & da12.
>>
>> (da1:isp0:0:0:1): READ(6)/WRITE(6) not supported, increasing
>> minimum_cmd_size to 10.
>> (da1:isp0:0:0:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
>> (da1:isp0:0:0:1): CAM Status: SCSI Status Error
>> (da1:isp0:0:0:1): SCSI Status: Check Condition
>> (da1:isp0:0:0:1): ILLEGAL REQUEST asc:94,1
>> (da1:isp0:0:0:1): Vendor Specific ASC
>> (da1:isp0:0:0:1): Unretryable error
>>
>> What's going on here?
>>
>> Is there any config I need to do for volume mapping and/or
>> multipathing?  I'm a complete newb when it comes to FC on FreeBSD, so
>> forgive my ignorance.
>>
>> Thanks for the help, guys!
> 
> I suspect the reason you have 16 disk devices showing up is that you 
> are running multipath.  You will get one da device showing up for each
> different path, and if you're running a full multipath environment
> that's likely 4 paths per device, which would lead to the 16 disks
> (unless they're not the sizes you expect, but I would tend to suspect
>  it's a multipath artifact)

Thanks for pointing out what should have been obvious.  The Sun 2540 has 2 ports on 2 controllers and camcontrol shows
exactly that:

# camcontrol devlist
<SUN LCSM100_F 0670>               at scbus0 target 0 lun 0 (da0,pass0)
<SUN LCSM100_F 0670>               at scbus0 target 0 lun 1 (da1,pass1)
<SUN LCSM100_F 0670>               at scbus0 target 0 lun 2 (da2,pass2)
<SUN LCSM100_F 0670>               at scbus0 target 0 lun 3 (da3,pass3)
<SUN LCSM100_F 0670>               at scbus0 target 1 lun 0 (da4,pass4)
<SUN LCSM100_F 0670>               at scbus0 target 1 lun 1 (da5,pass5)
<SUN LCSM100_F 0670>               at scbus0 target 1 lun 2 (da6,pass6)
<SUN LCSM100_F 0670>               at scbus0 target 1 lun 3 (da7,pass7)
<SUN LCSM100_F 0670>               at scbus1 target 0 lun 0 (da8,pass8)
<SUN LCSM100_F 0670>               at scbus1 target 0 lun 1 (da9,pass9)
<SUN LCSM100_F 0670>               at scbus1 target 0 lun 2 (da10,pass10)
<SUN LCSM100_F 0670>               at scbus1 target 0 lun 3 (da11,pass11)
<SUN LCSM100_F 0670>               at scbus1 target 1 lun 0 (da12,pass12)
<SUN LCSM100_F 0670>               at scbus1 target 1 lun 1 (da13,pass13)
<SUN LCSM100_F 0670>               at scbus1 target 1 lun 2 (da14,pass14)
<SUN LCSM100_F 0670>               at scbus1 target 1 lun 3 (da15,pass15)
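
As a rough illustration of the multipath reading, the devlist output above can be grouped by LUN number to show four paths per volume. This is only a sketch: the helper name paths_by_lun is made up for the example, and it assumes that the same LUN number on every bus/target refers to the same volume, which the output alone does not prove (comparing serial numbers, for instance with camcontrol inquiry -S, would be a stronger check).

```python
import re
from collections import defaultdict

# The sixteen lines from the devlist output above, verbatim.
DEVLIST = """\
<SUN LCSM100_F 0670>               at scbus0 target 0 lun 0 (da0,pass0)
<SUN LCSM100_F 0670>               at scbus0 target 0 lun 1 (da1,pass1)
<SUN LCSM100_F 0670>               at scbus0 target 0 lun 2 (da2,pass2)
<SUN LCSM100_F 0670>               at scbus0 target 0 lun 3 (da3,pass3)
<SUN LCSM100_F 0670>               at scbus0 target 1 lun 0 (da4,pass4)
<SUN LCSM100_F 0670>               at scbus0 target 1 lun 1 (da5,pass5)
<SUN LCSM100_F 0670>               at scbus0 target 1 lun 2 (da6,pass6)
<SUN LCSM100_F 0670>               at scbus0 target 1 lun 3 (da7,pass7)
<SUN LCSM100_F 0670>               at scbus1 target 0 lun 0 (da8,pass8)
<SUN LCSM100_F 0670>               at scbus1 target 0 lun 1 (da9,pass9)
<SUN LCSM100_F 0670>               at scbus1 target 0 lun 2 (da10,pass10)
<SUN LCSM100_F 0670>               at scbus1 target 0 lun 3 (da11,pass11)
<SUN LCSM100_F 0670>               at scbus1 target 1 lun 0 (da12,pass12)
<SUN LCSM100_F 0670>               at scbus1 target 1 lun 1 (da13,pass13)
<SUN LCSM100_F 0670>               at scbus1 target 1 lun 2 (da14,pass14)
<SUN LCSM100_F 0670>               at scbus1 target 1 lun 3 (da15,pass15)
"""

# One devlist entry: <inquiry string> at scbusB target T lun L (daN,passN)
LINE_RE = re.compile(
    r"<[^>]*>\s+at scbus(\d+) target (\d+) lun (\d+) \(([^)]*)\)"
)


def paths_by_lun(text):
    """Map each LUN number to its (bus, target, da device) paths.

    Hypothetical helper; assumes equal LUN numbers mean the same volume.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip anything that is not a devlist entry
        bus, target, lun, devs = match.groups()
        # The parenthesised part lists peripheral names, e.g. "da0,pass0".
        da = next((d for d in devs.split(",") if d.startswith("da")), None)
        if da is not None:
            groups[int(lun)].append((int(bus), int(target), da))
    return dict(groups)


if __name__ == "__main__":
    for lun, paths in sorted(paths_by_lun(DEVLIST).items()):
        names = ", ".join(da for _bus, _target, da in paths)
        print(f"lun {lun}: {len(paths)} paths ({names})")
```

Each of the four LUNs ends up with one entry per bus/target combination, which matches the "4 paths per device" explanation.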

> To handle multipath you probably want to look at gmultipath(8).
> 
> I'm not sure about READ/WRITE errors.  You say they show up for 8
> devices?  Is it possible that the array is not true active/active
> on the controllers?  It's possible that half the paths are going to 
> a controller that is rejecting the I/O until the LUN fails over,
> but that's just a guess based on the error message.  If you can
> look at the controller/bus/target/lun information from dmesg and
> see if you can spot a pattern about the path to the LUNs giving
> the error that may give a better idea about what's going on.

I think you've nailed it.  da4-7 & da12-15 have the following respective lines in dmesg:

da[4-7]: 200.000MB/s transfers WWNN 0x200400a0b8388efd WWPN 0x203500a0b8388efd PortID 0x10100
da[12-15]: 200.000MB/s transfers WWNN 0x200400a0b8388efd WWPN 0x202500a0b8388efd PortID 0x10100
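
To make the pattern easier to see, here is a small sketch that expands the two condensed lines above into a per-device table. Note the assumptions: the da[4-7] range notation is this post's shorthand rather than literal dmesg output (dmesg prints one line per device), and the function name wwpn_by_device is invented for the example.

```python
import re

# The two condensed lines from above, verbatim.
DMESG = """\
da[4-7]: 200.000MB/s transfers WWNN 0x200400a0b8388efd WWPN 0x203500a0b8388efd PortID 0x10100
da[12-15]: 200.000MB/s transfers WWNN 0x200400a0b8388efd WWPN 0x202500a0b8388efd PortID 0x10100
"""

# Matches the shorthand "da[first-last]: ... WWNN x WWPN y PortID z".
RANGE_RE = re.compile(
    r"da\[(\d+)-(\d+)\]:.*?WWNN (0x[0-9a-f]+) WWPN (0x[0-9a-f]+) "
    r"PortID (0x[0-9a-f]+)"
)


def wwpn_by_device(text):
    """Expand each da[first-last] line into one entry per da device."""
    devices = {}
    for line in text.splitlines():
        match = RANGE_RE.search(line)
        if match is None:
            continue  # not one of the condensed transfer lines
        first, last = int(match.group(1)), int(match.group(2))
        for unit in range(first, last + 1):
            devices[f"da{unit}"] = {
                "wwnn": match.group(3),
                "wwpn": match.group(4),
                "portid": match.group(5),
            }
    return devices


if __name__ == "__main__":
    for name, ids in wwpn_by_device(DMESG).items():
        print(name, ids["wwnn"], ids["wwpn"], ids["portid"])
```

All eight devices share one WWNN (the array) and split across two WWPNs (the two ports listed above).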

The two WWPNs correspond to the 2540's controllers and the write errors are on da0-3 & da8-11.  I can't find anything
yet in the docs on making the other ports active.

I successfully labeled da7 & da15 with gmultipath, although I couldn't add da3 & da11 due to the write errors.  No real
surprise, but since I can't add the label, what happens if one of the active ports on a controller fails?  I know I'd
still have the other path to the active port on the other controller, but would I have to manually add the label to the
volumes from the newly active port?
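
For reference, the labeling step mentioned above has the general form "gmultipath label [-v] name prov ..." per gmultipath(8). The sketch below only builds that command line as a string for the two paths that accepted the label. The label name vol3 and the helper gmultipath_label_cmd are hypothetical, and nothing here runs the command or answers the failover question.

```python
def gmultipath_label_cmd(name, devices):
    """Build a 'gmultipath label' command line for the given da devices.

    Hypothetical helper: it only formats the string and does not run it.
    """
    if not devices:
        raise ValueError("need at least one provider to label")
    providers = ["/dev/" + dev for dev in devices]
    return " ".join(["gmultipath", "label", "-v", name] + providers)


if __name__ == "__main__":
    # da7 and da15 are the two paths that could be labeled in this thread;
    # "vol3" is an illustrative name, not one used on the real system.
    print(gmultipath_label_cmd("vol3", ["da7", "da15"]))
```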