QLogic 2360 FC HBAs not playing well with others
Matthew Jacob
mj at feral.com
Tue Apr 13 16:53:51 UTC 2010
On 04/13/2010 09:49 AM, Brad Waite wrote:
> Gary Palmer wrote:
>
>> On Sun, Apr 11, 2010 at 08:09:50PM -0600, Brad Waite wrote:
>>
>>>>> Matthew Jacob wrote:
>>>>>
>>>>>> On 04/09/2010 11:29 AM, Brad Waite wrote:
>>>>>> I beseech you, oh great masters of SCSI and fibre channel, hear my
>>>>>> pleas for help!
>>>>>>
>>>>>>
>>>>>> My 2 QLE2360s don't appear to be waking up properly in a Dell R710
>>>>>> running 7.2 AMD64. At the very least, they're not recognizing any of
>>>>>> the volumes on the Sun 2540 array in the fabric. Everything works just
>>>>>> fine under VMware ESXi 4.1, though.
>>>>>>
>>>>>>
>>>>> Get newer firmware, either by upgrading to RELENG_7 or by snagging
>>>>> asm_2300.h from RELENG_7 and rebuilding.
>>>>>
>>>>> You don't have to load all of ispfw;
>>>>>
>>>>> isp_2300_load="YES"
>>>>>
>>>>> should get you just that one module.
>>>>>
>>>>> The latest in the FreeBSD tree is 3.03.26.
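For reference, a minimal /boot/loader.conf sketch for that (knob name as documented in ispfw(4)):

# /boot/loader.conf -- load only the 23xx firmware, not all of ispfw
isp_2300_load="YES"

The same module can be loaded at runtime with "kldload isp_2300".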
>>> Woot. That helped.
>>>
>>> Built & installed RELENG_7, but I've got some more weirdness.
>>>
>>> First off, I've got da0 - da15 showing something similar to this:
>>>
>>> da0 at isp0 bus 0 target 0 lun 0
>>> da0: <SUN LCSM100_F 0670> Fixed Direct Access SCSI-5 device
>>> da0: 200.000MB/s transfers
>>> da0: Command Queueing Enabled
>>> da0: 138989MB (284650656 512 byte sectors: 255H 63S/T 17718C)
>>>
>>> We've got a Sun Storagetek 2540 12-drive array with 4 volumes mapped to
>>> this host. It would appear that it's showing the 4 volumes AND each of
>>> the 12 drives. Is that normal?
>>>
>>> Next, I have about 20 of the following errors for each of da1, da2, da3,
>>> da4, da9, da10, da11 & da12.
>>>
>>> (da1:isp0:0:0:1): READ(6)/WRITE(6) not supported, increasing
>>> minimum_cmd_size to 10.
>>> (da1:isp0:0:0:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
>>> (da1:isp0:0:0:1): CAM Status: SCSI Status Error
>>> (da1:isp0:0:0:1): SCSI Status: Check Condition
>>> (da1:isp0:0:0:1): ILLEGAL REQUEST asc:94,1
>>> (da1:isp0:0:0:1): Vendor Specific ASC
>>> (da1:isp0:0:0:1): Unretryable error
>>>
>>> What's going on here?
>>>
>>> Is there any config I need to do for volume mapping and/or
>>> multipathing? I'm a complete newb when it comes to FC on FreeBSD, so
>>> forgive my ignorance.
>>>
>>> Thanks for the help, guys!
>>>
>> I suspect the reason you have 16 disk devices showing up is that you
>> are running multipath. You will get one da device showing up for each
>> different path, and if you're running a full multipath environment
>> that's likely 4 paths per device, which would lead to the 16 disks
>> (unless they're not the sizes you expect, but I would tend to suspect
>> it's a multipath artifact)
>>
> Thanks for pointing out what should have been obvious. The Sun 2540 has 2 ports on each of its 2 controllers, and camcontrol shows
> exactly that:
>
> # camcontrol devlist
> <SUN LCSM100_F 0670> at scbus0 target 0 lun 0 (da0,pass0)
> <SUN LCSM100_F 0670> at scbus0 target 0 lun 1 (da1,pass1)
> <SUN LCSM100_F 0670> at scbus0 target 0 lun 2 (da2,pass2)
> <SUN LCSM100_F 0670> at scbus0 target 0 lun 3 (da3,pass3)
> <SUN LCSM100_F 0670> at scbus0 target 1 lun 0 (da4,pass4)
> <SUN LCSM100_F 0670> at scbus0 target 1 lun 1 (da5,pass5)
> <SUN LCSM100_F 0670> at scbus0 target 1 lun 2 (da6,pass6)
> <SUN LCSM100_F 0670> at scbus0 target 1 lun 3 (da7,pass7)
> <SUN LCSM100_F 0670> at scbus1 target 0 lun 0 (da8,pass8)
> <SUN LCSM100_F 0670> at scbus1 target 0 lun 1 (da9,pass9)
> <SUN LCSM100_F 0670> at scbus1 target 0 lun 2 (da10,pass10)
> <SUN LCSM100_F 0670> at scbus1 target 0 lun 3 (da11,pass11)
> <SUN LCSM100_F 0670> at scbus1 target 1 lun 0 (da12,pass12)
> <SUN LCSM100_F 0670> at scbus1 target 1 lun 1 (da13,pass13)
> <SUN LCSM100_F 0670> at scbus1 target 1 lun 2 (da14,pass14)
> <SUN LCSM100_F 0670> at scbus1 target 1 lun 3 (da15,pass15)
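That listing is consistent with Gary's multipath arithmetic: 2 HBAs (scbus0 and scbus1) x 2 array ports visible to each (target 0 and target 1) x 4 LUNs = 16 da devices for 4 actual volumes.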
>
>
>> To handle multipath you probably want to look at gmultipath(8).
>>
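A sketch of that labeling step, using two of the paths from the listing above (VOL0 is an arbitrary label name; da0 and da8 are two paths to the same LUN):

gmultipath label -v VOL0 da0 da8
# the aggregated device then shows up as /dev/multipath/VOL0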
>> I'm not sure about the READ/WRITE errors. You say they show up for 8
>> devices? Is it possible that the array is not true active/active
>> on the controllers? It's possible that half the paths are going to
>> a controller that is rejecting the I/O until the LUN fails over,
>> but that's just a guess based on the error message. If you look at
>> the controller/bus/target/lun information from dmesg and can spot
>> a pattern in the paths to the LUNs giving the error, that may give
>> a better idea about what's going on.
>>
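One quick way to look for that pattern, assuming the attach lines carry the WWPN (as they do below):

dmesg | grep -E 'da[0-9]+.*WWPN'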
> I think you've nailed it. da4-7 & da12-15 have the following respective lines in dmesg:
>
> da[4-7]: 200.000MB/s transfers WWNN 0x200400a0b8388efd WWPN 0x203500a0b8388efd PortID 0x10100
> da[12-15]: 200.000MB/s transfers WWNN 0x200400a0b8388efd WWPN 0x202500a0b8388efd PortID 0x10100
>
> The two WWPNs correspond to the 2540's controllers, and the write errors are on da0-3 & da8-11. I can't find anything
> yet in the docs on making the other ports active.
>
> I successfully labeled da7 & da15 with gmultipath, although I couldn't add da3 & da11 due to write errors. No real
> surprise, but since I can't add the label, what happens if one of the active ports on a controller fails? I know I'd
> have the other path to the active port on the other controller, but would I have to manually add the label to the
> volumes from the newly-active port?
>
There's only one spindle. The label is on it. The other paths to it will
see it (eventually).
There are recent changes to multipath that have been checked in that
don't try to write down all provider paths.
There are other changes that need to go in as well to handle storage
which is not, in fact, truly active-active.
You have some kind of SANtricity clone. The 94,1 error is saying "you
don't own this path".
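To check that all paths have attached once the label is visible, the standard GEOM verbs are enough (a sketch; VOL0 is the hypothetical label from earlier):

gmultipath list VOL0     # shows each consumer (path) and its state
gmultipath status        # summarizes each multipath device and its components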