QLogic 2360 FC HBAs not playing well with others
Brad Waite
freebsd at wcubed.net
Tue Apr 13 16:49:20 UTC 2010
Gary Palmer wrote:
> On Sun, Apr 11, 2010 at 08:09:50PM -0600, Brad Waite wrote:
>>>> Matthew Jacob wrote:
>>>>> On 04/09/2010 11:29 AM, Brad Waite wrote:
>>>>> I beseech you, oh great masters of SCSI and fibre channel, hear my
>>>>> pleas for help!
>>>>>
>>>>> My 2 QLE2360s don't appear to be waking up properly in a Dell R710
>>>>> running 7.2 AMD64. At the very least, they're not recognizing any of
>>>>> the volumes on the Sun 2540 array in the fabric. Everything works just
>>>>> fine under VMware ESXi 4.1, though.
>>>>>
>>>> Get newer firmware either by upgrading with RELENG_7 or snagging
>>>> asm_2300.h from RELENG_7 and rebuilding.
>>>>
>>>> You don't have to load all of ispfw
>>>>
>>>> isp2300_LOAD=YES
>>>>
>>>> should get you just that one module
>>>>
>>>> the latest in the FreeBSD tree is 3.03.26
>> Woot. That helped.
>>
>> Built & installed RELENG_7, but I've got some more weirdness.
>>
>> First off I've got da0 - da15 showing similar to this:
>>
>> da0 at isp0 bus 0 target 0 lun 0
>> da0: <SUN LCSM100_F 0670> Fixed Direct Access SCSI-5 device
>> da0: 200.000MB/s transfers
>> da0: Command Queueing Enabled
>> da0: 138989MB (284650656 512 byte sectors: 255H 63S/T 17718C)
>>
>> We've got a Sun Storagetek 2540 12-drive array with 4 volumes mapped to
>> this host. It would appear that it's showing the 4 volumes AND each of
>> the 12 drives. Is that normal?
>>
>> Next, I have about 20 of the following errors for each of da1, da2, da3,
>> da4, da9, da10, da11 & da12.
>>
>> (da1:isp0:0:0:1): READ(6)/WRITE(6) not supported, increasing
>> minimum_cmd_size to 10.
>> (da1:isp0:0:0:1): READ(10). CDB: 28 0 0 0 0 0 0 0 1 0
>> (da1:isp0:0:0:1): CAM Status: SCSI Status Error
>> (da1:isp0:0:0:1): SCSI Status: Check Condition
>> (da1:isp0:0:0:1): ILLEGAL REQUEST asc:94,1
>> (da1:isp0:0:0:1): Vendor Specific ASC
>> (da1:isp0:0:0:1): Unretryable error
>>
>> What's going on here?
>>
>> Is there any config I need to do for volume mapping and/or
>> multipathing? I'm a complete newb when it comes to FC on FreeBSD, so
>> forgive my ignorance.
>>
>> Thanks for the help, guys!
>
> I suspect the reason you have 16 disk devices showing up is that you
> are running multipath. You will get one da device showing up for each
> different path, and if you're running a full multipath environment
> that's likely 4 paths per device, which would lead to the 16 disks
> (unless they're not the sizes you expect, but I would tend to suspect
> it's a multipath artifact).
Thanks for pointing out what should have been obvious. The Sun 2540 has 2 ports on each of its 2 controllers, and
camcontrol shows exactly that:
# camcontrol devlist
<SUN LCSM100_F 0670> at scbus0 target 0 lun 0 (da0,pass0)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 1 (da1,pass1)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 2 (da2,pass2)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 3 (da3,pass3)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 0 (da4,pass4)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 1 (da5,pass5)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 2 (da6,pass6)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 3 (da7,pass7)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 0 (da8,pass8)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 1 (da9,pass9)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 2 (da10,pass10)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 3 (da11,pass11)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 0 (da12,pass12)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 1 (da13,pass13)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 2 (da14,pass14)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 3 (da15,pass15)
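The pattern in that listing can be checked mechanically. What follows is a minimal Python sketch, not anything from the FreeBSD tree: it assumes the devlist text above is pasted in verbatim, and it assumes that a given LUN number refers to the same volume on every path (worth confirming, e.g. by comparing serial numbers from "camcontrol inquiry daN -S"). The helper name paths_by_lun is hypothetical.

```python
import re
from collections import defaultdict

# Output of `camcontrol devlist`, copied from the listing above.
DEVLIST = """\
<SUN LCSM100_F 0670> at scbus0 target 0 lun 0 (da0,pass0)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 1 (da1,pass1)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 2 (da2,pass2)
<SUN LCSM100_F 0670> at scbus0 target 0 lun 3 (da3,pass3)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 0 (da4,pass4)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 1 (da5,pass5)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 2 (da6,pass6)
<SUN LCSM100_F 0670> at scbus0 target 1 lun 3 (da7,pass7)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 0 (da8,pass8)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 1 (da9,pass9)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 2 (da10,pass10)
<SUN LCSM100_F 0670> at scbus1 target 0 lun 3 (da11,pass11)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 0 (da12,pass12)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 1 (da13,pass13)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 2 (da14,pass14)
<SUN LCSM100_F 0670> at scbus1 target 1 lun 3 (da15,pass15)
"""

# Matches "at scbusB target T lun L (periph,periph)".
LINE_RE = re.compile(r"at scbus(\d+) target (\d+) lun (\d+) \(([^)]*)\)")


def paths_by_lun(devlist_text):
    """Group da devices by LUN number (hypothetical helper).

    Assumption: the same LUN number on every bus/target is the same volume.
    """
    groups = defaultdict(list)
    for line in devlist_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        bus, target, lun, periphs = m.groups()
        # The periph list may be "(da0,pass0)" or "(pass0,da0)"; pick the da entry.
        da = next((p for p in periphs.split(",") if p.startswith("da")), None)
        if da is not None:
            groups[int(lun)].append((da, int(bus), int(target)))
    return dict(groups)


groups = paths_by_lun(DEVLIST)
for lun in sorted(groups):
    devs = ", ".join(dev for dev, _bus, _target in groups[lun])
    print(f"LUN {lun}: {len(groups[lun])} paths ({devs})")
```

With the listing above this reports four LUNs with four paths each, which matches 4 mapped volumes seen through 2 HBAs and 2 target ports per HBA.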
> To handle multipath you probably want to look at gmultipath(8).
>
> I'm not sure about READ/WRITE errors. You say they show up for 8
> devices? Is it possible that the array is not true active/active
> on the controllers? It's possible that half the paths are going to
> a controller that is rejecting the I/O until the LUN fails over,
> but that's just a guess based on the error message. If you can
> look at the controller/bus/target/lun information from dmesg and
> see if you can spot a pattern about the path to the LUNs giving
> the error, that may give a better idea about what's going on.
I think you've nailed it. da4-7 & da12-15 have the following respective lines in dmesg:
da[4-7]: 200.000MB/s transfers WWNN 0x200400a0b8388efd WWPN 0x203500a0b8388efd PortID 0x10100
da[12-15]: 200.000MB/s transfers WWNN 0x200400a0b8388efd WWPN 0x202500a0b8388efd PortID 0x10100
The two WWPNs correspond to the 2540's controllers, and the write errors are on da0-3 & da8-11. I can't find anything
yet in the docs on making the other ports active, but I did successfully label da7 & da15 with gmultipath. I couldn't
add da3 & da11 because of the write errors.
That's no real surprise, but since I can't add the label, what happens if one of the active ports on a controller
fails? I know I'd have the other path to the active port on the other controller, but would I have to manually add the
label to the volumes from the newly active port?
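
To see which da devices sit behind which controller port, the dmesg lines quoted earlier can be grouped by WWPN. A rough Python sketch follows, under stated assumptions: each device is assumed to print its own line in the form summarized above, and the per-device lines in SAMPLE are expanded from the da[4-7] and da[12-15] shorthand rather than copied from a real dmesg. The helper names are hypothetical.

```python
import re
from collections import defaultdict

# WWPNs taken from the dmesg summary above.
WWPN_A = "0x203500a0b8388efd"
WWPN_B = "0x202500a0b8388efd"


def sample_line(unit, wwpn):
    # Assumed per-device form of the summarized "da[4-7]: ..." lines.
    return (f"da{unit}: 200.000MB/s transfers "
            f"WWNN 0x200400a0b8388efd WWPN {wwpn} PortID 0x10100")


SAMPLE = "\n".join(
    [sample_line(n, WWPN_A) for n in range(4, 8)]
    + [sample_line(n, WWPN_B) for n in range(12, 16)]
)

# Capture the device name and the WWPN field from one dmesg line.
LINE_RE = re.compile(r"^(da\d+): .*\bWWPN (0x[0-9a-fA-F]+)")


def devices_by_wwpn(dmesg_text):
    """Map each target-port WWPN to the da devices reached through it."""
    groups = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            groups[m.group(2)].append(m.group(1))
    return dict(groups)


by_port = devices_by_wwpn(SAMPLE)
for wwpn, devs in by_port.items():
    print(wwpn, ", ".join(devs))
```

Feeding it real "dmesg" output instead of SAMPLE would show at a glance whether the devices that return errors all share a target port.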