Hang on boot in isp with QLA2342 after upgrading to 6.3
Graham Allan
allan at physics.umn.edu
Mon May 12 17:14:06 UTC 2008
On Mon, May 12, 2008 at 12:19:49PM -0400, Alexander Sack wrote:
>
> Graham, from the driver error messages it seems that the card believes
> you are on a switched fabric and that it most likely is logging into
> the SNS server to look up names/addresses for your devices. Are you
> sure that your switched fabric is set up correctly? I missed part of
> this thread so I apologize if this topic has already been hashed out.
> If for some reason the host cannot log into the SNS server and
> retrieve entries from the database, then you are going to be hosed (I
> agree the OS shouldn't be hung unless you are booting off the disk
> connected to the failed controller, etc.).
>
> I am very familiar with the ISP23/4xx chipset and I can go digging more
> but I was wondering if you have verified that your topology is valid.
I'm happy to confess to being a SAN novice, so I'm not quite sure how I
would verify that, other than that it "seems to work" OK on the older
OS release, and also in specific circumstances on the current one. For
example, if one port of the HBA is connected directly to a device and
the other to the fabric, it doesn't have a problem, so in that
situation it is able to log in to the fabric OK and retrieve database
information.
Even when it does hang, it does appear to have logged in to the fabric
ok, according to my interpretation of the switch output:
fcswitch_s43_2:admin> portshow 8
portName:
portHealth: No License
Authentication: None
portFlags: 0x223805b portLbMod: 0x0 PRESENT ACTIVE F_PORT G_PORT U_PORT LOGIN NOELP LED ACCEPT WAS_EPORT
portType: 4.1
portState: 1 Online
portPhys: 6 In_Sync
portScn: 6 F_Port
portRegs: 0x81100000
portData: 0x11deb230
portId: 031800
portWwn: 20:08:00:60:69:51:4a:20
portWwn of device(s) connected: 21:00:00:e0:8b:08:06:d2
Distance: normal
Speed: N2Gbps
Interrupts:   20487   Link_failure: 18      Frjt: 0
Unknown:      404     Loss_of_sync: 12295   Fbsy: 0
Lli:          13715   Loss_of_sig:  93
Proc_rqrd:    6646    Protocol_err: 0
Timed_out:    0       Invalid_word: 0
Rx_flushed:   0       Invalid_crc:  0
Tx_unavail:   0       Delim_err:    0
Free_buffer:  0       Address_err:  0
Overrun:      0       Lr_in:        36
Suspended:    0       Lr_out:       73
Parity_err:   0       Ols_in:       73
and it's listed in the switch name server (third entry down, 031800):
fcswitch_s43_2:admin> nsshow
{
 Type Pid    COS  PortName                NodeName                TTL(sec)
 N    031300;   3;21:00:00:04:d9:60:17:6e;20:00:00:04:d9:60:17:6d; na
    FC4s: FCP
    PortSymb: [39] "UNKNOWN A.0 UNKNOWN FW:01.02 Port 1 "
    Fabric Port Name: 20:03:00:60:69:51:4a:20
 N    031500;   3;21:00:00:1b:4d:00:83:ed;20:00:00:1b:4d:00:83:ec; na
    FC4s: FCP [JetStor FreeBSD mark R4 R001]
    Fabric Port Name: 20:05:00:60:69:51:4a:20
 N    031800;   3;21:00:00:e0:8b:08:06:d2;20:00:00:e0:8b:08:06:d2; na
    FC4s: FCP
    Fabric Port Name: 20:08:00:60:69:51:4a:20
 N    031900;   3;10:00:00:06:2b:09:4f:d8;20:00:00:06:2b:09:4f:d8; na
    FC4s: FCIP FCP
    PortSymb: [47] "LSI7202P B.0 03-01001-02A FW:1.00.06 Port 0 "
    Fabric Port Name: 20:09:00:60:69:51:4a:20
 N    031a00; 2,3;10:00:00:00:c9:24:5b:04;20:00:00:00:c9:24:5b:04; na
    FC4s: FCP
    PortSymb: [49] "UNIX (emx2) KGPSA-CA S/W Rev 2.25: F/W Rev 3.93a0"
    Fabric Port Name: 20:0a:00:60:69:51:4a:20
The Local Name Server has 5 entries }
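The name-server check above (finding the HBA's port WWN
21:00:00:e0:8b:08:06:d2 registered at 031800) could also be scripted
rather than read by eye. What follows is only a minimal Python sketch,
assuming the nsshow output has been captured as text. The names
parse_nsshow and find_by_port_wwn and the regular expression are
illustrative assumptions, not part of any switch or FreeBSD tool, and
the sample is a two-entry excerpt of the listing above.

```python
import re

# Excerpt of the nsshow listing from this post (assumed captured as text).
NSSHOW_SAMPLE = """\
 N    031500;   3;21:00:00:1b:4d:00:83:ed;20:00:00:1b:4d:00:83:ec; na
    FC4s: FCP [JetStor FreeBSD mark R4 R001]
    Fabric Port Name: 20:05:00:60:69:51:4a:20
 N    031800;   3;21:00:00:e0:8b:08:06:d2;20:00:00:e0:8b:08:06:d2; na
    FC4s: FCP
    Fabric Port Name: 20:08:00:60:69:51:4a:20
"""

# Eight colon-separated hex bytes, e.g. 21:00:00:e0:8b:08:06:d2
WWN = r"(?:[0-9a-f]{2}:){7}[0-9a-f]{2}"

# Entry line: Type, Pid, COS, PortName, NodeName (hypothetical pattern,
# written to fit the listing shown in this post).
ENTRY_RE = re.compile(
    rf"^\s*(?P<type>\w+)\s+(?P<pid>[0-9a-f]{{6}});\s*(?P<cos>[\d,]+);"
    rf"(?P<port_wwn>{WWN});(?P<node_wwn>{WWN});",
    re.IGNORECASE,
)


def parse_nsshow(text):
    """Return one dict per name-server entry found in the text."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries and ":" in line:
            # Detail line such as "Fabric Port Name: 20:08:..." belongs
            # to the most recent entry; split on the first colon only.
            key, _, value = line.strip().partition(":")
            entries[-1][key.strip()] = value.strip()
    return entries


def find_by_port_wwn(entries, wwn):
    """Return the entry whose port WWN matches (case-insensitive), or None."""
    wwn = wwn.lower()
    for entry in entries:
        if entry["port_wwn"].lower() == wwn:
            return entry
    return None


if __name__ == "__main__":
    hba = find_by_port_wwn(parse_nsshow(NSSHOW_SAMPLE),
                           "21:00:00:e0:8b:08:06:d2")
    if hba is None:
        print("HBA port WWN is not registered with the name server")
    else:
        print("HBA registered at PID", hba["pid"],
              "on fabric port", hba["Fabric Port Name"])
```

A match here only says that the port has registered with the name
server; it says nothing about why the driver hangs afterwards.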
It has been pointed out to me that this kind of weird interaction isn't
exactly unknown in the SAN world, and setting up zoning on the switch
would probably make it go away. So I will also try that (it's probably
a giveaway of a SAN novice that I hadn't already done so - it certainly
does sound like it would help). But if the hang does point to a problem
in the driver, I'm also happy to keep trying different things in the
hope of revealing where the problem actually lies.
Graham