isp(4) - kernel panic on initialization of driver

Wed Aug 27 15:12:14 UTC 2008

On Tue, Aug 26, 2008 at 4:41 PM, Ross <westr at connection.ca> wrote:
> I've been tracking down a problem that is sometimes causing a kernel
> panic to occur when initializing the isp driver in the system. (System
> in question is an HP Blade - BL460c w/ QHM 6432 FC dual port card,
> reporting as the following):
>
> isp0: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x4000-0x40ff mem 0xfdff0000-0xfdff3fff irq 18 at device 0.0 on pci16
> isp0: Board Type 2422, Chip Revision 0x2, resident F/W Revision 4.0.90
> isp1: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x4400-0x44ff mem 0xfdfe0000-0xfdfe3fff irq 19 at device 0.1 on pci16
> isp1: Board Type 2422, Chip Revision 0x2, resident F/W Revision 4.0.90

So one question I have is: why doesn't the isp driver load the firmware from ispfw?

This is 7.x, so firmware_get() should have returned the isp_2400
registered firmware image for a 2422 card and loaded it in
isp_reset(), unless dodnld was set to zero from a hint flag.  By any
chance, do you have "hint.isp.0.fwload_disable" set?  I'm not saying
that will fix your problem, but I just noticed this.
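To make the firmware question concrete, here is a rough user-space model of the decision described above. It assumes, hypothetically, that the choice boils down to two conditions: an image was found for the board type, and the fwload_disable hint did not clear dodnld. The struct and function names are invented for illustration; this is not the isp(4) source, and the real driver calls firmware_get(9) inside the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-board state the driver consults. */
struct isp_model {
    int  board_type;      /* e.g. 2422, as reported at probe time */
    bool fwload_disable;  /* models hint.isp.N.fwload_disable */
};

/* Hypothetical lookup: on 7.x a 2422 board should map to the
 * "isp_2400" image registered by ispfw. Other boards are not modeled. */
static const char *isp_model_fw_image(int board_type)
{
    if (board_type == 2422)
        return "isp_2400";
    return NULL;
}

/* Returns true if the model would download its own firmware in
 * isp_reset() rather than keep running the resident F/W. */
static bool isp_model_should_download(const struct isp_model *m)
{
    bool dodnld = !m->fwload_disable;   /* the hint clears dodnld */
    return dodnld && isp_model_fw_image(m->board_type) != NULL;
}
```

In this model, a 2422 board with the hint unset should download "isp_2400", so seeing "resident F/W Revision 4.0.90" in the probe output is what makes the hint worth checking.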

> We're doing a boot-via-san situation, and the issue looks to be that
> the card is receiving an ISPASYNC_CHANGE_PDB command on isp1 before
> it's ready for it. I'm guessing it's due to the fact the card already
> has the firmware loaded and active (due to the boot).

Nah, I don't think that's exactly it: whether or not isp/ispfw
loads the firmware on boot, I think you can still see this issue.

It looks like after isp_reset() is performed (it grabs information
from the ISP and normally attempts to reset it and do further hardware
initialization), we get into isp_init().  The isp_init() function
attempts to issue the MBOX_SET_FIRMWARE_OPTIONS command, which will
generate an asynchronous event when a LIP is received.

At this point ISP_LOCK is held, which blocks any ISR.  However, I see
that we drop it in isp_attach(), and I bet that is when chaos ensues
(the ISR proceeds, obtains the lock, and goes through the async event
path).

WARNING: this is pure speculation on my part from quickly looking at
it.  I'm just wondering if this is a bug in isp that allows ASYNC
events to occur before a complete attach has been performed.
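The suspected ordering problem can be pictured with a small deterministic model. It assumes, following the speculation above, that the danger is the async event path reaching the CAM freeze before attach has finished. Every name here is hypothetical and this is not isp(4) code; it only shows one guard pattern a fix might take: defer early PDB-changed events and replay them once attach completes.

```c
#include <stdbool.h>

/* Hypothetical model of the attach-time race; not isp(4) code. */
struct isp_softc_model {
    bool attach_done;   /* set once the CAM side (simq etc.) exists */
    int  deferred_pdb;  /* PDB-changed events that arrived too early */
    int  simq_freezes;  /* stands in for xpt_freeze_simq() calls */
};

/* Stands in for isp_freeze_loopdown(); only safe after attach. */
static void model_freeze_loopdown(struct isp_softc_model *sc)
{
    sc->simq_freezes++;
}

/* Called from the "ISR" path for ISPASYNC_CHANGE_PDB. */
static void model_async_change_pdb(struct isp_softc_model *sc)
{
    if (!sc->attach_done) {
        /* Too early: the CAM state we would freeze is not set up yet. */
        sc->deferred_pdb++;
        return;
    }
    model_freeze_loopdown(sc);
}

/* Called at the end of attach, after CAM registration. */
static void model_attach_complete(struct isp_softc_model *sc)
{
    sc->attach_done = true;
    if (sc->deferred_pdb > 0) {
        sc->deferred_pdb = 0;
        model_freeze_loopdown(sc);  /* replay the early events once */
    }
}
```

Collapsing several early events into a single replay is a design choice in this sketch; since each one only means "the port database changed", one rescan after attach should cover them all.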

> Console debug (hint.isp.[01].debug=0x11f) output looks like the following on a crash:
>
> -=
> kernel: isp1: <Qlogic ISP 2432 PCI FC-AL Adapter> port 0x4400-0x44ff mem 0xfdfe0000-0xfdfe3fff irq 19 at device 0.1 on pci16
> kernel: isp1: set PCI latency to 64
> kernel: isp1: [ITHREAD]
> kernel: isp1: line 5345: markportdb
> kernel: isp1: Port Database Changed
> kernel: isp1: Port Database Changed: freeze simq (loopdown)
> [crash]
> -=
>
> Further debugging shows that the isp_freeze_loopdown() function
> called at the above point never returns. Quick guess is the called
> xpt_freeze_simq() function [line 290 in isp_freebsd.c] is the culprit,
> but that's about the limit of my ability for tracking this down.

Yes, but it's not clear to me that the bug is REALLY in CAM...yet.  It
might be ISP letting asynchronous events in before it's really ready,
which means it's a driver problem.  I have a 24xx card in the lab;
maybe I can test this if I get a chance to.  I'm assuming all you are
doing is booting 7.0-RELEASE off the SAN?

As Scott asked, can you get a dump or a trace of the crash?

Thanks!

-aps