Kernel Panic on Cold Start with Adaptec AIC7XXX Rev. 6.2.4
Justin T. Gibbs
gibbs at btc.adaptec.com
Tue Dec 18 14:58:25 PST 2001
Sorry for the slow response on this. I've been snowed under working
on Linux U320 support...
>I updated from Linux 2.4.8 to 2.4.13 and now get a kernel panic on a
>cold start. This reflects the AIC7XXX driver change from
>Rev 6.2.1 to 6.2.4. After an initial kernel panic, a warm "reset"
>effects a successful boot. My controller is the Adaptec
>3940 Ultra SCSI PCI adapter with two controllers. scsi1 is enabled in
>the controller's BIOS, but is not cabled.
The problem is caused by either the firmware or the kernel reading
a location in SCB or scratch RAM that has not previously been written.
On a cold start such a location can hold a value with bad parity, so
the read triggers a parity error. The driver deliberately does not
write to every memory location before initialization, because doing so
would mask bugs in the initialization code: every referenced location
should be initialized to its correct value, not to a blanket default.
This approach works well as long as I can reproduce the parity error
here. Unfortunately, even with many different controllers in many
different machines, I have not been able to. I've also spent some time
reviewing the code changes between 6.2.1 and 6.2.4 and have not found
a "smoking gun".
Since you can reproduce this, I'm hoping you can help track this
down. By performing a "binary search" on the two types of memory, we
should be able to figure out the location that is causing the problem.
Once we have the offset, determining why that location is referenced
prior to being initialized should be pretty easy.
You'll need to modify two pieces of code:
1) drivers/scsi/aic7xxx/aic7xxx.c:ahc_reset(), add these lines
to the bottom of the function:
	for (wait = BUSY_TARGETS; wait <= SEQ_FLAGS2; wait++)
		ahc_outb(ahc, wait, 0);
If you still can't cold boot your system, the problem lies outside
this range; remove this code and try step two below.
If the system does cold boot with this change, the faulty location is
within the above range, and we need to narrow down which one it is.
You should be able to do a binary search on the location (i.e.
initialize only half of the range, determine which half is at fault,
then recurse on the half with the problem).
2) drivers/scsi/aic7xxx/aic7xxx.c:ahc_probe_scbs(), do something like this:
/*
 * Determine the number of SCBs available on the controller
 */
int
ahc_probe_scbs(struct ahc_softc *ahc) {
	int i;

	for (i = 0; i < AHC_SCB_MAX; i++) {
		int j;

		ahc_outb(ahc, SCBPTR, i);
		ahc_outb(ahc, SCB_BASE, i);
		if (ahc_inb(ahc, SCB_BASE) != i)
			break;
		/* Added code: zero bytes 1-31 of this SCB.  Byte 0 is
		 * skipped because it holds the probe's marker value. */
		for (j = 1; j < 32; j++)
			ahc_outb(ahc, SCB_BASE + j, 0);
		/* End added code. */
		ahc_outb(ahc, SCBPTR, 0);
		if (ahc_inb(ahc, SCB_BASE) != 0)
			break;
	}
	return (i);
}
Perform a similar binary search on the SCB memory (by narrowing the
range of j in the added loop) until you determine which position is
at fault.
--
Justin
More information about the aic7xxx mailing list