Puzzle for Doug...
Robert G. Brown
rgb at phy.duke.edu
Tue Jul 28 12:25:54 PDT 1998
On Tue, 28 Jul 1998, Mike Isely wrote:
> Well since the aic7xxx hardware executes DMA on its own behalf, that sort
> of memory access might look "different" enough to the hardware to expose a
> latent race condition. Certainly there's more memory contention going on
> with the aic7xxx stuff in the picture.
Good point. I am also wondering whether the high speed of the CPUs, the
memory, and the U2 controller itself combine to reveal a race
condition. I just really believe that the race is in the driver.
> Such memory tests never amount to more than a quickie existence check.
> "Leaky" DRAM cells (if such a thing could happen) can't be picked up
> for example because it would take many many microseconds for the bit(s) to
> go bad. BIOS memory scans run way too fast for that.
Again, if it were "raw" bad DRAM, the system simply wouldn't work
regardless of the presence/absence of the aic7xxx driver. Something
else would be using the critical memory during boot and fail. I like
your DMA/race/contention hypothesis below much better.
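The kind of check Mike describes (write a pattern, wait long enough for a
marginal cell to decay, then read it back) can be sketched roughly as
below. This is only a user-space illustration in Python, not anything the
BIOS or the aic7xxx driver does; the names retention_test and
find_mismatches and the default sizes and delays are made up for the
example. A buffer like this sits behind the CPU cache and the memory
controller keeps refreshing the DRAM, so a clean result proves very
little, and it cannot reproduce DMA contention at all.

```python
import time


def find_mismatches(buf, pattern):
    """Return (offset, value) for every byte that no longer equals pattern."""
    return [(i, b) for i, b in enumerate(buf) if b != pattern]


def retention_test(size_bytes=1 << 20, hold_seconds=0.5,
                   patterns=(0x55, 0xAA, 0x00, 0xFF)):
    """Fill a buffer with each pattern, wait, then look for changed bytes.

    Returns a list of (pattern, offset, value) tuples; an empty list
    means no byte changed while the pattern was held.
    """
    buf = bytearray(size_bytes)
    failures = []
    for pattern in patterns:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected            # write the pattern everywhere
        time.sleep(hold_seconds)     # hold it much longer than a BIOS scan would
        if buf != expected:          # cheap whole-buffer compare first
            for offset, value in find_mismatches(buf, pattern):
                failures.append((pattern, offset, value))
    return failures


if __name__ == "__main__":
    bad = retention_test()
    print("mismatches found:", len(bad))
```

The point of the hold time is the one made above: a scan that reads back
immediately after writing never gives a slow failure time to show up.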
>
> >
> > The only way that I could see the problem being bad memory is if the
> > SDRAM they put in the systems is somehow marginal and occasionally
> > fails but ONLY IN A WAY THE AIC7XXX DRIVER TWEAKS! And only on the
>
> Without any DMA devices active in the system, the memory activity is going
> to be limited to whatever the CPU causes. Is there any known-DMA going on
> without the aic7xxx running? With multiple independent (fast) devices
> initiating memory access, all sorts of contention issues can arise. Of
> course, this is supposed to work, but without the aic7xxx stuff active you
> might not be beating on it hard enough to cause the trouble. Remember the
> RZ1000 IDE problem a few years back?
Yeah, this occurred to me -- I have an eepro100 in the system and
there is indeed network traffic, especially during diskless boots.
It's harder to see this as a problem in NON-diskless boots, though.
Also, the network device is formally probed and initialized only AFTER
the scsi device. Finally, I unplugged the cable during a boot or two
so that it wasn't actually receiving packets during boot. No effect.
Still, a definite possibility.
>
> Just fishing for ideas for ya. I think a game of musical hardware is
> definitely the next step here. But even that may not give conclusive
> results if something in Dell's configuration is "right on the edge".
And I appreciate it! But *moan*...
rgb
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu