Critical bug in Adaptec(aac) driver ...

Scott Long scott_long at btc.adaptec.com
Sun Jun 1 10:38:45 PDT 2003


Marc G. Fournier wrote:
> As those on this list will have seen over the past few months, I have a
> server that had (past tense) an Adaptec 2120s controller in her that was
> giving a lot of grief ... about 3 weeks ago, the server it was in *really*
> blew up ... one drive was reported as down (in a RAID5 array), and when we
> tried to bring it back up, a second drive started to "fail" ... I got the
> techs to shut her down, and literally rushed to the remote location to see
> if there was anything that I could do to at least recover the data ...
> 
> When I got there to bring it back up, the server reported that a 3rd drive
> had failed ... and within a few hours, a 4th drive failed ... the result
> being that we lost all of the data on that server, which turned out to be
> quite painful to recover ...
> 
> While down there, we replaced the Adaptec controller with an Intel one,
> reformatted the exact same drives, in the exact same chassis, and she's
> been running fine since ...
> 
> On my trip back, I had a chat with a friend that does development work in
> the Linux world, and who had had that server previous to myself, and
> apparently there is a "known bug" in Linux that he says sounds exactly
> like what I experienced (they hit it right in the middle of developing on
> that box) and that there are apparently two Linux kernel patches that they
> had to apply (after rebuilding from scratch) to correct the problem ...
> 
> The way he explained the problem to me, he made it sound like the kernel
> driver was interacting with the BIOS and causing some corruption ... not
> sure at what level, but since trying to swap in a new controller didn't
> restore things, I'm suspecting at the hard drive level ... ?
> 
> Scott, while down there, I tried just about everything I could think of
> ... we replaced the SCSI cable, put the drives/controller into a second
> identical chassis, swapped the host controller cards themselves (I had brought
> spares) ... and that server, as I mentioned, is currently running quite
> happily with an Intel host controller in it :(  So, unless the same
> "failure" was hitting two host controllers, hardware failure doesn't seem
> to have been the cause ...
> 

I understand your frustration and wish there were more I could do to
help.  Please send me whatever information you have.

Scott