AIC7XXX (2940UW Pro) file system corruption

Scott Long scottl at freebsd.org
Wed Feb 11 19:40:54 PST 2004


Matthias Andree wrote:
> Hi,
> 
> I have a 2940 UW Pro running in a FreeBSD 4-STABLE (checked out and
> built kernel around Feb 3rd) machine with Yamaha CRW4416S (CD, USCSI),
> Plextor PX-20TS (CD, USCSI) and Micropolis 4345WS (HDD). The external
> connector is unused, the 50-pin stuff is terminated internally in the
> Plextor at the bus end, the 68-pin stuff is terminated internally in the
> Micropolis at the other bus end.
> 
> Last Friday, the SCSI stuff in the box went haywire, dumped card state
> and finally locked the machine up - I had to press the reset button. On
> reboot, fsck -p aborted the boot since /var was corrupt. At that time,
> the hard disk drive was running with the "WCE" set to 0 in the saved and
> current mode pages. It's a test machine, so I didn't bother to report
> this yet.
> 
> I'd used both a Tekram DC-390 (AMD53C974, amd(4)) and a Tekram DC-390U
> (SYM53C975, sym(4)) in the same machine with one of these 50<->68
> adaptor plugs without seeing such problems, but at that time, the Yamaha
> was missing.
> 

The amd(4) driver doesn't do tagged queueing, so it certainly will not
work the disk as hard as the ahc driver will.  I don't know much about
the sym(4) driver, but it might not work the disk as hard either.

> The log entries (logged across the network) are too large to post here,
> download URL (the log is gzipped):
> 
> ftp://ftp.dt.e-technik.uni-dortmund.de/pub/people/ma/aic7xxx-hang.gz
> 
> The log is segmented, the first part of Feb 6 is the boot-up message
> (around 15:07), then I elided logs until 20:00, where the trouble
> started at 20:10:20 with
> Feb  6 20:10:20 libertas /kernel: swap_pager: indefinite wait buffer: device: #da/0x20001, blkno: 4296, size: 24576
> Feb  6 20:10:41 libertas /kernel: (da0:ahc0:0:0:0): SCB 0x0 - timed out
> Feb  6 20:10:42 libertas /kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
> Feb  6 20:10:42 libertas /kernel: ahc0: Dumping Card State in Data-in phase, at SEQADDR 0x64
> 
> At that time, the machine was running portupgrade -a and was supposed
> to build some big packages (gcc, XFree and others) from ports.
> 
> Is this a driver issue?

There is a very good chance that your disk is going bad.  The card state
dumps happen because the disk is not responding to commands within the
timeout period.
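
To check whether the timeouts all land on one device, the "timed out"
lines in the log can be tallied per controller, target and LUN.  What
follows is only a minimal sketch in Python: the function name
count_timeouts and the regular expression are my own illustration,
not part of the driver or of any FreeBSD tool, and the pattern assumes
the "(periph:controller:bus:target:lun): SCB 0x.. - timed out" layout
seen in the lines quoted in this thread.

```python
import re
import sys
from collections import Counter

# Assumed log layout, taken from the lines quoted in this thread:
#   (da0:ahc0:0:0:0): SCB 0x0 - timed out
# The fields are periph:controller:bus:target:lun.
TIMEOUT_RE = re.compile(
    r"\((\w+):(\w+):(\d+):(\d+):(\d+)\): SCB 0x[0-9a-fA-F]+ - timed out"
)


def count_timeouts(lines):
    """Count SCB timeouts, keyed by (controller, target, lun).

    The peripheral name (da0, pass0, ...) is ignored on purpose, so
    timeouts seen through different peripheral drivers for the same
    physical device are added together.
    """
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            _periph, controller, _bus, target, lun = match.groups()
            counts[(controller, int(target), int(lun))] += 1
    return counts


if __name__ == "__main__":
    # Reads the (already decompressed) log from standard input.
    for (controller, target, lun), n in sorted(count_timeouts(sys.stdin).items()):
        print(f"{controller} target {target} lun {lun}: {n} timeout(s)")
```

Feed it the decompressed log on standard input.  If every timeout is
charged to the same target, that is consistent with a single device
that has stopped answering, although it does not by itself rule out
cabling or termination.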

> 
> After reboot and manually cleaning up the /var mess which involved force
> installing some ports, another portupgrade -a has completed without
> problem.
> 
> 
> I tried reading the defect lists with each of the following commands,
> to no avail: each produced only an error message and another card
> state dump (not posted, only first and last lines below)
> 
>    camcontrol defects da0 -G -f block
>    camcontrol defects da0 -G -f bfi
>    camcontrol defects da0 -G -f phys
> 
> (pass0:ahc0:0:0:0): SCB 0xf - timed out
> 
> >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
> 
> ahc0: Dumping Card State while idle, at SEQADDR 0x7
> Card was paused
> ...
> (pass0:ahc0:0:0:0): Queuing a BDR SCB
> (pass0:ahc0:0:0:0): Bus Device Reset Message Sent
> (pass0:ahc0:0:0:0): no longer in timeout, status = 34b
> ahc0: Bus Device Reset on A:0. 11 SCBs aborted
> 
> This card dump occurred within half a second after issuing the
> camcontrol command.
> 
> I have then, as an alternative, run "sformat -verify dev=0,0,0", which
> has not reported any defects or weak blocks or something, so I can
> assume the drive is fine.
> 

While the disk platters might not be going bad, the electronics might be
aging and getting more sensitive to the heat.  Make sure that the disk
is well vented and cooled for now, and look into replacing it soon.

Scott



More information about the freebsd-scsi mailing list