What's Going On Here?

Duncan Sterling buffalo at radix.net
Wed May 12 06:02:40 PDT 2004


Greetings,

I've been having problems recently with a stock Red Hat 7.2 install and
an Adaptec aic7892.

The machine had been locking up at random with EXT3 errors, but after a
reboot it would come up fine; both drives (SEAGATE Model: ST39103LW
Rev: 0002) pass their SMART tests.

After a recent kernel upgrade, which also updated the SCSI driver to
version 6.2.8, the machine no longer locks up, but it now produces the
following error messages:

--------------------------------
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 10000
 I/O error: dev 08:01, sector 13380296
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 10000
 I/O error: dev 08:01, sector 721912
scsi0:0:0:0: Attempting to queue an ABORT message
scsi0: Dumping Card State while idle, at SEQADDR 0x8
ACCUM = 0x2e, SINDEX = 0x48, DINDEX = 0xe4, ARG_2 = 0xb
HCNT = 0x0 SCBPTR = 0x3
SCSISEQ = 0x12, SBLKCTL = 0xa
 DFCNTRL = 0x0, DFSTATUS = 0x89
LASTPHASE = 0x1, SCSISIGI = 0x0, SXFRCTL0 = 0x80
SSTAT0 = 0x0, SSTAT1 = 0x0
SCSIPHASE = 0x0
STACK == 0x3, 0x175, 0x160, 0x0
SCB count = 68
Kernel NEXTQSCB = 26
Card NEXTQSCB = 26
QINFIFO entries:
Waiting Queue entries:
Disconnected Queue entries: 9:33
QOUTFIFO entries:
Sequencer Free SCB List: 3 6 21 14 2 10 7 16 29 11 28 19 17 13 31 22 24 4
8 15 1 18 27 26 5 0 20 30 23 12 25
Sequencer SCB Info: 0(c 0x60, s 0x17, l 0, t 0xff) 1(c 0x60, s 0x17, l 0,
t 0xff0, t 0xff) 5(c 0x60, s 0x7, l 0, t 0xff) 6(c 0x60, s 0x7, l 0, t 0xff) 7(c
0x60, s 0x7, l 0, t 0xff) 8(c 0x60, s 0x17, l 0, t 0xff) 9(c 0x64, s 0x7,
l 0, t 0x21) 10(c 0x60, s 0x7, l 0, t 0xff) 11(c 0x60, s 0x7, l 0, t 0xff)
12(c 0x60, s 0x17, l 0, t 0xff) 13(c 0x60, s 0x7, l 0, t 0xff) 14(c 0x60,
s 0x7, l 0, t 0xff) 15(c 0x60, s 0x7, l 0, t 0xff) 16(c 0x60, s 0x7, l 0,
t 0xff) 17(c 0x60, s 0x7, l 0, t 0xff) 18(c 0x60, s 0x17, l 0, t 0xff)
19(c 0x60, s 0x7, l 0, t 0xff) 20(c 0x60, s 0x7, l 0, t 0xff) 21(c 0x60, s
0x7, l 0, t 0xff) 22(c 0x60, s 0x7, l 0, t 0xff) 23(c 0x60, s 0x7, l 0, t
0xff) 24(c 0x60, s 0x17, l 0, t 0xff) 25(c 0x60, s 0x17, l 0, t 0xff) 26(c
0x60, s 0x17, l 0, t 0xff) 27(c 0x60, s 0x7, l 0, t 0xff) 28(c 0x60, s
0x7, l 0, t 0xff) 29(c 0x60, s 0x7, l 0, t 0xff) 30(c 0x60, s 0x17, l 0, t
0xff) 31(c 0x60, s 0x7, l 0, t 0xff)
Pending list: 33(c 0x60, s 0x7, l 0)
Kernel Free SCB list: 46 35 30 0 18 1 3 59 16 52 39 19 36 14 60 51 2 6 20
32 34 4 48 63 45 8 47 67 40 15 49 53 11 56 23 57 54 7 10 61 58 22 28 17 31
44 5 55 37 50 21 25 62 27 43 38 12 24 9 13 42 29 41 66 65 64
DevQ(0:0:0): 0 waiting
DevQ(0:1:0): 0 waiting
DevQ(0:4:0): 0 waiting
(scsi0:A:0:0): Queuing a recovery SCB
scsi0:0:0:0: Device is disconnected, re-queuing SCB
Recovery code sleeping
Recovery SCB completes
Recovery code awake
aic7xxx_abort returns 0x2002
scsi0:0:0:0: Attempting to queue a TARGET RESET message
scsi0:0:0:0: Command not found
aic7xxx_dev_reset returns 0x2002
----------------------------------
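
To make the "Sequencer SCB Info" section easier to read, here is a rough
Python sketch that picks out the entries that look in use. It assumes (this
is not confirmed anywhere in the dump itself) that the fields c, s, l and t
are the SCB control byte, SCSI ID, LUN and tag, and that a tag of 0xff marks
an unused slot. The names active_scbs and SCB_ENTRY are made up for this
example.

```python
import re

# One entry looks like "9(c 0x64, s 0x7, l 0, t 0x21)"; \s+ tolerates the
# line wraps introduced when the log was pasted into the mail.
SCB_ENTRY = re.compile(
    r"(\d+)\(c\s+0x([0-9a-f]+),\s+s\s+0x([0-9a-f]+),"
    r"\s+l\s+(\d+),\s+t\s+0x([0-9a-f]+)\)"
)

def active_scbs(dump):
    """Return (index, control, scsiid, lun, tag) for entries whose tag
    is not 0xff (assumed here to mean "slot not in use")."""
    found = []
    for m in SCB_ENTRY.finditer(dump):
        index, lun = int(m.group(1)), int(m.group(4))
        control, scsiid, tag = (int(m.group(i), 16) for i in (2, 3, 5))
        if tag != 0xFF:
            found.append((index, control, scsiid, lun, tag))
    return found

# Short excerpt copied from the dump above, including the wrapped line.
excerpt = """8(c 0x60, s 0x17, l 0, t 0xff) 9(c 0x64, s 0x7,
l 0, t 0x21) 10(c 0x60, s 0x7, l 0, t 0xff)"""

for index, control, scsiid, lun, tag in active_scbs(excerpt):
    print(f"hardware SCB {index}: tag {tag} (0x{tag:x}), control 0x{control:x}")
```

On this excerpt only entry 9 stands out, with tag 0x21 (decimal 33), which
lines up with "Disconnected Queue entries: 9:33" and "Pending list: 33" in
the dump.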

I know it would appear that this is a problem with drive 0:0:0:0, but
again it passes SMART tests.

I copied the 0:0:0:0 drive and tried to boot the machine from the copy
(prior to the kernel/driver upgrade), and that failed; yet the copied
drive boots and comes up fine on other machines. Does this imply a
possible cable or adapter problem?

Can anything useful be pulled out of the above diagnostic message?
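
As a starting point, here is a small Python sketch of how the "return code"
and "dev" fields might be decoded. It assumes both are printed in hex, that
the return code follows the usual Linux SCSI result layout (status, message,
host and driver bytes, from low to high), and that major 8 is the SCSI disk
driver with 16 minors per disk. The function names are made up for this
example, and only a few host-byte names are listed.

```python
# Host-byte values as named in the Linux SCSI headers (partial list).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
}

def decode_result(result):
    """Split a 32-bit SCSI result word into its four bytes."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }

def decode_dev(dev):
    """Map a 'major:minor' hex pair to a device name (major 8 only)."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != 8:
        raise ValueError("this sketch only handles major 8 (SCSI disks)")
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(partition) if partition else "")

fields = decode_result(int("10000", 16))
print(fields, HOST_BYTE_NAMES.get(fields["host"], "unknown"))
print(decode_dev("08:01"))
```

Under those assumptions, return code 10000 has a host byte of 0x01
(DID_NO_CONNECT) with all other bytes zero, and dev 08:01 is the first
partition on the first SCSI disk (sda1).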

TIA for any/all pointers/advice/suggestions,

--Duncan


