"read defect list" with 2.0.30-pre7 and patch Aug19
Ulrich Windl
ulrich.windl at rz.uni-regensburg.de
Sun Aug 24 23:26:37 PDT 1997
On 23 Aug 97 at 15:09, Doug Ledford wrote:
Doug,
thank you for analyzing that problem. In fact I also tried the Aug21
patch without success, but with some more information. Probably you
got that information yourself already.
The new SCSI statistics look much better.
To your patch: If changing things like below, you might also add a
comment (inside the patch) saying why you changed it. It will help
anybody outside this discussion group.
> --- scsi_ioctl.c Mon Aug 11 21:00:52 1997
> +++ scsi_ioctl.c.new Sat Aug 23 15:02:57 1997
> @@ -10,4 +10,5 @@
> #include <asm/segment.h>
> #include <asm/system.h>
> +#include <asm/page.h>
>
> #include <linux/errno.h>
> @@ -29,5 +30,5 @@
> #define READ_ELEMENT_STATUS_TIMEOUT (5 * 60 * HZ)
>
> -#define MAX_BUF 4096
> +#define MAX_BUF PAGE_SIZE
>
> #define max(a,b) (((a) > (b)) ? (a) : (b))
> @@ -206,6 +207,6 @@
> * should use scsi_generics instead.
> */
> - if( inlen > MAX_BUF ) inlen = MAX_BUF;
> - if( outlen > MAX_BUF ) outlen = MAX_BUF;
> + if( inlen > MAX_BUF ) return -EINVAL;
> + if( outlen > MAX_BUF ) return -EINVAL;
>
> cmd_in = (char *) ( ((int *)buffer) + 2);
>
Here's my latest trace file:
-------------
Sad to say: Nothing new!
Driver output during boot:
<4>aic7xxx: <Adaptec AHA-294X SCSI host adapter> at PCI 12
<6>aic7xxx: Loading serial EEPROM...done
<6>aic7xxx: Warning - detected auto-termination. Please verify driver
<6> detected settings and use manual termination if necessary.
<6>aic7xxx: Cables present (Int-50 YES, Ext-50 NO)
<6>aic7xxx: eprom not present, brdctl_int=0xac, brdctl_ext=0x6c
<6>aic7xxx: Termination ON
<6>aic7xxx: BIOS enabled, IO Port 0xe000, IO Mem 0xf9800000, IRQ 11
<6>aic7xxx: Extended translation enabled.
<6>aic7xxx: Single Channel, SCSI ID 7, 16/255 SCBs, QFull 16, QMask 0x1f
<6>aic7xxx: Downloading sequencer code... 414 instructions downloaded
<4>scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 4.1/3.2
<4>scsi : 1 host.
<6>scsi0: Scanning channel A for devices.
<4>scsi0: Target 0, channel A, now synchronous at 10.0MHz, offset 15.
<4> Vendor: CONNER Model: CFP1080S Rev: 4649
<4> Type: Direct-Access ANSI SCSI revision: 02
<4>Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
<4>scsi0: Target 1, channel A, now synchronous at 4.0MHz, offset 15.
<4> Vendor: TOSHIBA Model: CD-ROM XM-5301TA Rev: 1895
<4> Type: CD-ROM ANSI SCSI revision: 02
<4>Detected scsi CD-ROM sr0 at scsi0, channel 0, id 1, lun 0
<4>scsi0: Target 2, channel A, now synchronous at 5.0MHz, offset 8.
<4> Vendor: HP Model: HP35480A Rev: T603
<4> Type: Sequential-Access ANSI SCSI revision: 02
<6>Detected scsi tape st0 at scsi0, channel 0, id 2, lun 0
<6>(scsi0:0:0) Enabled tagged queuing, queue depth 8.
<4>scsi : detected 1 SCSI tape 1 SCSI cdrom 1 SCSI disk total.
<4>SCSI device sda: hdwr sector= 512 bytes. Sectors= 2110812 [1030 MB] [1.0 GB]
<4>Partition check:
<4> sda: sda1 < sda5 sda6 >
<4> hda: hda1 hda2 hda3
<4>VFS: Mounted root (ext2 filesystem) readonly.
Status file after some activity:
Adaptec AIC7xxx driver version: 4.1/3.2
Compile Options:
AIC7XXX_RESET_DELAY : 5
AIC7XXX_TAGGED_QUEUEING: Enabled
AIC7XXX_PAGE_ENABLE : Enabled
AIC7XXX_PROC_STATS : Enabled
Adapter Configuration:
SCSI Adapter: Adaptec AHA-294X SCSI host adapter
(AIC-787x chipset)
Host Bus: Single
Base IO: 0xe000
Base IO Memory: 0xf9800000
IRQ: 11
SCBs: Used 8, HW 16, Page 255
Interrupts: 4657
Serial EEPROM: True
Extended Translation: Enabled
SCSI Bus Reset: Enabled
Ultra SCSI: Disabled
Target Disconnect: Enabled
Statistics:
CHAN#A (TGT 0 LUN 0):
nxfers 2806 (2430 read;376 written)
blks(512) rd=29207; blks(512) wr=2184
< 512 512-1K 1-2K 2-4K 4-8K 8-16K 16-32K 32-64K 64-128K >128K
Reads: 0 4 7 902 640 720 120 22 15 0
Writes: 0 0 0 292 76 5 1 2 0 0
CHAN#A (TGT 1 LUN 0):
nxfers 1774 (1774 read;0 written)
blks(512) rd=22320; blks(512) wr=0
< 512 512-1K 1-2K 2-4K 4-8K 8-16K 16-32K 32-64K 64-128K >128K
Reads: 0 0 0 1226 148 71 329 0 0 0
Writes: 0 0 0 0 0 0 0 0 0 0
Configuration Info:
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_NET=y
CONFIG_PCI=y
CONFIG_SYSVIPC=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_KERNEL_ELF=y
CONFIG_M586=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_TRITON=y
CONFIG_BLK_DEV_RAM=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_NOSR=y
CONFIG_SKB_LARGE=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
CONFIG_BLK_DEV_SR=y
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_TAGGED_QUEUEING=y
CONFIG_AIC7XXX_PAGE_ENABLE=y
CONFIG_AIC7XXX_PROC_STATS=y
CONFIG_AIC7XXX_RESET_DELAY=5
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
CONFIG_MINIX_FS=y
CONFIG_EXT2_FS=y
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_UMSDOS_FS=y
CONFIG_PROC_FS=y
CONFIG_ISO9660_FS=y
CONFIG_HPFS_FS=y
CONFIG_SERIAL=y
CONFIG_PRINTER=y
CONFIG_MOUSE=y
CONFIG_PSMOUSE=y
CONFIG_RTC=y
CONFIG_SOUND=y
Crash #1:
Aug 22 22:51:00 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 0; forcing a retry.
Aug 22 22:51:00 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 22:51:00 elf kernel: sg[0] - Addr 0xb000 : Length 4096
Aug 22 22:51:00 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 0; forcing a retry.
Aug 22 22:51:00 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 22:51:00 elf kernel: sg[0] - Addr 0xb000 : Length 4096
Aug 22 22:51:00 elf kernel: scsi0 channel 0 : resetting for second half of retries.
Aug 22 22:51:00 elf kernel: SCSI bus is being reset for host 0 channel 0.
Aug 22 22:51:00 elf kernel: (scsi0:0:0) Reset called, no scb, flags 0x1
Aug 22 22:51:00 elf kernel: (scsi0:-1:0) Channel reset
Aug 22 22:51:00 elf kernel: Unable to handle kernel paging request at virtual address c5e7024b
Aug 22 22:51:00 elf kernel: current->tss.cr3 = 00101000, hr3 = 00101000
Aug 22 22:51:00 elf kernel: *pde = 00000000
Aug 22 22:51:00 elf kernel: Oops: 0002
Aug 22 22:51:00 elf kernel: CPU: 0
Aug 22 22:51:00 elf kernel: EIP: 0010:[scsi_mark_host_reset+15/28]
Aug 22 22:51:00 elf kernel: EFLAGS: 00010006
Aug 22 22:51:00 elf kernel: eax: 05e70200 ebx: 00000202 ecx: 01ab5f24 edx: 0000c018
Aug 22 22:51:00 elf kernel: esi: 00090018 edi: 0000c410 ebp: 00000001 esp: 001df0dc
Aug 22 22:51:00 elf kernel: ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
Aug 22 22:51:00 elf kernel: Process swapper (pid: 0, process nr: 0, stackpage=001dd274)
Aug 22 22:51:00 elf kernel: Stack: 0019a5fb 00090018 00000001 0000c410 00000000 00000027 0019a02e 0000c410
Aug 22 22:51:00 elf kernel: 00000001 001d1dfd 00000000 00000000 00000046 0000d018 00090068 0000e018
Aug 22 22:51:00 elf kernel: 0000d018 00000001 0000e018 00070000 00090018 001a6621 0000c410 0009e1f8
Aug 22 22:51:00 elf kernel: Call Trace: [scsi_reset+399/776] [scsi_done+1162/1672] [aic7xxx_isr+1117/1424] [aic7xxx_isr+828/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144] [schedule+420/652]
Aug 22 22:51:00 elf kernel: [sys_idle+92/112] [sys_idle+108/112] [system_call+85/128] [init+0/656] [start_kernel+429/440] [it_real_fn+0/72] [schedule+564/652]
Aug 22 22:51:00 elf kernel: Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24
Aug 22 22:51:00 elf kernel: Aiee, killing interrupt handler
Aug 22 22:51:00 elf kernel: kfree of non-kmalloced memory: 001df2bc, next= 001f3bef, order=1128997
Aug 22 22:51:00 elf kernel: kfree of non-kmalloced memory: 001df2ac, next= 001c5c3a, order=1962668
Aug 22 22:51:00 elf kernel: kfree of non-kmalloced memory: 001df7c0, next= 001c5c3a, order=1962668
Aug 22 22:51:00 elf kernel: idle task may not sleep
Crash #2:
Aug 22 23:24:46 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 6; forcing a retry.
Aug 22 23:24:46 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 23:24:46 elf kernel: sg[0] - Addr 0xb000 : Length 4096
Aug 22 23:24:47 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 6; forcing a retry.
Aug 22 23:24:47 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 23:24:47 elf kernel: sg[0] - Addr 0xb000 : Length 4096
Aug 22 23:24:47 elf kernel: scsi0 channel 0 : resetting for second half of retries.
Aug 22 23:24:47 elf kernel: SCSI bus is being reset for host 0 channel 0.
Aug 22 23:24:47 elf kernel: (scsi0:0:0) Reset called, no scb, flags 0x1
Aug 22 23:24:47 elf kernel: (scsi0:-1:0) Channel reset
Aug 22 23:24:47 elf kernel: Unable to handle kernel paging request at virtual address c5e7024b
Aug 22 23:24:47 elf kernel: current->tss.cr3 = 00101000, Xr3 = 00101000
Aug 22 23:24:47 elf kernel: *pde = 00000000
Aug 22 23:24:47 elf kernel: Oops: 0002
Aug 22 23:24:47 elf kernel: CPU: 0
Aug 22 23:24:47 elf kernel: EIP: 0010:[scsi_mark_host_reset+15/28]
Aug 22 23:24:47 elf kernel: EFLAGS: 00010006
Aug 22 23:24:47 elf kernel: eax: 05e70200 ebx: 00000202 ecx: 0075af24 edx: 0000c018
Aug 22 23:24:47 elf kernel: esi: 00090018 edi: 0000c410 ebp: 00000001 esp: 001df0dc
Aug 22 23:24:47 elf kernel: ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
Aug 22 23:24:47 elf kernel: Process swapper (pid: 0, process nr: 0, stackpage=001dd274)
Aug 22 23:24:47 elf kernel: Stack: 0019a5fb 00090018 00000001 0000c410 00000000 00000027 0019a02e 0000c410
Aug 22 23:24:47 elf kernel: 00000001 001d1dfd 00000000 00000000 00000046 0177130c 00090068 0000e0d8
Aug 22 23:24:47 elf kernel: 0177130c 00000001 0000e0d8 00070000 00090018 001a6621 0000c410 0009e1f8
Aug 22 23:24:47 elf kernel: Call Trace: [scsi_reset+399/776] [scsi_done+1162/1672] [aic7xxx_isr+1117/1424] [add_timer_randomness+132/380] [aic7xxx_isr+828/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144
]
Aug 22 23:24:47 elf kernel: [schedule+552/652] [sys_idle+92/112] [sys_idle+108/112] [system_call+85/128] [init+0/656] [start_kernel+429/440] [it_real_fn+0/72] [schedule+564/652]
Aug 22 23:24:47 elf kernel: Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24
Aug 22 23:24:47 elf kernel: Aiee, killing interrupt handler
Aug 22 23:24:47 elf kernel: kfree of non-kmalloced memory: 001df2bc, next= 001f3bef, order=1128997
Aug 22 23:24:47 elf kernel: kfree of non-kmalloced memory: 001df2ac, next= 001c5c3a, order=1962668
Aug 22 23:24:47 elf kernel: kfree of non-kmalloced memory: 001df7c0, next= 001c5c3a, order=1962668
Aug 22 23:24:47 elf kernel: idle task may not sleep
GDB/source:
void scsi_mark_host_reset(struct Scsi_Host *Host)
{
Scsi_Cmnd *SCpnt;
for (SCpnt = Host->host_queue; SCpnt; SCpnt = SCpnt->next)
scsi_mark_device_reset(SCpnt->device);
}
The Scsi_Host structure seems to be invalid or damaged; SCpnt is invalid!
(gdb) disass scsi_mark_host_reset
Dump of assembler code for function scsi_mark_host_reset:
0x19a428 <scsi_mark_host_reset>: movl 0x4(%esp,1),%eax
0x19a42c <scsi_mark_host_reset+4>: movl 0x10(%eax),%edx
0x19a42f <scsi_mark_host_reset+7>: testl %edx,%edx
0x19a431 <scsi_mark_host_reset+9>:
je 0x19a442 <scsi_mark_host_reset+26>
0x19a433 <scsi_mark_host_reset+11>: nop
0x19a434 <scsi_mark_host_reset+12>: movl 0x4(%edx),%eax
0x19a437 <scsi_mark_host_reset+15>: orb $0xc0,0x4b(%eax)
0x19a43b <scsi_mark_host_reset+19>: movl 0x10(%edx),%edx
0x19a43e <scsi_mark_host_reset+22>: testl %edx,%edx
0x19a440 <scsi_mark_host_reset+24>:
jne 0x19a434 <scsi_mark_host_reset+12>
0x19a442 <scsi_mark_host_reset+26>: ret
0x19a443 <scsi_mark_host_reset+27>: nop
End of assembler dump.
Anf finally another one with DEBUG enabled in scsi.c and sd.c:
Aug 22 23:57:52 elf kernel: SMalloc: 4096 0000a000
Aug 22 23:57:52 elf kernel: scsi_do_cmd (host = 0, channel = 0 target = 0, buffer =0000a000, bufflen = 4096, done = 001ab754, timeout = 1000, retries = 5)
Aug 22 23:57:52 elf kernel: command : 37 00 14 00 00 00 00 20 00 00
Aug 22 23:57:52 elf kernel: internal_cmnd (host = 0, channel = 0, target = 0, command = 0000b430, buffer = 0000a000,
Aug 22 23:57:52 elf kernel: bufflen = 4096, done = 001ab754)
Aug 22 23:57:52 elf kernel: queuecommand : routine at 001a9050
Aug 22 23:57:52 elf kernel: leaving internal_cmnd()
Aug 22 23:57:52 elf kernel: Leaving scsi_do_cmd()
Aug 22 23:57:52 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 7; forcing a retry.
Aug 22 23:57:52 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 23:57:52 elf kernel: sg[0] - Addr 0xa000 : Length 4096
Aug 22 23:57:52 elf kernel: In scsi_done(host = 0, result = 070000)
Aug 22 23:57:52 elf kernel: In MAYREDO, allowing 5 retries, have 0
Aug 22 23:57:52 elf kernel: internal_cmnd (host = 0, channel = 0, target = 0, command = 0000b430, buffer = 0000a000,
Aug 22 23:57:52 elf kernel: bufflen = 4096, done = 001ab754)
Aug 22 23:57:52 elf kernel: queuecommand : routine at 001a9050
Aug 22 23:57:52 elf kernel: leaving internal_cmnd()
Aug 22 23:57:52 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 7; forcing a retry.
Aug 22 23:57:52 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 23:57:52 elf kernel: sg[0] - Addr 0xa000 : Length 4096
Aug 22 23:57:52 elf kernel: In scsi_done(host = 0, result = 070000)
Aug 22 23:57:52 elf kernel: In MAYREDO, allowing 5 retries, have 1
Aug 22 23:57:52 elf kernel: scsi0 channel 0 : resetting for second half of retries.
Aug 22 23:57:52 elf kernel: SCSI bus is being reset for host 0 channel 0.
Aug 22 23:57:52 elf kernel: (scsi0:0:0) Reset called, no scb, flags 0x1
Aug 22 23:57:52 elf kernel: (scsi0:-1:0) Channel reset
Aug 22 23:57:52 elf kernel: scsi reset function returned 514
Aug 22 23:57:52 elf kernel: Unable to handle kernel paging request at virtual address c5e7024b
Aug 22 23:57:52 elf kernel: current->tss.cr3 = 00101000, Xr3 = 00101000
Aug 22 23:57:52 elf kernel: *pde = 00000000
Aug 22 23:57:52 elf kernel: Oops: 0002
Aug 22 23:57:52 elf kernel: CPU: 0
Aug 22 23:57:52 elf kernel: EIP: 0010:[scsi_mark_host_reset+15/28]
Aug 22 23:57:52 elf kernel: EFLAGS: 00010006
Aug 22 23:57:52 elf kernel: eax: 05e70200 ebx: 00000202 ecx: 001e1bf0 edx: 0000b018
Aug 22 23:57:52 elf kernel: esi: 00090018 edi: 0000b410 ebp: 00000001 esp: 001dfcac
Aug 22 23:57:52 elf kernel: ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
Aug 22 23:57:52 elf kernel: Process swapper (pid: 0, process nr: 0, stackpage=001dde44)
Aug 22 23:57:52 elf kernel: Stack: 0019a9df 00090018 00000000 0000b410 00000001 00000027 0019a381 0000b410
Aug 22 23:57:52 elf kernel: 00000001 001d2807 00000000 00000000 00000046 01b23408 00090068 0000e0f8
Aug 22 23:57:52 elf kernel: 01b23408 00000001 0000e0f8 00070000 00090018 001a6a75 0000b410 002831f8
Aug 22 23:57:52 elf kernel: Call Trace: [scsi_reset+415/792] [scsi_done+1389/2028] [aic7xxx_isr+1117/1424] [aic7xxx_isr+1000/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144] [schedule+384/652]
Aug 22 23:57:52 elf kernel: [sys_idle+92/112] [sys_idle+108/112] [system_call+85/128] [init+0/656] [start_kernel+429/440]
Aug 22 23:57:52 elf kernel: Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24
Aug 22 23:57:52 elf kernel: Aiee, killing interrupt handler
Aug 22 23:57:52 elf kernel: kfree of non-kmalloced memory: 001dfe8c, next= 00000000, order=0
Aug 22 23:57:52 elf kernel: kfree of non-kmalloced memory: 001dfe7c, next= 00000000, order=0
Aug 22 23:57:52 elf kernel: kfree of non-kmalloced memory: 001e0390, next= 00000000, order=0
Aug 22 23:57:52 elf kernel: idle task may not sleep
Aug 22 23:57:52 elf kernel: scsi_free 0000a000 4096
Aug 22 23:57:52 elf kernel: idle task may not sleep
Aug 22 23:57:52 elf last message repeated 3 times
Aug 22 23:58:10 elf kernel: SMalloc: 4096 0000a000
Aug 22 23:58:10 elf kernel: scsi_do_cmd (host = 0, channel = 0 target = 0, buffer =0000a000, bufflen = 4096, done = 001ab754, timeout = 1000, retries = 5)
Aug 22 23:58:10 elf kernel: command : 37 00 0c 00 00 00 00 20 00 00
Aug 22 23:58:10 elf kernel: internal_cmnd (host = 0, channel = 0, target = 0, command = 0000b430, buffer = 0000a000,
Aug 22 23:58:10 elf kernel: bufflen = 4096, done = 001ab754)
Aug 22 23:58:10 elf kernel: queuecommand : routine at 001a9050
Aug 22 23:58:10 elf kernel: leaving internal_cmnd()
Aug 22 23:58:10 elf kernel: Leaving scsi_do_cmd()
Aug 23 00:00:34 elf syslogd 1.3-0: restart.
The size of 4096 seems to be the single page originally stated. Despite
of the suggestion to increase the size of the buffer (maybe per
ioctl(): set timeout and buffer size), I have only the hint that
reading the defect list is not the fastest operation.
Ulrich
More information about the aic7xxx
mailing list