"read defect list" with 2.0.30-pre7 and patch Aug19

Ulrich Windl ulrich.windl at rz.uni-regensburg.de
Sun Aug 24 23:26:37 PDT 1997


On 23 Aug 97 at 15:09, Doug Ledford wrote:

Doug,

thank you for analyzing that problem. In fact I also tried the Aug21 
patch without success, but with some more information. Probably you 
got that information yourself already.

The new SCSI statistics look much better.

To your patch: If changing things like below, you might also add a 
comment (inside the patch) saying why you changed it. It will help 
anybody outside this discussion group.

> --- scsi_ioctl.c        Mon Aug 11 21:00:52 1997
> +++ scsi_ioctl.c.new    Sat Aug 23 15:02:57 1997
> @@ -10,4 +10,5 @@
>  #include <asm/segment.h>
>  #include <asm/system.h>
> +#include <asm/page.h>
>  
>  #include <linux/errno.h>
> @@ -29,5 +30,5 @@
>  #define READ_ELEMENT_STATUS_TIMEOUT (5 * 60 * HZ)
>  
> -#define MAX_BUF 4096
> +#define MAX_BUF PAGE_SIZE
>  
>  #define max(a,b) (((a) > (b)) ? (a) : (b))
> @@ -206,6 +207,6 @@
>       * should use scsi_generics instead.
>       */
> -    if( inlen > MAX_BUF ) inlen = MAX_BUF;
> -    if( outlen > MAX_BUF ) outlen = MAX_BUF;
> +    if( inlen > MAX_BUF )  return -EINVAL;
> +    if( outlen > MAX_BUF )  return -EINVAL;
>  
>      cmd_in = (char *) ( ((int *)buffer) + 2);
> 

Here's my latest trace file:
-------------
Sad to say: Nothing new!

Driver output during boot:

<4>aic7xxx: <Adaptec AHA-294X SCSI host adapter> at PCI 12
<6>aic7xxx: Loading serial EEPROM...done
<6>aic7xxx: Warning - detected auto-termination.  Please verify driver
<6>         detected settings and use manual termination if necessary.
<6>aic7xxx: Cables present (Int-50 YES, Ext-50 NO)
<6>aic7xxx: eprom not present, brdctl_int=0xac, brdctl_ext=0x6c
<6>aic7xxx: Termination ON
<6>aic7xxx: BIOS enabled, IO Port 0xe000, IO Mem 0xf9800000, IRQ 11
<6>aic7xxx: Extended translation enabled.
<6>aic7xxx: Single Channel, SCSI ID 7, 16/255 SCBs, QFull 16, QMask 0x1f
<6>aic7xxx: Downloading sequencer code... 414 instructions downloaded
<4>scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 4.1/3.2
<4>scsi : 1 host.
<6>scsi0: Scanning channel A for devices.
<4>scsi0: Target 0, channel A, now synchronous at 10.0MHz, offset 15.
<4>  Vendor: CONNER    Model: CFP1080S          Rev: 4649
<4>  Type:   Direct-Access                      ANSI SCSI revision: 02
<4>Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
<4>scsi0: Target 1, channel A, now synchronous at 4.0MHz, offset 15.
<4>  Vendor: TOSHIBA   Model: CD-ROM XM-5301TA  Rev: 1895
<4>  Type:   CD-ROM                             ANSI SCSI revision: 02
<4>Detected scsi CD-ROM sr0 at scsi0, channel 0, id 1, lun 0
<4>scsi0: Target 2, channel A, now synchronous at 5.0MHz, offset 8.
<4>  Vendor: HP        Model: HP35480A          Rev: T603
<4>  Type:   Sequential-Access                  ANSI SCSI revision: 02
<6>Detected scsi tape st0 at scsi0, channel 0, id 2, lun 0
<6>(scsi0:0:0) Enabled tagged queuing, queue depth 8.
<4>scsi : detected 1 SCSI tape 1 SCSI cdrom 1 SCSI disk total.
<4>SCSI device sda: hdwr sector= 512 bytes. Sectors= 2110812 [1030 MB] [1.0 GB]
<4>Partition check:
<4> sda: sda1 < sda5 sda6 >
<4> hda: hda1 hda2 hda3
<4>VFS: Mounted root (ext2 filesystem) readonly.

Status file after some activity:

Adaptec AIC7xxx driver version: 4.1/3.2
Compile Options:
  AIC7XXX_RESET_DELAY    : 5
  AIC7XXX_TAGGED_QUEUEING: Enabled
  AIC7XXX_PAGE_ENABLE    : Enabled
  AIC7XXX_PROC_STATS     : Enabled

Adapter Configuration:
          SCSI Adapter: Adaptec AHA-294X SCSI host adapter
                        (AIC-787x chipset)
              Host Bus: Single
               Base IO: 0xe000
        Base IO Memory: 0xf9800000
                   IRQ: 11
                  SCBs: Used 8, HW 16, Page 255
            Interrupts: 4657
         Serial EEPROM: True
  Extended Translation: Enabled
        SCSI Bus Reset: Enabled
            Ultra SCSI: Disabled
     Target Disconnect: Enabled

Statistics:
CHAN#A (TGT 0 LUN 0):
nxfers 2806 (2430 read;376 written)
blks(512) rd=29207; blks(512) wr=2184
        < 512 512-1K   1-2K   2-4K   4-8K  8-16K 16-32K 32-64K 64-128K >128K
 Reads:     0      4      7    902    640    720    120     22     15      0 
Writes:     0      0      0    292     76      5      1      2      0      0 

CHAN#A (TGT 1 LUN 0):
nxfers 1774 (1774 read;0 written)
blks(512) rd=22320; blks(512) wr=0
        < 512 512-1K   1-2K   2-4K   4-8K  8-16K 16-32K 32-64K 64-128K >128K
 Reads:     0      0      0   1226    148     71    329      0      0      0 
Writes:     0      0      0      0      0      0      0      0      0      0 


Configuration Info:

CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_NET=y
CONFIG_PCI=y
CONFIG_SYSVIPC=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_KERNEL_ELF=y
CONFIG_M586=y
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_TRITON=y
CONFIG_BLK_DEV_RAM=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_NOSR=y
CONFIG_SKB_LARGE=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
CONFIG_BLK_DEV_SR=y
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_AIC7XXX_TAGGED_QUEUEING=y
CONFIG_AIC7XXX_PAGE_ENABLE=y
CONFIG_AIC7XXX_PROC_STATS=y
CONFIG_AIC7XXX_RESET_DELAY=5
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
CONFIG_MINIX_FS=y
CONFIG_EXT2_FS=y
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_UMSDOS_FS=y
CONFIG_PROC_FS=y
CONFIG_ISO9660_FS=y
CONFIG_HPFS_FS=y
CONFIG_SERIAL=y
CONFIG_PRINTER=y
CONFIG_MOUSE=y
CONFIG_PSMOUSE=y
CONFIG_RTC=y
CONFIG_SOUND=y


Crash #1:

Aug 22 22:51:00 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 0; forcing a retry.
Aug 22 22:51:00 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 22:51:00 elf kernel:      sg[0] - Addr 0xb000 : Length 4096
Aug 22 22:51:00 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 0; forcing a retry.
Aug 22 22:51:00 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 22:51:00 elf kernel:      sg[0] - Addr 0xb000 : Length 4096
Aug 22 22:51:00 elf kernel: scsi0 channel 0 : resetting for second half of retries.
Aug 22 22:51:00 elf kernel: SCSI bus is being reset for host 0 channel 0.
Aug 22 22:51:00 elf kernel: (scsi0:0:0) Reset called, no scb, flags 0x1
Aug 22 22:51:00 elf kernel: (scsi0:-1:0) Channel reset
Aug 22 22:51:00 elf kernel: Unable to handle kernel paging request at virtual address c5e7024b
Aug 22 22:51:00 elf kernel: current->tss.cr3 = 00101000, hr3 = 00101000
Aug 22 22:51:00 elf kernel: *pde = 00000000
Aug 22 22:51:00 elf kernel: Oops: 0002
Aug 22 22:51:00 elf kernel: CPU:    0
Aug 22 22:51:00 elf kernel: EIP:    0010:[scsi_mark_host_reset+15/28]
Aug 22 22:51:00 elf kernel: EFLAGS: 00010006
Aug 22 22:51:00 elf kernel: eax: 05e70200   ebx: 00000202   ecx: 01ab5f24   edx: 0000c018
Aug 22 22:51:00 elf kernel: esi: 00090018   edi: 0000c410   ebp: 00000001   esp: 001df0dc
Aug 22 22:51:00 elf kernel: ds: 0018   es: 0018   fs: 002b   gs: 0018   ss: 0018
Aug 22 22:51:00 elf kernel: Process swapper (pid: 0, process nr: 0, stackpage=001dd274)
Aug 22 22:51:00 elf kernel: Stack: 0019a5fb 00090018 00000001 0000c410 00000000 00000027 0019a02e 0000c410 
Aug 22 22:51:00 elf kernel:        00000001 001d1dfd 00000000 00000000 00000046 0000d018 00090068 0000e018 
Aug 22 22:51:00 elf kernel:        0000d018 00000001 0000e018 00070000 00090018 001a6621 0000c410 0009e1f8 
Aug 22 22:51:00 elf kernel: Call Trace: [scsi_reset+399/776] [scsi_done+1162/1672] [aic7xxx_isr+1117/1424] [aic7xxx_isr+828/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144] [schedule+420/652] 
Aug 22 22:51:00 elf kernel:        [sys_idle+92/112] [sys_idle+108/112] [system_call+85/128] [init+0/656] [start_kernel+429/440] [it_real_fn+0/72] [schedule+564/652] 
Aug 22 22:51:00 elf kernel: Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24 
Aug 22 22:51:00 elf kernel: Aiee, killing interrupt handler
Aug 22 22:51:00 elf kernel: kfree of non-kmalloced memory: 001df2bc, next= 001f3bef, order=1128997
Aug 22 22:51:00 elf kernel: kfree of non-kmalloced memory: 001df2ac, next= 001c5c3a, order=1962668
Aug 22 22:51:00 elf kernel: kfree of non-kmalloced memory: 001df7c0, next= 001c5c3a, order=1962668
Aug 22 22:51:00 elf kernel: idle task may not sleep

Crash #2:

Aug 22 23:24:46 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 6; forcing a retry.
Aug 22 23:24:46 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 23:24:46 elf kernel:      sg[0] - Addr 0xb000 : Length 4096
Aug 22 23:24:47 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 6; forcing a retry.
Aug 22 23:24:47 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 23:24:47 elf kernel:      sg[0] - Addr 0xb000 : Length 4096
Aug 22 23:24:47 elf kernel: scsi0 channel 0 : resetting for second half of retries.
Aug 22 23:24:47 elf kernel: SCSI bus is being reset for host 0 channel 0.
Aug 22 23:24:47 elf kernel: (scsi0:0:0) Reset called, no scb, flags 0x1
Aug 22 23:24:47 elf kernel: (scsi0:-1:0) Channel reset
Aug 22 23:24:47 elf kernel: Unable to handle kernel paging request at virtual address c5e7024b
Aug 22 23:24:47 elf kernel: current->tss.cr3 = 00101000, Xr3 = 00101000
Aug 22 23:24:47 elf kernel: *pde = 00000000
Aug 22 23:24:47 elf kernel: Oops: 0002
Aug 22 23:24:47 elf kernel: CPU:    0
Aug 22 23:24:47 elf kernel: EIP:    0010:[scsi_mark_host_reset+15/28]
Aug 22 23:24:47 elf kernel: EFLAGS: 00010006
Aug 22 23:24:47 elf kernel: eax: 05e70200   ebx: 00000202   ecx: 0075af24   edx: 0000c018
Aug 22 23:24:47 elf kernel: esi: 00090018   edi: 0000c410   ebp: 00000001   esp: 001df0dc
Aug 22 23:24:47 elf kernel: ds: 0018   es: 0018   fs: 002b   gs: 0018   ss: 0018
Aug 22 23:24:47 elf kernel: Process swapper (pid: 0, process nr: 0, stackpage=001dd274)
Aug 22 23:24:47 elf kernel: Stack: 0019a5fb 00090018 00000001 0000c410 00000000 00000027 0019a02e 0000c410 
Aug 22 23:24:47 elf kernel:        00000001 001d1dfd 00000000 00000000 00000046 0177130c 00090068 0000e0d8 
Aug 22 23:24:47 elf kernel:        0177130c 00000001 0000e0d8 00070000 00090018 001a6621 0000c410 0009e1f8 
Aug 22 23:24:47 elf kernel: Call Trace: [scsi_reset+399/776] [scsi_done+1162/1672] [aic7xxx_isr+1117/1424] [add_timer_randomness+132/380] [aic7xxx_isr+828/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144
] 
Aug 22 23:24:47 elf kernel:        [schedule+552/652] [sys_idle+92/112] [sys_idle+108/112] [system_call+85/128] [init+0/656] [start_kernel+429/440] [it_real_fn+0/72] [schedule+564/652] 
Aug 22 23:24:47 elf kernel: Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24 
Aug 22 23:24:47 elf kernel: Aiee, killing interrupt handler
Aug 22 23:24:47 elf kernel: kfree of non-kmalloced memory: 001df2bc, next= 001f3bef, order=1128997
Aug 22 23:24:47 elf kernel: kfree of non-kmalloced memory: 001df2ac, next= 001c5c3a, order=1962668
Aug 22 23:24:47 elf kernel: kfree of non-kmalloced memory: 001df7c0, next= 001c5c3a, order=1962668
Aug 22 23:24:47 elf kernel: idle task may not sleep

GDB/source:

void scsi_mark_host_reset(struct Scsi_Host *Host)
{
  Scsi_Cmnd *SCpnt;
  for (SCpnt = Host->host_queue; SCpnt; SCpnt = SCpnt->next)
      scsi_mark_device_reset(SCpnt->device);
}

The Scsi_Host structure seems to be invalid or damaged; SCpnt is invalid!

(gdb) disass scsi_mark_host_reset
Dump of assembler code for function scsi_mark_host_reset:
0x19a428 <scsi_mark_host_reset>:	movl   0x4(%esp,1),%eax
0x19a42c <scsi_mark_host_reset+4>:	movl   0x10(%eax),%edx
0x19a42f <scsi_mark_host_reset+7>:	testl  %edx,%edx
0x19a431 <scsi_mark_host_reset+9>:	
    je     0x19a442 <scsi_mark_host_reset+26>
0x19a433 <scsi_mark_host_reset+11>:	nop    
0x19a434 <scsi_mark_host_reset+12>:	movl   0x4(%edx),%eax
0x19a437 <scsi_mark_host_reset+15>:	orb    $0xc0,0x4b(%eax)
0x19a43b <scsi_mark_host_reset+19>:	movl   0x10(%edx),%edx
0x19a43e <scsi_mark_host_reset+22>:	testl  %edx,%edx
0x19a440 <scsi_mark_host_reset+24>:	
    jne    0x19a434 <scsi_mark_host_reset+12>
0x19a442 <scsi_mark_host_reset+26>:	ret    
0x19a443 <scsi_mark_host_reset+27>:	nop    
End of assembler dump.

Anf finally another one with DEBUG enabled in scsi.c and sd.c:

Aug 22 23:57:52 elf kernel:  SMalloc: 4096 0000a000
Aug 22 23:57:52 elf kernel: scsi_do_cmd (host = 0, channel = 0 target = 0, buffer =0000a000, bufflen = 4096, done = 001ab754, timeout = 1000, retries = 5)
Aug 22 23:57:52 elf kernel: command : 37  00  14  00  00  00  00  20  00  00  
Aug 22 23:57:52 elf kernel: internal_cmnd (host = 0, channel = 0, target = 0, command = 0000b430, buffer = 0000a000, 
Aug 22 23:57:52 elf kernel: bufflen = 4096, done = 001ab754)
Aug 22 23:57:52 elf kernel: queuecommand : routine at 001a9050
Aug 22 23:57:52 elf kernel: leaving internal_cmnd()
Aug 22 23:57:52 elf kernel: Leaving scsi_do_cmd()
Aug 22 23:57:52 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 7; forcing a retry.
Aug 22 23:57:52 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 23:57:52 elf kernel:      sg[0] - Addr 0xa000 : Length 4096
Aug 22 23:57:52 elf kernel: In scsi_done(host = 0, result = 070000)
Aug 22 23:57:52 elf kernel: In MAYREDO, allowing 5 retries, have 0
Aug 22 23:57:52 elf kernel: internal_cmnd (host = 0, channel = 0, target = 0, command = 0000b430, buffer = 0000a000, 
Aug 22 23:57:52 elf kernel: bufflen = 4096, done = 001ab754)
Aug 22 23:57:52 elf kernel: queuecommand : routine at 001a9050
Aug 22 23:57:52 elf kernel: leaving internal_cmnd()
Aug 22 23:57:52 elf kernel: (scsi0:0:0) Data overrun of 16773218 bytes detected in Data-In phase, tag 7; forcing a retry.
Aug 22 23:57:52 elf kernel: Have seen Data Phase. Length=4096, NumSGs=1.
Aug 22 23:57:52 elf kernel:      sg[0] - Addr 0xa000 : Length 4096
Aug 22 23:57:52 elf kernel: In scsi_done(host = 0, result = 070000)
Aug 22 23:57:52 elf kernel: In MAYREDO, allowing 5 retries, have 1
Aug 22 23:57:52 elf kernel: scsi0 channel 0 : resetting for second half of retries.
Aug 22 23:57:52 elf kernel: SCSI bus is being reset for host 0 channel 0.
Aug 22 23:57:52 elf kernel: (scsi0:0:0) Reset called, no scb, flags 0x1
Aug 22 23:57:52 elf kernel: (scsi0:-1:0) Channel reset
Aug 22 23:57:52 elf kernel: scsi reset function returned 514
Aug 22 23:57:52 elf kernel: Unable to handle kernel paging request at virtual address c5e7024b
Aug 22 23:57:52 elf kernel: current->tss.cr3 = 00101000, Xr3 = 00101000
Aug 22 23:57:52 elf kernel: *pde = 00000000
Aug 22 23:57:52 elf kernel: Oops: 0002
Aug 22 23:57:52 elf kernel: CPU:    0
Aug 22 23:57:52 elf kernel: EIP:    0010:[scsi_mark_host_reset+15/28]
Aug 22 23:57:52 elf kernel: EFLAGS: 00010006
Aug 22 23:57:52 elf kernel: eax: 05e70200   ebx: 00000202   ecx: 001e1bf0   edx: 0000b018
Aug 22 23:57:52 elf kernel: esi: 00090018   edi: 0000b410   ebp: 00000001   esp: 001dfcac
Aug 22 23:57:52 elf kernel: ds: 0018   es: 0018   fs: 002b   gs: 0018   ss: 0018
Aug 22 23:57:52 elf kernel: Process swapper (pid: 0, process nr: 0, stackpage=001dde44)
Aug 22 23:57:52 elf kernel: Stack: 0019a9df 00090018 00000000 0000b410 00000001 00000027 0019a381 0000b410 
Aug 22 23:57:52 elf kernel:        00000001 001d2807 00000000 00000000 00000046 01b23408 00090068 0000e0f8 
Aug 22 23:57:52 elf kernel:        01b23408 00000001 0000e0f8 00070000 00090018 001a6a75 0000b410 002831f8 
Aug 22 23:57:52 elf kernel: Call Trace: [scsi_reset+415/792] [scsi_done+1389/2028] [aic7xxx_isr+1117/1424] [aic7xxx_isr+1000/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144] [schedule+384/652] 
Aug 22 23:57:52 elf kernel:        [sys_idle+92/112] [sys_idle+108/112] [system_call+85/128] [init+0/656] [start_kernel+429/440] 
Aug 22 23:57:52 elf kernel: Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24 
Aug 22 23:57:52 elf kernel: Aiee, killing interrupt handler
Aug 22 23:57:52 elf kernel: kfree of non-kmalloced memory: 001dfe8c, next= 00000000, order=0
Aug 22 23:57:52 elf kernel: kfree of non-kmalloced memory: 001dfe7c, next= 00000000, order=0
Aug 22 23:57:52 elf kernel: kfree of non-kmalloced memory: 001e0390, next= 00000000, order=0
Aug 22 23:57:52 elf kernel: idle task may not sleep
Aug 22 23:57:52 elf kernel: scsi_free 0000a000 4096
Aug 22 23:57:52 elf kernel: idle task may not sleep
Aug 22 23:57:52 elf last message repeated 3 times
Aug 22 23:58:10 elf kernel: SMalloc: 4096 0000a000
Aug 22 23:58:10 elf kernel: scsi_do_cmd (host = 0, channel = 0 target = 0, buffer =0000a000, bufflen = 4096, done = 001ab754, timeout = 1000, retries = 5)
Aug 22 23:58:10 elf kernel: command : 37  00  0c  00  00  00  00  20  00  00  
Aug 22 23:58:10 elf kernel: internal_cmnd (host = 0, channel = 0, target = 0, command = 0000b430, buffer = 0000a000, 
Aug 22 23:58:10 elf kernel: bufflen = 4096, done = 001ab754)
Aug 22 23:58:10 elf kernel: queuecommand : routine at 001a9050
Aug 22 23:58:10 elf kernel: leaving internal_cmnd()
Aug 22 23:58:10 elf kernel: Leaving scsi_do_cmd()
Aug 23 00:00:34 elf syslogd 1.3-0: restart.

The size of 4096 seems to be the single page originally stated. Despite
of the suggestion to increase the size of the buffer (maybe per
ioctl(): set timeout and buffer size), I have only the hint that
reading the defect list is not the fastest operation.


Ulrich



More information about the aic7xxx mailing list