"read defect list" with 2.0.30-pre7 and patch Aug19
Ulrich Windl
ulrich.windl at rz.uni-regensburg.de
Thu Aug 21 00:17:59 PDT 1997
(My floppy with the Aug19-2 patch had a CRC error, so I had to use
the Aug19 patch)
The driver (compared to the stock version of 2.0.30) gave a new
warning about automatic termination being enabled. That is correct
for my AHA2940 with BIOS 1.21, and it really works, even though I
don't understand how.
Having enabled the statistics, I found that I get statistics for
non-existent SCSI IDs and LUNs -- maybe the read was there, but not the
LUN ;-) The question is: if you want to support SCSI plug and play,
what condition should you check? At least accesses in two categories?
Apart from that, the information given should be much more compact; for
cat /proc/scsi/aic7xxx/0 I got a bunch of:
...possible overflow at loop 0:8
0:8
1:8
0:8
1:8
2:8
0:8
1:8
2:8
0:8
1:8
2:8
Resource allocation: Shouldn't the driver use a hardware identifier instead
of a software identifier when registering resources? Currently the driver
uses the generic "aic7xxx", not the actual chip, and not the PCI bus & device.
With multiple cards this approach seems ambiguous (talking about /proc/ioports
and /proc/interrupts).
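
A minimal user-space sketch of the kind of per-card identifier meant here.
The function name card_label, the label format and the example chip name
are made up for illustration; nothing below is taken from the aic7xxx
driver itself.

```c
#include <stdio.h>

/*
 * Build a per-card label such as "aic7870 (PCI 0:12)".
 * Hypothetical helper: the name, the format and the parameters are
 * assumptions for illustration, not driver code.
 * Returns the length the full label needs (as snprintf does), so a
 * caller can detect truncation by comparing it with len.
 */
int card_label(char *buf, size_t len, const char *chip,
               unsigned int bus, unsigned int dev)
{
    return snprintf(buf, len, "%s (PCI %u:%u)", chip, bus, dev);
}
```

Called as card_label(buf, sizeof buf, "aic7870", 0, 12), this fills buf with
"aic7870 (PCI 0:12)". In the driver such a string would presumably be handed
to request_region() and request_irq() in place of the fixed "aic7xxx"; since
those keep the pointer rather than a copy, the buffer would have to live as
long as the registration, for example inside the per-host data.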
Unfortunately the kernel still bombs out badly, but I was able to get
at least some information onto a file on my IDE harddisk; I even had
symbolic information. I added another log to show how consistent the
fault is.
Still, as suspected earlier, there seems to be an undetected buffer
overflow in the kernel that overwrites some SCSI data structures (at
least). The code at the fault location looked OK, but the memory
accesses probably used bad values.
I'll add Harald Koenig to the CC: because he brought up the issue
with the overflow, and Hubert Mantel because he is a great
fan of 2940 variants (not to mention the driver...).
The good thing about the issue is that my SCSI harddisk is now more valuable
for SCSI developers than for average users ;-)
Ulrich
Edited syslog (with shorter lines):
----------------------------------
22:02:50 restart.
22:03:08 klogd 1.3-0, log source = /proc/kmsg started.
22:03:08 Loaded 4129 symbols from /usr/src/linux/System.map.
22:03:08 Symbols match kernel version.
22:04:11 scsi0 channel 0 : resetting for second half of retries.
22:04:11 SCSI bus is being reset for host 0 channel 0.
22:04:11 Unable to handle kernel paging request at virtual address c5e7024b
22:04:11 current->tss.cr3 = 00101000, hr3 = 00101000
22:04:11 *pde = 00000000
22:04:11 Oops: 0002
22:04:11 CPU: 0
22:04:11 EIP: 0010:[scsi_mark_host_reset+15/28]
22:04:11 EFLAGS: 00010006
22:04:11 eax: 05e70200 ebx: 00000202 ecx: 0060cf24 edx: 00090018
22:04:11 esi: 00008018 edi: 00090410 ebp: 00000001 esp: 001dec98
22:04:11 ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
22:04:11 Process swapper (pid: 0, process nr: 0, stackpage=001dce04)
22:04:11 Stack: 0019a5fb 00008018 00000001 00090410 00000000 00000027 0019a02e 00090410
22:04:11 00000001 001d18dd 00000000 00000000 00000046 00089edc 00008068 00092058
22:04:11 00008068 00000001 00000000 00070000 00008018 001a6401 00090410 0009e1f8
22:04:11 Call Trace: [scsi_reset+399/776] [scsi_done+1162/1672] [aic7xxx_isr+1117/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144] [hard_idle+31/56] [sys_idle+59/112]
22:04:11 [system_call+85/128] [init+0/656] [start_kernel+429/440]
22:04:11 Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24
22:04:11 Aiee, killing interrupt handler
22:04:11 kfree of non-kmalloced memory: 001dee4c, next= 00000000, order=0
22:04:11 kfree of non-kmalloced memory: 001dee3c, next= 00000000, order=0
22:04:11 kfree of non-kmalloced memory: 001df350, next= 00000000, order=0
22:04:11 idle task may not sleep
22:04:11 elf last message repeated 4 times
00:30:33 restart.
00:33:24 klogd 1.3-0, log source = /proc/kmsg started.
00:33:24 Loaded 4129 symbols from /usr/src/linux/System.map.
00:33:24 Symbols match kernel version.
00:34:00 scsi0 channel 0 : resetting for second half of retries.
00:34:00 SCSI bus is being reset for host 0 channel 0.
00:34:00 Unable to handle kernel paging request at virtual address c5e7024b
00:34:00 current->tss.cr3 = 00101000, hr3 = 00101000
00:34:00 *pde = 00000000
00:34:00 Oops: 0002
00:34:00 CPU: 0
00:34:00 EIP: 0010:[scsi_mark_host_reset+15/28]
00:34:00 EFLAGS: 00010006
00:34:00 eax: 05e70200 ebx: 00000202 ecx: 00559f24 edx: 00090018
00:34:00 esi: 00008018 edi: 00090410 ebp: 00000001 esp: 001dec98
00:34:00 ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
00:34:00 Process swapper (pid: 0, process nr: 0, stackpage=001dce04)
00:34:00 Stack: 0019a5fb 00008018 00000001 00090410 00000000 00000027 0019a02e 00090410
00:34:00 00000001 001d18dd 00000000 00000000 00000046 00089de0 00008068 00092038
00:34:00 00008068 00000001 00000000 00070000 00008018 001a6401 00090410 0009e1f8
00:34:00 Call Trace: [scsi_reset+399/776] [scsi_done+1162/1672] [aic7xxx_isr+1117/1424] [do_IRQ+45/80] [IRQ11_interrupt+95/144] [hard_idle+31/56] [sys_idle+59/112]
00:34:00 [system_call+85/128] [init+0/656] [start_kernel+429/440]
00:34:00 Code: 80 48 4b c0 8b 52 10 85 d2 75 f2 c3 90 8b 44 24 04 8b 4c 24
00:34:00 Aiee, killing interrupt handler
00:34:00 kfree of non-kmalloced memory: 001dee4c, next= 00000000, order=0
00:34:00 kfree of non-kmalloced memory: 001dee3c, next= 00000000, order=0
00:34:00 kfree of non-kmalloced memory: 001df350, next= 00000000, order=0
00:34:00 idle task may not sleep
22:37:12 restart.
A gdb session:
-------------
(gdb) disass 0x0019a428
Dump of assembler code for function scsi_mark_host_reset:
0x19a428 <scsi_mark_host_reset>: movl 0x4(%esp,1),%eax
0x19a42c <scsi_mark_host_reset+4>: movl 0x10(%eax),%edx
0x19a42f <scsi_mark_host_reset+7>: testl %edx,%edx
0x19a431 <scsi_mark_host_reset+9>:
je 0x19a442 <scsi_mark_host_reset+26>
0x19a433 <scsi_mark_host_reset+11>: nop
0x19a434 <scsi_mark_host_reset+12>: movl 0x4(%edx),%eax
0x19a437 <scsi_mark_host_reset+15>: orb $0xc0,0x4b(%eax)
0x19a43b <scsi_mark_host_reset+19>: movl 0x10(%edx),%edx
0x19a43e <scsi_mark_host_reset+22>: testl %edx,%edx
0x19a440 <scsi_mark_host_reset+24>:
jne 0x19a434 <scsi_mark_host_reset+12>
0x19a442 <scsi_mark_host_reset+26>: ret
0x19a443 <scsi_mark_host_reset+27>: nop
End of assembler dump.
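
Read back into C, the loop in this dump looks roughly like the sketch below.
The type and field names (host_sketch, cmd_sketch, dev_sketch, queue, flags)
are placeholders chosen for illustration, not the kernel's declarations, and
the structs do not reproduce the real offsets 0x10, 0x4 and 0x4b. Only the
pointer chain and the 0xc0 mask are taken from the disassembly.

```c
#include <stddef.h>

/* Placeholder types: names and layout are assumptions, not kernel code. */
struct dev_sketch {
    unsigned char flags;        /* stands for the byte at 0x4b(%eax) */
};

struct cmd_sketch {
    struct dev_sketch *device;  /* loaded by "movl 0x4(%edx),%eax"  */
    struct cmd_sketch *next;    /* loaded by "movl 0x10(%edx),%edx" */
};

struct host_sketch {
    struct cmd_sketch *queue;   /* loaded by "movl 0x10(%eax),%edx" */
};

/* Walk the host's command list and set two flag bits on each device. */
void mark_host_reset_sketch(struct host_sketch *host)
{
    struct cmd_sketch *cmd;

    for (cmd = host->queue; cmd != NULL; cmd = cmd->next)
        cmd->device->flags |= 0xc0;   /* "orb $0xc0,0x4b(%eax)" at +15 */
}
```

The oops points at +15, which is the orb. The loop only tests the list
pointer for NULL, so a list entry whose device pointer has been overwritten
is dereferenced without any check.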
I suspect it's not the aic7xxx; I suspect someone else shot some memory with
an undetected overflow...
Kernel ends (quoting System.map) at: 0020fb7d A _end
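
The numbers in the oops can be checked against each other. The sketch below
recomputes the target of the faulting store from eax and the 0x4b
displacement, and compares a pointer with _end. The 0xC0000000 offset
between the kernel's own addresses and the address in the "paging request"
line is an assumption about how 2.0 on i386 sets up the kernel segment; the
function and macro names are made up for this example.

```c
#include <stdint.h>

#define KERNEL_SEG_BASE 0xC0000000u  /* assumption: 2.0/i386 kernel segment base */
#define KERNEL_END      0x0020fb7du  /* _end as quoted from System.map */

/* Address written by "orb $0xc0,0x4b(%eax)" for a given eax. */
uint32_t store_target(uint32_t eax)
{
    return eax + 0x4bu;
}

/* Address the fault message would show, under the assumed segment base. */
uint32_t reported_address(uint32_t kernel_offset)
{
    return kernel_offset + KERNEL_SEG_BASE;
}

/* Nonzero if the address lies above the kernel image (text, data, bss). */
int beyond_kernel_image(uint32_t kernel_offset)
{
    return kernel_offset > KERNEL_END;
}
```

With eax = 05e70200 from both logs, store_target() gives 05e7024b, and
reported_address() of that gives c5e7024b, the address in both "Unable to
handle kernel paging request" lines. The pointer is identical in the two
crashes and lies above _end, so it does not point into the kernel's static
data.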
And I did not configure SCSI generic support -- Should I?