RFC: ATA to CAM integration patch (INTEL DX58SO)

Tue Jul 7 09:16:35 UTC 2009

On Mon, Jul 06, 2009 at 05:20:45PM -0400, Mike Tancsa wrote:
> At 02:41 AM 7/5/2009, Alexander Motin wrote:
> >>Jul  4 20:25:57 ich10 kernel: ahcich2: ahci_ch_intr ERROR is 40000001 cs 00000004 ss 00000000 rs 00000004 tfd 451 serr 00000000
> >
> >This is AHCI driver debugging output. I've removed it in the latest patch. In
> >this case it means that the drive signaled a command error.
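For reference, the two most informative fields in that debug line can be decoded by hand. The sketch below assumes that "ERROR is" prints the AHCI port interrupt status register (PxIS) and "tfd" prints the task file data register (PxTFD: ATA status in bits 7:0, ATA error in bits 15:8), as laid out in the AHCI specification. That mapping to the driver's printf is an assumption, and decode_is/decode_tfd are hypothetical helper names; only a few well-known bits are checked.

```sh
#!/bin/sh
# Sketch only. ASSUMPTION: "is" = AHCI PxIS, "tfd" = AHCI PxTFD.

# Decode selected PxIS bits.
decode_is() {
    is=$(( $1 ))
    [ $(( is & (1 << 30) )) -ne 0 ] && echo "TFES: task file error status"
    [ $(( is & 1 )) -ne 0 ] && echo "DHRS: D2H register FIS received"
    return 0
}

# Split PxTFD into ATA status (bits 7:0) and error (bits 15:8).
decode_tfd() {
    tfd=$(( $1 ))
    sts=$(( tfd & 0xff ))
    err=$(( (tfd >> 8) & 0xff ))
    printf 'STS=0x%02x ERR=0x%02x\n' "$sts" "$err"
    [ $(( sts & 0x01 )) -ne 0 ] && echo "STS.ERR: command completed with error"
    [ $(( sts & 0x40 )) -ne 0 ] && echo "STS.DRDY: device ready"
    [ $(( err & 0x04 )) -ne 0 ] && echo "ERR.ABRT: command aborted"
    [ $(( err & 0x40 )) -ne 0 ] && echo "ERR.UNC: uncorrectable data error"
    return 0
}

# Values taken from the log line quoted above (tfd is printed in hex).
decode_is 0x40000001
decode_tfd 0x451
```

With these values it reports TFES and DHRS set, status 0x51 (which includes DRDY and ERR) and error 0x04 (ABRT), consistent with the drive rejecting the command.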
> 
> 
> Hi,
> 
> With the latest patch (cam-ata.20090704.patch), writing to the disk
> with physical errors now produces the following:
> 
> Jul  6 13:56:17 ich10 last message repeated 4 times
> Jul  6 13:56:17 ich10 kernel: g_vfs_done():ada2[READ(offset=42003431424, length=16384)]error = 5
> Jul  6 13:56:17 ich10 kernel: ahcich2: Error while READ LOG EXT
> Jul  6 13:56:17 ich10 last message repeated 4 times
> Jul  6 13:56:17 ich10 kernel: g_vfs_done():ada2[READ(offset=42196107264, length=16384)]error = 5
> Jul  6 13:56:17 ich10 kernel: ahcich2: Error while READ LOG EXT
> Jul  6 13:56:17 ich10 last message repeated 4 times
> Jul  6 13:56:17 ich10 kernel: g_vfs_done():ada2[READ(offset=42388783104, length=16384)]error = 5
> Jul  6 13:56:17 ich10 kernel: ahcich2: Error while READ LOG EXT
> Jul  6 13:56:17 ich10 last message repeated 4 times
> Jul  6 13:56:17 ich10 kernel: g_vfs_done():ada2[READ(offset=42581458944, length=16384)]error = 5
> Jul  6 13:56:17 ich10 kernel: ahcich2: Error while READ LOG EXT
> Jul  6 13:56:17 ich10 last message repeated 4 times
> Jul  6 13:56:17 ich10 kernel: g_vfs_done():ada2[READ(offset=42774134784, length=16384)]error = 5
> Jul  6 13:56:17 ich10 kernel: ahcich2: Error while READ LOG EXT
> Jul  6 13:56:18 ich10 last message repeated 4 times
> Jul  6 13:56:18 ich10 kernel: g_vfs_done():ada2[READ(offset=42966810624, length=16384)]error = 5
> Jul  6 13:56:18 ich10 kernel: ahcich2: Error while READ LOG EXT
> Jul  6 13:56:18 ich10 last message repeated 4 times
> 
> The box still panics when writing to the disk that has bad
> sectors on it (I do newfs it between reboots). Again, not sure if
> this is a "well, don't use a bad disk" case, but here is the panic again in
> case it shows something useful.
> 
> 
> Unread portion of the kernel message buffer:
> ahcich2: Error while READ LOG EXT
> ahcich2: Error while READ LOG EXT
> g_vfs_done():ada2s1d[READ(offset=36418928640, length=16384)]error = 5
> ahcich2: Error while READ LOG EXT
> ahcich2: Error while READ LOG EXT
> ahcich2: Error while READ LOG EXT
> panic: initiate_write_inodeblock_ufs2: already started
> cpuid = 6
> Uptime: 5m55s
> ahcich2: Error while READ LOG EXT
> (ada2:ahcich2:0:0:0): Synchronize cache failed
> Physical memory: 3556 MB
> Dumping 220 MB:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 4; apic id = 04
> fault virtual address   = 0xb0014
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc047f14b
> stack pointer           = 0x28:0xc6e69c08
> frame pointer           = 0x28:0xc6e69c20
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 12 (irq257: ahci0)
> trap number             = 12
>  205 189 173 157 141 125 109 93 77 61 45 29 13
> 
> Reading symbols from /boot/kernel/ahci.ko...Reading symbols from /boot/kernel/ahci.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ahci.ko
> #0  doadump () at pcpu.h:246
> 246     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) #0  doadump () at pcpu.h:246
> #1  0xc086ca3e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
> #2  0xc086ccd9 in panic (fmt=Variable "fmt" is not available.
> ) at /usr/src/sys/kern/kern_shutdown.c:575
> #3  0xc0a87ebe in softdep_disk_io_initiation (bp=0xdb2bbf60)
>     at /usr/src/sys/ufs/ffs/ffs_softdep.c:4056
> #4  0xc0a8be2c in ffs_geom_strategy (bo=0xc803c2c0, bp=0xdb2bbf60)
>     at buf.h:404
> #5  0xc08e2019 in bufwrite (bp=0xdb2bbf60) at buf.h:397
> #6  0xc0a8b55b in ffs_bufwrite (bp=0xdb2bbf60)
>     at /usr/src/sys/ufs/ffs/ffs_vfsops.c:1893
> #7  0xc08df1c8 in vfs_bio_awrite (bp=0xdb2bbf60) at buf.h:385
> #8  0xc08e89bb in vop_stdfsync (ap=0xe7798c7c)
>     at /usr/src/sys/kern/vfs_default.c:608
> #9  0xc07f366c in devfs_fsync (ap=0xe7798c7c)
>     at /usr/src/sys/fs/devfs/devfs_vnops.c:556
> #10 0xc0b90095 in VOP_FSYNC_APV (vop=0xc0d1a580, a=0xe7798c7c)
>     at vnode_if.c:1267
> #11 0xc08f87a8 in sync_vnode (slp=0xc76a27f4, bo=0xe7798ce8, td=0xc78ad6c0)
>     at vnode_if.h:549
> #12 0xc08f8af3 in sched_sync () at /usr/src/sys/kern/vfs_subr.c:1799
> #13 0xc0844418 in fork_exit (callout=0xc08f8880 <sched_sync>, arg=0x0,
>     frame=0xe7798d38) at /usr/src/sys/kern/kern_fork.c:842
> #14 0xc0b67cb0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:270
> (kgdb)
> 
> 
> Unless you would like me to test some other features of the driver, I 
> will just RMA the drive tomorrow.
> 
It seems you run newfs and then mount the filesystem. Why?
If you just want to erase the data, do something like:
dd if=/dev/zero of=/dev/ada2 bs=1m
or
dd if=/dev/random of=/dev/ada2 bs=1m

You could also try smaller block sizes (bs argument) near the bad blocks.
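To rewrite only the region around one reported error with a small block size, the offset and length printed by g_vfs_done() can be converted into dd seek/count values. This is a minimal sketch assuming 512-byte sectors and that the offset is relative to the provider named in the message (ada2 here; for ada2s1d it would be relative to that partition, not the whole disk). dd_cmd_for_region is a hypothetical helper, and it only prints the command rather than running it, since the write destroys data.

```sh
#!/bin/sh
# Hypothetical helper: build a dd command that zeroes only the region
# reported by g_vfs_done(). ASSUMPTIONS: 512-byte sectors by default,
# offset relative to the named device. The command is printed, not run.
dd_cmd_for_region() {
    dev=$1 offset=$2 length=$3 bs=${4:-512}
    # Raw disk writes must be aligned to the block size.
    if [ $(( offset % bs )) -ne 0 ] || [ $(( length % bs )) -ne 0 ]; then
        echo "offset/length not aligned to bs=$bs" >&2
        return 1
    fi
    echo "dd if=/dev/zero of=$dev bs=$bs seek=$(( offset / bs )) count=$(( length / bs ))"
}

# First error from the log quoted above.
dd_cmd_for_region /dev/ada2 42003431424 16384
```

For that first error it prints dd if=/dev/zero of=/dev/ada2 bs=512 seek=82037952 count=32, i.e. 32 sectors starting at sector 82037952.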

Just my $0.02,
Alexey.