iur crash ? -- long
dave baukus
dbaukus at chiaro.com
Mon Aug 8 17:46:04 GMT 2005
I have hit the same iir crash again; that is
gdt->sc_gccbs[]->gc_ucmd referencing freed memory.
Can anyone tell me why iir_ioctl() is not broken in
the GDT_IOCTL_GENERAL case ?
The bug I see is that ucmd references the ioctl arguments which
are either on the stack or malloc()ed by ioctl().
The code queues the command to the driver for processing
at interrupt level. The code then checks a flag for completion;
if not complete it tsleeps(). When tsleep() returns the
code assumes the command is done and returns.
If there is a signal pending tsleep() never sleeps and
ioctl() frees the memory referenced by ucmd. Finally,
the iir interrupt handler runs and attempts to process the
command --- crash !
case GDT_IOCTL_GENERAL:
{
gdt_ucmd_t *ucmd;
struct gdt_softc *gdt;
int lock;
ucmd = (gdt_ucmd_t *)cmdarg;
gdt = gdt_minor2softc(ucmd->io_node);
if (gdt == NULL)
return (ENXIO);
lock = splcam();
TAILQ_INSERT_TAIL(&gdt->sc_ucmd_queue, ucmd, links);
ucmd->complete_flag = FALSE;
splx(lock);
gdt_next(gdt);
if (!ucmd->complete_flag)
(void) tsleep((void *)ucmd, PCATCH | PRIBIO, "iirucw", 0);
break;
}
dave baukus wrote:
> I have a crash on BSD4.10 w/ a heavily modified network stack, but
> the disk/scsi subsystem is ostensively unmodified.
>
> I'm reasonably certain that the crash is caused by
> iir_intr(void *arg) passing 0xcOdedead to bcopy() as a length.
> We have INVARIANTS enabled.
>
> The quick question is: has this been seen ?
>
> The long question goes like this:
> ----------------------
> Here's the stack trace:
> #0 dumpsys () at ../../kern/kern_shutdown.c:519
> #1 0xc01e7253 in boot (howto=0x100) at ../../kern/kern_shutdown.c:331
> #2 0xc01e76cb in panic (fmt=0xc04cb120 "vm_fault: fault on nofault
> entry, addr: %lx")
> at ../../kern/kern_shutdown.c:635
> #3 0xc03c810c in vm_fault (map=0xc059fa78, vaddr=0xf5315000,
> fault_type=0x1, fault_flags=0x0)
> at ../../vm/vm_fault.c:240
> #4 0xc041ca5e in trap_pfault (frame=0xf8225d90, usermode=0x0,
> eva=0xf5315000) at ../../i386/i386/trap.c:921
> #5 0xc041c5e0 in trap (frame={tf_fs = 0xc01e0010, tf_es = 0xffff0010,
> tf_ds = 0xcb3c0010, tf_edi = 0xcbaeb52e,
> tf_esi = 0xf5315000, tf_ebp = 0xf8225e0c, tf_isp = 0xf8225dbc,
> tf_ebx = 0xc0dedead, tf_edx = 0xf5222b20,
> tf_ecx = 0x3033ee73, tf_eax = 0xd67d652e, tf_trapno = 0xc, tf_err
> = 0x0, tf_eip = 0xc041b502, tf_cs = 0x8,
> tf_eflags = 0x10202, tf_esp = 0xcadf9000, tf_ss = 0xcb9f9000}) at
> ../../i386/i386/trap.c:500
> #6 0xc041b502 in generic_bcopy ()
> #7 0xc04252fd in intr_mux (arg=x) at ../../i386/isa/intr_machdep.c:609
> #8 0xc040e74e in vec9 ()
> #9 0xc01df10b in exit1 (p=0xf80c88a0, rv=0xf) at
> ../../kern/kern_exit.c:225
> #10 0xc01e9262 in sigexit (p=0xf80c88a0, sig=0xf) at
> ../../kern/kern_sig.c:1519
> #11 0xc01e8fa4 in postsig (sig=0xf) at ../../kern/kern_sig.c:1422
> #12 0xc041d254 in syscall2 (frame={tf_fs = 0xbfbf002f, tf_es =
> 0x8fe002f, tf_ds = 0xbfbf002f, tf_edi = 0xbfbff800,
> tf_esi = 0x8fe5540, tf_ebp = 0xbfbff800, tf_isp = 0xf8225fd4,
> tf_ebx = 0x64, tf_edx = 0xbfbff780,
> tf_ecx = 0xbfbff700, tf_eax = 0x4, tf_trapno = 0x7, tf_err = 0x2,
> tf_eip = 0x88fe6ef4, tf_cs = 0x1f,
> tf_eflags = 0x203, tf_esp = 0xbfbff604, tf_ss = 0x2f}) at
> ../../i386/i386/trap.c:177
>
> --------------------------------
> Since intr_mux() does not call generic_bcopy()/bcopy(), the stack frames
> must be a mangled.
> The frames 4 to 7 look like:
>
> 0xf8225d10: 0xf8225d40 0xc01eabd3 0xffffffff
> frame 4
> 0xf8225d44
> trap_pfault
> 0xf8225d20: 0xc041ca5e 0xc059fa78 0xf5315000 0x00000001
> 0xf8225d30: 0x00000000 0x0000000c 0xf80c88a0 0xf5315000
> frame 5 trap
> 0xf8225d40: 0xcb3cdf01 0xf8225d88 0xc041c5e0 0xf8225d90
> 0xf8225d50: 0x00000000 0xf5315000 0x006c0200 0xf5315000
> 0xf8225d60: 0xcbaeb52e 0x00000000 0xc0dedead 0x00000000
> 0xf8225d70: 0xcbae4e5a 0xf8225d80 0xc0421783 0xc04217e9
> calltrap
> 0xf8225d80: 0xf8225e0c 0xc040dcb4 0xf8225e0c 0xc040d0c0
> 0xf8225d90: 0xc01e0010 0xffff0010 0xcb3c0010 0xcbaeb52e
> 0xf8225da0: 0xf5315000 0xf8225e0c 0xf8225dbc 0xc0dedead
> 0xf8225db0: 0xf5222b20 0x3033ee73 0xd67d652e 0x0000000c
> 0xf8225dc0: 0x00000000 0xc041b502 0x00000008 0x00010202
> 0xf8225dd0: 0xcadf9000 0xcb9f9000 0xc01b7e11 0xf5222b20
> 0xf8225de0: 0xcb9f904e 0xc0dedead 0xc6048460 0x00400200
> 0xf8225df0: 0x00000000 0xf5222b20 0x006c0200 0x00000000
> 0xf8225e00: 0x00000000 0x00090001 0x006fc67b frame 7
> 0xf8225e24
> intr_mux
> 0xf8225e10: 0xc04252fd 0xcadf9000 0x006c0200 0x00000000
> 0xf8225e20: 0xc2ee216c 0xf8225e90 0xc040e1d7 0xc6048460
>
> ------------------------------------
> It's between frame 7 (intr_mux()) and frame 5 (trap()), that I begin
> guessing at the sequence of events.
>
> Based on the 0xcadf9000 at 0xf8225e14 I speculate that iir_intr was the
> last interrupt routine called.
> Here is the intrec * list passed to intr_mux()
>
> set $P=(intrec *)0xc6048460
> (kgdb) intrecwalk $P
> $186 = {mask = 0x6c0200, handler = 0xc01b7b90 <iir_intr>, argument =
> 0xcadf9000, next = 0xc60483e0,
> name = 0xcadf3b80 "iir0", intr = 0x9, maskptr = 0xc0552b54, flags = 0x0}
> $187 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument =
> 0xcadfc000, next = 0xc6048260,
> name = 0xcadf3a60 "em0", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $188 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument =
> 0xcadfd000, next = 0xcae01e60,
> name = 0xcadf3950 "em1", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $189 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument =
> 0xcadff000, next = 0xcae01ce0,
> name = 0xcadf3810 "em2", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $190 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument =
> 0xcae03000, next = 0xcae01ba0,
> name = 0xcadf3700 "em3", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $191 = {mask = 0x630212, handler = 0xc03b6570 <ics_intr>, argument =
> 0x0, next = 0xcae01b00, name = 0xcadf3620 "ics0",
> intr = 0x9, maskptr = 0xc0552b48, flags = 0x0}
> $192 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument =
> 0xcae04000, next = 0xcae01a00,
> name = 0xcadf3520 "em4", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $193 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument =
> 0xcae06000, next = 0xcae01880,
> name = 0xcadf3410 "em5", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $194 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument =
> 0xcae07000, next = 0xcae01760,
> name = 0xcadf3300 "em6", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $195 = {mask = 0x68c640, handler = 0xc03d74b0 <uhci_intr>, argument =
> 0xcae09000, next = 0x0,
> name = 0xcadf32c0 "uhci0", intr = 0x9, maskptr = 0xc0552b4c, flags = 0x0}
>
> -----------------------------------------------
> Since I think its iir_intr that was called, I poke around in the stack
> frames
> between frame 7 and 5.
> At 0xf8225dd8 I see the value 0xc01b7e11
>
> (kgdb) x 0xc01b7e11
> 0xc01b7e11 <iir_intr+641>: 0x830cc483
>
> (kgdb) disass <iir_intr+641>:
> ...
> ...
> ...
> 0xc01b7dff <iir_intr+623>: test %ebx,%ebx
> 0xc01b7e01 <iir_intr+625>: je 0xc01b7e14 <iir_intr+644>
> 0xc01b7e03 <iir_intr+627>: push %ebx
> 0xc01b7e04 <iir_intr+628>: lea 0x4e(%esi),%eax
> 0xc01b7e07 <iir_intr+631>: push %eax
> 0xc01b7e08 <iir_intr+632>: mov 0xffffffe8(%ebp),%edx
> 0xc01b7e0b <iir_intr+635>: push %edx
> 0xc01b7e0c <iir_intr+636>: call 0xc041b4d8 <bcopy>
> 0xc01b7e11 <iir_intr+641>: add $0xc,%esp
> 0xc01b7e14 <iir_intr+644>: cmpl $0x0,0x42(%esi)
>
> -----------------------------------------------------
> Therefore, 0xcadf9000 is the struct gdt_softc * argument to iir_intr()
>
>
> (kgdb) set $SC=(struct gdt_softc *)0xcadf9000
> (kgdb) p *$SC
> $217 = {sc_hanum = 0x0, sc_class = 0x5, sc_bus = 0x4, sc_slot = 0x8,
> sc_device = 0x600, sc_subdevice = 0x1af,
> sc_fw_vers = 0x22a, sc_init_level = 0x6, sc_state = 0x0, sc_dev =
> 0xcadf4000, sc_dpmemt = 0x1,
> sc_dpmemh = 0xf31c7000, sc_dpmembase = 0xf8000000, sc_parent_dmat =
> 0xcadfad00, sc_buffer_dmat = 0xcadfacc0,
> sc_gccb_dmat = 0xcadfac80, sc_gccb_dmamap = 0x0, sc_gccb_busbase =
> 0x1d000, sc_gccbs = 0xf51c7000, sc_free_gccb = {
> slh_first = 0xf51e1860}, sc_pending_gccb = {slh_first = 0xf5211440},
> sc_ccb_queue = {tqh_first = 0x0,
> tqh_last = 0xcadf9050}, sc_ucmd_queue = {tqh_first = 0x0, tqh_last =
> 0xcadf9058}, sc_ic_all_size = 0x2fc0,
> sc_cmd_len = 0x24, sc_cmd_off = 0x24, sc_cmd_cnt = 0x1,
> sc_cmd =
> "\000\000\000\000d\000\000\000\002\000\000\000¿#\235\000\200\000\000\000ÿÿÿÿ\001\000\000\000\000\000\006â\000\000\001",
> '\000' <repeats 288 times>, sc_info = 0x0, sc_info2 = 0x0, sc_status =
> 0x1000, sc_service = 0x0,
> sc_bus_cnt = 0x3, sc_virt_bus = 0x2, sc_bus_id = "\a\a\000\000\000",
> sc_more_proc = 0x0, sc_hdr = {{
> hd_present = 0x1, hd_is_logdrv = 0x0, hd_is_arraydrv = 0x0,
> hd_is_master = 0x0, hd_is_parity = 0x0,
> hd_is_hotfix = 0x0, hd_master_no = 0x0, hd_lock = 0x0, hd_heads =
> 0xff, hd_secs = 0x3f, hd_devtype = 0x0,
> hd_size = 0x88efe6a, hd_ldr_no = 0x0, hd_rw_attribs = 0x0,
> hd_start_sec = 0x0}, {hd_present = 0x0,
> hd_is_logdrv = 0x0, hd_is_arraydrv = 0x0, hd_is_master = 0x0,
> hd_is_parity = 0x0, hd_is_hotfix = 0x0,
> hd_master_no = 0x0, hd_lock = 0x0, hd_heads = 0x0, hd_secs = 0x0,
> hd_devtype = 0x0, hd_size = 0x0,
> hd_ldr_no = 0x0, hd_rw_attribs = 0x0, hd_start_sec = 0x0} <repeats
> 99 times>}, sc_raw_feat = 0x1,
> sc_cache_feat = 0x101, sc_dvr = {size = 0x0, eu = {stream = '\000'
> <repeats 15 times>, driver = {ionode = 0x0,
> service = 0x0, index = 0x0}, async = {ionode = 0x0, service =
> 0x0, status = 0x0, info = 0x0,
> scsi_coord = "\000\000"}, sync = {ionode = 0x0, service = 0x0,
> status = 0x0, info = 0x0, hostdrive = 0x0,
> scsi_coord = "\000\000", sense_key = 0x0}, test = {l1 = 0x0, l2
> = 0x0, l3 = 0x0, l4 = 0x0}}, severity = 0x0,
> event_string = '\000' <repeats 255 times>}, sims = {0xcadfac00,
> 0xcadfab40, 0xcadfaa80, 0x0, 0x0, 0x0}, paths = {
> 0xcadf3c20, 0xcadf3bf0, 0xcadf3bc0, 0x0, 0x0, 0x0}, sc_copy_cmd =
> 0xc01b90d4 <gdt_mpr_copy_cmd>,
> sc_get_status = 0xc01b9190 <gdt_mpr_get_status>, sc_intr = 0xc01b91b4
> <gdt_mpr_intr>,
> sc_release_event = 0xc01b92d0 <gdt_mpr_release_event>, sc_set_sema0 =
> 0xc01b92f0 <gdt_mpr_set_sema0>,
> sc_test_busy = 0xc01b9310 <gdt_mpr_test_busy>, links = {tqe_next =
> 0x0, tqe_prev = 0xc04ffe80}}
>
> -----------------------------------------------------
> Now I try to figure which iir_intr() code path was executed.
> Only the case GDT_GCF_IOCTL: code path leads to a bcopy().
>
> I walked all the struct gdt_ccb * in gdt->sc_gccbs[],
> Only 1 has a non-zero gccb->gc_flags value; its
> value is 4 (GDT_GCF_IOCTL)
>
> (kgdb) set $SCBS=(struct gdt_ccb *)&$SC->sc_gccbs[121]
> (kgdb) p *$SCBS
> $218 = {gc_scratch = "\001\000\0013", '\000' <repeats 3067 times>,
> gc_ccb = 0xcb16c400, gc_ucmd = 0xcb9f9000,
> gc_dmamap = 0x0, gc_map_flag = 0x1, gc_timeout = 0x0, gc_state = 0x0,
> gc_service = 0x9, gc_cmd_index = 0x7b,
> gc_flags = 0x4, sle = {sle_next = 0x0}}
>
> ---------------------------------------
> Down to the bcopy():
> the bcopy() decission is made off of values in gc_ucmd,
> and nothing good can come from using most of these values:
>
> (kgdb) set $UCMD=(gdt_ucmd_t *)$SCBS->gc_ucmd
> (kgdb) p *$UCMD
> $219 = {io_node = 0xc0de, service = 0xdead, timeout = 0xc05076a0, status
> = 0x1, info = 0x0, BoardNode = 0xc0ded8b2,
> CommandIndex = 0xc0dedead, OpCode = 0xdead, u = {cache = {DeviceNo =
> 0xc0de, BlockNo = 0xc0dedead,
> BlockCnt = 0xc0dedead, DestAddr = 0xc0dedead}, ioctl = {param_size
> = 0xc0de, subfunc = 0xc0dedead,
> channel = 0xc0dedead, p_param = 0xc0dedead}, raw = {reserved =
> 0xc0de, direction = 0xc0dedead,
> mdisc_time = 0xc0dedead, mcon_time = 0xc0dedead, sdata =
> 0xc0dedead, sdlen = 0xc0dedead, clen = 0xc0dedead,
> cmd = "ÞÞÀÞÞÀÞÞÀ", target = 0xad, lun = 0xde, bus = 0x1,
> priority = 0x0, sense_len = 0x0, sense_data = 0x0,
> link_p = 0x10}}, data = "\001\000\0013", '\000' <repeats 3067
> times>, complete_flag = 0xcb16c400, links = {
> tqe_next = 0xcb9f9000, tqe_prev = 0x0}}
>
> ----------------------------
> Not knowing anything about iir/scsi, it appears to me
> that gdt->sc_gccbs[121]->gc_ucmd has been freed and yet is still
> referenced and
> in use.
>
> How is this ddt_ucmd_t * gc_ucmd data managed ?
> Is it actively malloc()ed and free()d ?
>
> Any clues or pointers will be appreciated.
>
>
--
Dave Baukus
dbaukus at chiaro.com
Chiaro Networks Ltd.
Richardson, Texas
USA
More information about the freebsd-scsi
mailing list