iur crash ? -- long

dave baukus dbaukus at chiaro.com
Mon Aug 8 17:46:04 GMT 2005


I have hit the same iir crash again; that is
gdt->sc_gccbs[]->gc_ucmd referencing freed memory.

Can anyone tell me why iir_ioctl() is not broken in
the GDT_IOCTL_GENERAL case ?

The bug I see is that ucmd references the ioctl arguments which
are either on the stack or malloc()ed by ioctl().
The code queues the command to the driver for processing
at interrupt level. The code then checks a flag for completion;
if not complete it tsleeps(). When tsleep() returns the
code assumes the command is done and returns.
If there is a signal pending tsleep() never sleeps and
ioctl() frees the memory referenced by ucmd. Finally,
the iir interrupt handler runs and attempts to process the
command --- crash !



       case GDT_IOCTL_GENERAL:
         {
             gdt_ucmd_t *ucmd;
             struct gdt_softc *gdt;
             int lock;

             ucmd = (gdt_ucmd_t *)cmdarg;
             gdt = gdt_minor2softc(ucmd->io_node);
             if (gdt == NULL)
                 return (ENXIO);
             lock = splcam();
             TAILQ_INSERT_TAIL(&gdt->sc_ucmd_queue, ucmd, links);
             ucmd->complete_flag = FALSE;
             splx(lock);
             gdt_next(gdt);
             if (!ucmd->complete_flag)
                 (void) tsleep((void *)ucmd, PCATCH | PRIBIO, "iirucw", 0);
             break;
         }


dave baukus wrote:
> I have a crash on BSD4.10 w/ a heavily modified network stack, but
> the disk/scsi subsystem is ostensively unmodified.
> 
> I'm reasonably certain that the crash is caused by
> iir_intr(void *arg) passing 0xcOdedead to bcopy() as a length.
> We have INVARIANTS enabled.
> 
> The quick question is: has this been seen ?
> 
> The long question goes like this:
> ----------------------
> Here's the stack trace:
> #0  dumpsys () at ../../kern/kern_shutdown.c:519
> #1  0xc01e7253 in boot (howto=0x100) at ../../kern/kern_shutdown.c:331
> #2  0xc01e76cb in panic (fmt=0xc04cb120 "vm_fault: fault on nofault 
> entry, addr: %lx")
>     at ../../kern/kern_shutdown.c:635
> #3  0xc03c810c in vm_fault (map=0xc059fa78, vaddr=0xf5315000, 
> fault_type=0x1, fault_flags=0x0)
>     at ../../vm/vm_fault.c:240
> #4  0xc041ca5e in trap_pfault (frame=0xf8225d90, usermode=0x0, 
> eva=0xf5315000) at ../../i386/i386/trap.c:921
> #5  0xc041c5e0 in trap (frame={tf_fs = 0xc01e0010, tf_es = 0xffff0010, 
> tf_ds = 0xcb3c0010, tf_edi = 0xcbaeb52e,
>       tf_esi = 0xf5315000, tf_ebp = 0xf8225e0c, tf_isp = 0xf8225dbc, 
> tf_ebx = 0xc0dedead, tf_edx = 0xf5222b20,
>       tf_ecx = 0x3033ee73, tf_eax = 0xd67d652e, tf_trapno = 0xc, tf_err 
> = 0x0, tf_eip = 0xc041b502, tf_cs = 0x8,
>       tf_eflags = 0x10202, tf_esp = 0xcadf9000, tf_ss = 0xcb9f9000}) at 
> ../../i386/i386/trap.c:500
> #6  0xc041b502 in generic_bcopy ()
> #7  0xc04252fd in intr_mux (arg=x) at ../../i386/isa/intr_machdep.c:609
> #8  0xc040e74e in vec9 ()
> #9  0xc01df10b in exit1 (p=0xf80c88a0, rv=0xf) at 
> ../../kern/kern_exit.c:225
> #10 0xc01e9262 in sigexit (p=0xf80c88a0, sig=0xf) at 
> ../../kern/kern_sig.c:1519
> #11 0xc01e8fa4 in postsig (sig=0xf) at ../../kern/kern_sig.c:1422
> #12 0xc041d254 in syscall2 (frame={tf_fs = 0xbfbf002f, tf_es = 
> 0x8fe002f, tf_ds = 0xbfbf002f, tf_edi = 0xbfbff800,
>       tf_esi = 0x8fe5540, tf_ebp = 0xbfbff800, tf_isp = 0xf8225fd4, 
> tf_ebx = 0x64, tf_edx = 0xbfbff780,
>       tf_ecx = 0xbfbff700, tf_eax = 0x4, tf_trapno = 0x7, tf_err = 0x2, 
> tf_eip = 0x88fe6ef4, tf_cs = 0x1f,
>       tf_eflags = 0x203, tf_esp = 0xbfbff604, tf_ss = 0x2f}) at 
> ../../i386/i386/trap.c:177
> 
> --------------------------------
> Since intr_mux() does not call generic_bcopy()/bcopy(), the stack frames 
> must be a mangled.
> The frames 4 to 7 look like:
> 
> 0xf8225d10:     0xf8225d40      0xc01eabd3      0xffffffff
>                                                                 frame 4
>                                                                 0xf8225d44
>                 trap_pfault
> 0xf8225d20:     0xc041ca5e      0xc059fa78      0xf5315000      0x00000001
> 0xf8225d30:     0x00000000      0x0000000c      0xf80c88a0      0xf5315000
>                                 frame 5         trap
> 0xf8225d40:     0xcb3cdf01      0xf8225d88      0xc041c5e0      0xf8225d90
> 0xf8225d50:     0x00000000      0xf5315000      0x006c0200      0xf5315000
> 0xf8225d60:     0xcbaeb52e      0x00000000      0xc0dedead      0x00000000
> 0xf8225d70:     0xcbae4e5a      0xf8225d80      0xc0421783      0xc04217e9
>                                                                 calltrap
> 0xf8225d80:     0xf8225e0c      0xc040dcb4      0xf8225e0c      0xc040d0c0
> 0xf8225d90:     0xc01e0010      0xffff0010      0xcb3c0010      0xcbaeb52e
> 0xf8225da0:     0xf5315000      0xf8225e0c      0xf8225dbc      0xc0dedead
> 0xf8225db0:     0xf5222b20      0x3033ee73      0xd67d652e      0x0000000c
> 0xf8225dc0:     0x00000000      0xc041b502      0x00000008      0x00010202
> 0xf8225dd0:     0xcadf9000      0xcb9f9000      0xc01b7e11      0xf5222b20
> 0xf8225de0:     0xcb9f904e      0xc0dedead      0xc6048460      0x00400200
> 0xf8225df0:     0x00000000      0xf5222b20      0x006c0200      0x00000000
> 0xf8225e00:     0x00000000      0x00090001      0x006fc67b      frame 7
>                                                                 0xf8225e24
>                 intr_mux
> 0xf8225e10:     0xc04252fd      0xcadf9000      0x006c0200      0x00000000
> 0xf8225e20:     0xc2ee216c      0xf8225e90      0xc040e1d7      0xc6048460
> 
> ------------------------------------
> It's between frame 7 (intr_mux()) and  frame 5 (trap()), that I begin
> guessing at the sequence of events.
> 
> Based on the 0xcadf9000 at 0xf8225e14 I speculate that iir_intr was the
> last interrupt routine called.
> Here is the intrec * list passed to intr_mux()
> 
> set $P=(intrec *)0xc6048460
> (kgdb) intrecwalk $P
> $186 = {mask = 0x6c0200, handler = 0xc01b7b90 <iir_intr>, argument = 
> 0xcadf9000, next = 0xc60483e0,
>   name = 0xcadf3b80 "iir0", intr = 0x9, maskptr = 0xc0552b54, flags = 0x0}
> $187 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 
> 0xcadfc000, next = 0xc6048260,
>   name = 0xcadf3a60 "em0", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $188 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 
> 0xcadfd000, next = 0xcae01e60,
>   name = 0xcadf3950 "em1", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $189 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 
> 0xcadff000, next = 0xcae01ce0,
>   name = 0xcadf3810 "em2", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $190 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 
> 0xcae03000, next = 0xcae01ba0,
>   name = 0xcadf3700 "em3", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $191 = {mask = 0x630212, handler = 0xc03b6570 <ics_intr>, argument = 
> 0x0, next = 0xcae01b00, name = 0xcadf3620 "ics0",
>   intr = 0x9, maskptr = 0xc0552b48, flags = 0x0}
> $192 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 
> 0xcae04000, next = 0xcae01a00,
>   name = 0xcadf3520 "em4", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $193 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 
> 0xcae06000, next = 0xcae01880,
>   name = 0xcadf3410 "em5", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $194 = {mask = 0x660200, handler = 0xc03ef350 <em_intr>, argument = 
> 0xcae07000, next = 0xcae01760,
>   name = 0xcadf3300 "em6", intr = 0x9, maskptr = 0xc0552b50, flags = 0x0}
> $195 = {mask = 0x68c640, handler = 0xc03d74b0 <uhci_intr>, argument = 
> 0xcae09000, next = 0x0,
>   name = 0xcadf32c0 "uhci0", intr = 0x9, maskptr = 0xc0552b4c, flags = 0x0}
> 
> -----------------------------------------------
> Since I think its iir_intr that was called, I poke around in the stack 
> frames
> between frame 7 and 5.
> At 0xf8225dd8 I see the value 0xc01b7e11
> 
> (kgdb) x 0xc01b7e11
> 0xc01b7e11 <iir_intr+641>:      0x830cc483
> 
> (kgdb) disass <iir_intr+641>:
> ...
> ...
> ...
> 0xc01b7dff <iir_intr+623>:      test   %ebx,%ebx
> 0xc01b7e01 <iir_intr+625>:      je     0xc01b7e14 <iir_intr+644>
> 0xc01b7e03 <iir_intr+627>:      push   %ebx
> 0xc01b7e04 <iir_intr+628>:      lea    0x4e(%esi),%eax
> 0xc01b7e07 <iir_intr+631>:      push   %eax
> 0xc01b7e08 <iir_intr+632>:      mov    0xffffffe8(%ebp),%edx
> 0xc01b7e0b <iir_intr+635>:      push   %edx
> 0xc01b7e0c <iir_intr+636>:      call   0xc041b4d8 <bcopy>
> 0xc01b7e11 <iir_intr+641>:      add    $0xc,%esp
> 0xc01b7e14 <iir_intr+644>:      cmpl   $0x0,0x42(%esi)
> 
> -----------------------------------------------------
> Therefore, 0xcadf9000 is the struct gdt_softc * argument to iir_intr()
> 
> 
> (kgdb) set $SC=(struct gdt_softc *)0xcadf9000
> (kgdb) p *$SC
> $217 = {sc_hanum = 0x0, sc_class = 0x5, sc_bus = 0x4, sc_slot = 0x8, 
> sc_device = 0x600, sc_subdevice = 0x1af,
>   sc_fw_vers = 0x22a, sc_init_level = 0x6, sc_state = 0x0, sc_dev = 
> 0xcadf4000, sc_dpmemt = 0x1,
>   sc_dpmemh = 0xf31c7000, sc_dpmembase = 0xf8000000, sc_parent_dmat = 
> 0xcadfad00, sc_buffer_dmat = 0xcadfacc0,
>   sc_gccb_dmat = 0xcadfac80, sc_gccb_dmamap = 0x0, sc_gccb_busbase = 
> 0x1d000, sc_gccbs = 0xf51c7000, sc_free_gccb = {
>     slh_first = 0xf51e1860}, sc_pending_gccb = {slh_first = 0xf5211440}, 
> sc_ccb_queue = {tqh_first = 0x0,
>     tqh_last = 0xcadf9050}, sc_ucmd_queue = {tqh_first = 0x0, tqh_last = 
> 0xcadf9058}, sc_ic_all_size = 0x2fc0,
>   sc_cmd_len = 0x24, sc_cmd_off = 0x24, sc_cmd_cnt = 0x1,
>   sc_cmd = 
> "\000\000\000\000d\000\000\000\002\000\000\000¿#\235\000\200\000\000\000ÿÿÿÿ\001\000\000\000\000\000\006â\000\000\001", 
> '\000' <repeats 288 times>, sc_info = 0x0, sc_info2 = 0x0, sc_status = 
> 0x1000, sc_service = 0x0,
>   sc_bus_cnt = 0x3, sc_virt_bus = 0x2, sc_bus_id = "\a\a\000\000\000", 
> sc_more_proc = 0x0, sc_hdr = {{
>       hd_present = 0x1, hd_is_logdrv = 0x0, hd_is_arraydrv = 0x0, 
> hd_is_master = 0x0, hd_is_parity = 0x0,
>       hd_is_hotfix = 0x0, hd_master_no = 0x0, hd_lock = 0x0, hd_heads = 
> 0xff, hd_secs = 0x3f, hd_devtype = 0x0,
>       hd_size = 0x88efe6a, hd_ldr_no = 0x0, hd_rw_attribs = 0x0, 
> hd_start_sec = 0x0}, {hd_present = 0x0,
>       hd_is_logdrv = 0x0, hd_is_arraydrv = 0x0, hd_is_master = 0x0, 
> hd_is_parity = 0x0, hd_is_hotfix = 0x0,
>       hd_master_no = 0x0, hd_lock = 0x0, hd_heads = 0x0, hd_secs = 0x0, 
> hd_devtype = 0x0, hd_size = 0x0,
>       hd_ldr_no = 0x0, hd_rw_attribs = 0x0, hd_start_sec = 0x0} <repeats 
> 99 times>}, sc_raw_feat = 0x1,
>   sc_cache_feat = 0x101, sc_dvr = {size = 0x0, eu = {stream = '\000' 
> <repeats 15 times>, driver = {ionode = 0x0,
>         service = 0x0, index = 0x0}, async = {ionode = 0x0, service = 
> 0x0, status = 0x0, info = 0x0,
>         scsi_coord = "\000\000"}, sync = {ionode = 0x0, service = 0x0, 
> status = 0x0, info = 0x0, hostdrive = 0x0,
>         scsi_coord = "\000\000", sense_key = 0x0}, test = {l1 = 0x0, l2 
> = 0x0, l3 = 0x0, l4 = 0x0}}, severity = 0x0,
>     event_string = '\000' <repeats 255 times>}, sims = {0xcadfac00, 
> 0xcadfab40, 0xcadfaa80, 0x0, 0x0, 0x0}, paths = {
>     0xcadf3c20, 0xcadf3bf0, 0xcadf3bc0, 0x0, 0x0, 0x0}, sc_copy_cmd = 
> 0xc01b90d4 <gdt_mpr_copy_cmd>,
>   sc_get_status = 0xc01b9190 <gdt_mpr_get_status>, sc_intr = 0xc01b91b4 
> <gdt_mpr_intr>,
>   sc_release_event = 0xc01b92d0 <gdt_mpr_release_event>, sc_set_sema0 = 
> 0xc01b92f0 <gdt_mpr_set_sema0>,
>   sc_test_busy = 0xc01b9310 <gdt_mpr_test_busy>, links = {tqe_next = 
> 0x0, tqe_prev = 0xc04ffe80}}
> 
> -----------------------------------------------------
> Now I try to figure which iir_intr() code path was executed.
> Only the case GDT_GCF_IOCTL: code path leads to a bcopy().
> 
> I walked all the struct gdt_ccb * in gdt->sc_gccbs[],
> Only 1 has a  non-zero  gccb->gc_flags value; its
> value is 4 (GDT_GCF_IOCTL)
> 
> (kgdb) set $SCBS=(struct gdt_ccb *)&$SC->sc_gccbs[121]
> (kgdb) p *$SCBS
> $218 = {gc_scratch = "\001\000\0013", '\000' <repeats 3067 times>, 
> gc_ccb = 0xcb16c400, gc_ucmd = 0xcb9f9000,
>   gc_dmamap = 0x0, gc_map_flag = 0x1, gc_timeout = 0x0, gc_state = 0x0, 
> gc_service = 0x9, gc_cmd_index = 0x7b,
>   gc_flags = 0x4, sle = {sle_next = 0x0}}
> 
> ---------------------------------------
> Down to the bcopy():
> the bcopy() decission is made off of values in gc_ucmd,
> and nothing good can come from using most of these values:
> 
> (kgdb) set $UCMD=(gdt_ucmd_t *)$SCBS->gc_ucmd
> (kgdb) p *$UCMD
> $219 = {io_node = 0xc0de, service = 0xdead, timeout = 0xc05076a0, status 
> = 0x1, info = 0x0, BoardNode = 0xc0ded8b2,
>   CommandIndex = 0xc0dedead, OpCode = 0xdead, u = {cache = {DeviceNo = 
> 0xc0de, BlockNo = 0xc0dedead,
>       BlockCnt = 0xc0dedead, DestAddr = 0xc0dedead}, ioctl = {param_size 
> = 0xc0de, subfunc = 0xc0dedead,
>       channel = 0xc0dedead, p_param = 0xc0dedead}, raw = {reserved = 
> 0xc0de, direction = 0xc0dedead,
>       mdisc_time = 0xc0dedead, mcon_time = 0xc0dedead, sdata = 
> 0xc0dedead, sdlen = 0xc0dedead, clen = 0xc0dedead,
>       cmd = "­ÞÞÀ­ÞÞÀ­ÞÞÀ", target = 0xad, lun = 0xde, bus = 0x1, 
> priority = 0x0, sense_len = 0x0, sense_data = 0x0,
>       link_p = 0x10}}, data = "\001\000\0013", '\000' <repeats 3067 
> times>, complete_flag = 0xcb16c400, links = {
>     tqe_next = 0xcb9f9000, tqe_prev = 0x0}}
> 
> ----------------------------
> Not knowing anything about iir/scsi, it appears to me
> that gdt->sc_gccbs[121]->gc_ucmd has been freed and yet is still 
> referenced and
> in use.
> 
> How is this  ddt_ucmd_t * gc_ucmd data managed ?
> Is it actively malloc()ed and free()d ?
> 
> Any clues or pointers will be appreciated.
> 
> 

-- 
Dave Baukus
dbaukus at chiaro.com			
	Chiaro Networks Ltd.
	Richardson, Texas
	USA


More information about the freebsd-scsi mailing list