AIC7899's 2 SCSI channels infect with each other in target mode?

Sun Sep 28 20:07:19 PDT 2003

>From: Nate Lawson <nate at root.org>
>To: Jingrong Xie <xjrcool at hotmail.com>
>CC: freebsd-scsi at freebsd.org
>Subject: Re: AIC7899's 2 SCSI channels infect with each other in target 
>mode?
>Date: Wed, 6 Aug 2003 07:18:38 -0700 (PDT)
>
>On Fri, 1 Aug 2003, Jingrong Xie wrote:
> > I have two machine A and B, each with a 39160 card (aic7899 processor),
> > the Installatioin Guide of the card says, "The Adaptec SCSI Card 39160 
>has
> > two INDEPENDENT SCSI channels, ...".
>
>That is correct.
>
> > I use Nate's scsi_target  on A, and my test_write_scsi.c on B to write 
>to
> > Emulated Disk uninterruptedly, it works perfectly.
> > Also using scsi_target on B and test_write_scsi.c on A works perfectly.
> > But when I use the two at the same time, kernel code of scsi_target 
>crash
> > like this:
> >
> > Fatal trap 12: page fault wihle in kernel mode
> > ......
> > Stopped at targdone + 0x84: movl %eax, 0x20(%edx)
> >
> > <<<<<<<<Dump Card State Ends>>>>>>>>
> > ahc1: Bus Device Reset on Ahc0(0:5:1) SCBS aborted.
>
>You shouldn't be able to do this since ahc(4) does not support
>simultaneous target/initiator mode.  When you have two instances running
>and then attempt to scan the other, the ahc driver should not allow this.
>
>Still, this does appear to be a problem in targ(4) and I'll look into it.
>I'm very busy so it may take me a while to get my target testing rig up
>again.

Yes, I tried the one-way read/write using one channel in scsi_target mode, 
using "dd" to read from and write to the "FreeBSD Emulated Disk" -- 
/dev/da0, machine running scsi_target also crashed, the same as before:

Fatal trap 12: page fault while in kernel mode
fatal virtual address = 0x2a
fatal code = supervisor write, page not present
instructioin pointer = 0x8:0xe12c3a8
stack pointer = 0x10: 0xd49a7d48
frame pointer = 0x10: 0xd49a7d50
code segment = base 0x0, limit 0xfffff, type 0x16
                        DPL 0, pres 1, def32 1, gran 1
processor flags = trace trap, interrupt enabled, kernel IOPL=0
current process = 26962(scsi_target)
interrupt mask =
kernel: type 12 trap, code = 0
Breakpoint attargdone + 0x84: movl %eax, 0x20 (%edx)

in online kernel debug mode, I traced the calling stack:
targdone() + 0x84
camisr() + 0x253
swi_cambio() +0xd
splz_swi() + 0x14
targwrite() + 0x207
spec_write() + 0x5d
ufsspec_write() + 0x20
ufs_vnoperatespec() + 0x15
vn_write() + 0x156
dofilewrite() + 0x7f
write() + 0x36
syscall2() + 0x16a
Xint0x80_syscall() + 0x25

and this is Nate's scsi_target.c (kernel) code segement of targdone(), it 
seems that crushing occured duing "CCBs go back to userland.", is it so?
static void
targdone(struct cam_periph *periph, union ccb *done_ccb)
{
        struct targ_softc *softc;
        cam_status status;

        CAM_DEBUG(periph->path, CAM_DEBUG_PERIPH, ("targdone %p\n", 
done_ccb));
        softc = (struct targ_softc *)periph->softc;
        TAILQ_REMOVE(&softc->pending_ccb_queue, &done_ccb->ccb_h,
                     periph_links.tqe);
        status = done_ccb->ccb_h.status & CAM_STATUS_MASK;

        /* If we're no longer enabled, throw away CCB */
        if ((softc->state & TARG_STATE_LUN_ENABLED) == 0) {
                targfreeccb(softc, done_ccb);
                return;
        }
        /* abort_all_pending() waits for pending queue to be empty */
        if (TAILQ_EMPTY(&softc->pending_ccb_queue))
                wakeup(&softc->pending_ccb_queue);

        switch (done_ccb->ccb_h.func_code) {
        /* All FC_*_QUEUED CCBs go back to userland */
        case XPT_IMMED_NOTIFY:
        case XPT_ACCEPT_TARGET_IO:
        case XPT_CONT_TARGET_IO:
                TAILQ_INSERT_TAIL(&softc->user_ccb_queue, &done_ccb->ccb_h,
                                  periph_links.tqe);
                notify_user(softc);
                break;
        default:
                panic("targdone: impossible xpt opcode %#x",
                      done_ccb->ccb_h.func_code);
                /* NOTREACHED */
        }
}

on the initiator end, kernel reported as follows:
SCB 0xf - timed out
ahc0: Dumping Card state while idle
......
SCB 0xe - timed out
ahc0: Dumping Card state while idle
......
SCB 0xf - timed out
ahc0: Dumping Card state while idle
......
SCB 0xe - timed out
ahc0: Dumping Card state while idle
......
SCB 0xf - timed out
ahc0: Dumping Card state while idle
......
total 5 times, with very average time-interval : 68 seconds.

Can we conclude that the problem exists in userland scsi_target program?

or it is the kernel scsi_target's duty?

>
> > A# scsi_target 0:5:0
> > B# camcontrol rescan 0
> > B# ./test_write_scsi
> >
> > B# scsi_target 1:5:0
> > A# camcontrol rescan 1
> > A# ./test-write_scsi
>
>I'd like to see your code for "test-write_scsi".  This seems to show that
>you run them sequentially, not concurrently.
>
>-Nate

_________________________________________________________________
Help STOP SPAM with the new MSN 8 and get 2 months FREE*  
http://join.msn.com/?page=features/junkmail