scsi_target experiences and questions

Sat Dec 27 09:06:16 PST 2003

Greetings to all,

Regarding the scsi_target code, I'd like to share some experiences and
ask a few questions.

I've been working with scsi_target with FreeBSD 4.8-R and 5.1-R targets
and mostly Linux 2.4.[18,20] initiators, using both Fibre Channel
(Qlogic QLA2100) and Parallel SCSI (Adaptec 29160) connections.

First, the experiences:

1) Chuck Tuffli wrote to this list in August about having trouble using
   anything but LUN "0" when configuring a Qlogic card in target mode.
   (The specific error was "enable lun CCB rejected, status 0x39".)  I
   had this problem also, and it seemed to disappear after I enabled the
   card's BIOS in the Alt-Q setup utility.  Also in the BIOS, I set the
   execution throttle to 255, because 256 was incorrectly read as "0"
   during system boot.

2) I had a problem where the cam/scsi/scsi_target kernel code wouldn't
   shut down cleanly: the TARGIOCDISABLE call would never return, and
   the system had to be rebooted between emulator invocations.  I tracked
   this down to the "tsleep()" (msleep() in 5.1?) that immediately
   follows the comment "If we aborted at least one pending CCB ok, wait
   for it."  I commented out the tsleep() call, which appears to allow
   the target to be cleanly shut down and restarted multiple times.
   (Perhaps skipping the wait leaves some kernel state not cleaned up?)
   This was using the Fibre Channel card -- I haven't tested whether the
   problem exists with parallel SCSI.

3) I had to disable the "pending unit attention" that is set to indicate
   the device's "powering on".  (I.e., I commented out the line
   "istate->pending_ua = UA_POWER_ON" in scsi_cmds.c.)  This was because
   a FreeBSD initiator wasn't correctly handling the CHECK CONDITION
   status received during the READ CAPACITY command, and instead tried
   reading an invalid geometry.  (This, too, was tested only with Fibre
   Channel, and before I tried enabling autosense, which may fix the
   problem.)  Before I commented out that line, the device was seen as:

   da1: <FreeBSD Emulated Disk 0.1> Fixed Direct Access SCSI-3 device
   da1: 100.000MB/s transfers
   da1: 0MB (268784067 0 byte sectors: 0H 0S/T 0C)

   Once the line was commented out, it was correctly read as:

   da1: <FreeBSD Emulated Disk 0.1> Fixed Direct Access SCSI-3 device
   da1: 100.000MB/s transfers
   da1: 20MB (40960 512 byte sectors: 64H 32S/T 20C)

4) This isn't specifically related to the scsi_target code, but I was
   surprised to find that the AIO code has a race condition: if you
   aio_write block X, then immediately synchronously read block X before
   the AIO code commits the written block to disk, you'll get the old
   version of X during the read.  I worked around this by always doing
   synchronous writes.
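
   As a rough illustration of that workaround (not the actual emulator
   code), the sketch below writes a block with a blocking pwrite() and
   reads it back right away with pread().  The helper names
   write_block_sync(), read_block_sync() and roundtrip_demo(), the
   scratch file, and the 512-byte block size are all made up for the
   example.  Another option, which I have not tried here, would be to
   keep aio_write() but wait for it to complete (e.g., with
   aio_suspend() and aio_return()) before issuing the read.

```c
#define _XOPEN_SOURCE 700   /* for pread(), pwrite(), mkstemp() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes at offset off, retrying on short writes and
 * EINTR.  Returns 0 on success, -1 on error. */
int write_block_sync(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    assert(fd >= 0 && buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted before writing; retry */
        if (n <= 0)
            return -1;
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes at offset off.  Returns 0 on success, -1 on
 * error or if end-of-file is hit before len bytes were read. */
int read_block_sync(int fd, void *buf, size_t len, off_t off)
{
    char *p = buf;

    assert(fd >= 0 && buf != NULL);
    while (len > 0) {
        ssize_t n = pread(fd, p, len, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical demo: write block 3 of a scratch file, read it back
 * immediately, and return 0 if the read sees the new contents. */
int roundtrip_demo(void)
{
    char path[] = "/tmp/blkdemoXXXXXX";
    char out[512], in[512];
    int rc = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* scratch file disappears on close */
    memset(out, 0xA5, sizeof(out));
    memset(in, 0, sizeof(in));
    if (write_block_sync(fd, out, sizeof(out), 3 * 512) == 0 &&
        read_block_sync(fd, in, sizeof(in), 3 * 512) == 0 &&
        memcmp(out, in, sizeof(out)) == 0)
        rc = 0;
    close(fd);
    return rc;
}
```

   Because pwrite() does not return until the data has been handed to
   the kernel, a pread() issued afterwards on the same file sees the new
   block; no fsync() is involved in this sketch.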

5) Kudos to Nate Lawson and Justin Gibbs for all their good work.  I
   recommend that Nate add a "FAQ" section to his scsi_target page with
   some of the other relevant goodies that have been posted to this list
   (for example, Kenneth Merry posted some gems about enabling autosense
   and increasing MAX_INITIATORS); that would help us users get working
   systems more quickly.

6) I think I've finally figured out the execution path for READ and
   WRITE requests through the scsi_target.c and scsi_cmds.c files.  Nate
   is either an unparalleled genius (for getting it to work in the first
   place) or a diabolical masochist (for knowing that the rest of us
   would tear our hair out following his multiple re-entries into the
   tcmd_handle() function).

   Going off the top of my head, here are a few pointers for anyone else
   digging through the code.  (Note that I may be wrong in places.)
   "ATIO" stands for "Accept Target I/O".  At startup, the user code
   allocates space for a bunch of ATIOs and gives them to the kernel.
   When new requests arrive on the SCSI bus, the kernel sends one of
   these ATIOs back to the user.  "CTIO" stands for "Continue Target
   I/O".  CTIOs are used to transfer data between the user code and the
   kernel.  A single request may need several CTIOs, because each CTIO
   can only handle DFLTPHYS (or maybe MAXPHYS) bytes of data, but SCSI
   requests can be larger than that.  Once the request is done
   processing, the user code sends the ATIO back to the kernel for
   reuse.  The user signals that a READ request is complete (i.e., that
   the kernel code is permitted to finish the bus transaction) by
   sending the final CTIO, and it signals that a WRITE request is
   complete by sending the final ATIO.
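
   To make the CTIO splitting concrete, here is a toy model in C.  It is
   not taken from scsi_target: struct chunk, split_transfer() and
   CHUNK_MAX are names invented for the example, and the 64 KB limit is
   only my assumption of what DFLTPHYS is.  Each chunk stands for one
   CTIO, and is_final marks the last one, i.e., the one that would
   finish a READ.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed per-CTIO limit; a stand-in for DFLTPHYS, not the real macro. */
#define CHUNK_MAX ((size_t)64 * 1024)

/* One chunk models the data carried by a single CTIO. */
struct chunk {
    size_t offset;    /* byte offset within the SCSI request */
    size_t length;    /* bytes carried by this chunk */
    int    is_final;  /* nonzero for the last chunk of the request */
};

/* Split a request of `total` bytes into chunks of at most `chunk_max`
 * bytes.  Up to `max_out` chunks are stored in out[]; the return value
 * is the number of chunks the request needs, which may exceed max_out. */
size_t split_transfer(size_t total, size_t chunk_max,
                      struct chunk *out, size_t max_out)
{
    size_t n = 0;
    size_t off = 0;

    assert(chunk_max > 0);
    while (off < total) {
        size_t len = total - off;

        if (len > chunk_max)
            len = chunk_max;
        if (n < max_out) {
            out[n].offset = off;
            out[n].length = len;
            out[n].is_final = (off + len == total);
        }
        n++;
        off += len;
    }
    return n;
}
```

   With these assumptions a 160 KB request splits into three chunks of
   64 KB, 64 KB and 32 KB, and only the third has is_final set.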

   I found it very useful to print out function names (as well as ATIO
   and CTIO pointers) as functions were entered; this greatly helped my
   understanding of how READ and WRITE requests were processed.
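
   Something along these lines is enough for that kind of tracing.  This
   is a sketch rather than what I actually ran: trace_format(),
   TRACE_ENTER() and example_handler() are made-up names, and the ATIO
   and CTIO arguments are plain void pointers standing in for the real
   CCB pointers.

```c
#include <assert.h>
#include <stdio.h>

/* Format "<function>: atio=<ptr> ctio=<ptr>" into buf.  Returns the
 * value of snprintf(), i.e., the length the full line would need. */
int trace_format(char *buf, size_t n, const char *func,
                 void *atio, void *ctio)
{
    assert(buf != NULL && n > 0 && func != NULL);
    return snprintf(buf, n, "%s: atio=%p ctio=%p", func, atio, ctio);
}

/* Print the enclosing function's name and both pointers to stderr.
 * __func__ (C99) expands to the name of the function using the macro. */
#define TRACE_ENTER(atio, ctio)                                        \
    do {                                                               \
        char line_[128];                                               \
        trace_format(line_, sizeof(line_), __func__, (atio), (ctio)); \
        fprintf(stderr, "%s\n", line_);                                \
    } while (0)

/* Hypothetical handler showing where the macro would go. */
void example_handler(void *atio, void *ctio)
{
    TRACE_ENTER(atio, ctio);
    /* ... real request handling would follow here ... */
}
```

   Putting TRACE_ENTER() on the first line of each function of interest
   makes the re-entries into a handler show up as repeated lines with
   the same ATIO pointer.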

Now, for a few questions:

A) How difficult will it be to add tagged queuing support to the Adaptec
   driver?  (Or, is the missing piece simply support in the scsi_target
   user code?)  I would find this capability extremely useful.

B) I've been having a maddening problem using tagged queuing with fibre
   channel.  When the system is under high write load (e.g., if I am
   emulating a large disk and I run mkfs over the disk) the scsi_target
   will crash in a seemingly nondeterministic way.  (Note, I am using a
   modified version of scsi_target, but I haven't changed its core
   behavior.)  Sometimes an ATIO is received with its flags field set to
   zero (causing work_atio() to abort) -- this would seem to indicate
   that the ATIO was sent before the data were received from the
   initiator, but this seems impossible based on my cursory look over
   the Qlogic driver code.  Other times I start getting a series of
   error messages on the console [it may be the "no ATIO2s for lun 0
   from initiator 3" message, but I'm not certain -- I can reproduce the
   exact message if it will help this debugging] but I don't think it's
   run out of ATIOs, because I increased the number to 1024 and there
   were only about ten outstanding requests at the time.

   My best guesses are that because of the high write load some kernel
   structures are getting overwritten, or linked lists of free/allocated
   ATIOs in the kernel are being improperly managed, but the answer is
   so far eluding me.  A colleague suggested it may be mishandling of
   tagged requests by the XPT code, but that seems unlikely.  Does
   anyone have any idea what might be causing this, or what a solution
   might be (other than rate-limiting the initiator)?