problem with large aio_write(2)s to raw device through Compaq ciss driver

Bob Bawn bbawn at allocity.com
Mon Jun 23 10:29:44 PDT 2003


Hello,

(I hope this is the right forum for this issue - I tried
freebsd-questions a couple of weeks ago and got no response.)

I am running FreeBSD 4.7 on a Compaq DL 380 with a Compaq
Smart Array 5i.

My application accesses a raw device (e.g. /dev/da0s1g) using
aio_write(2).

aio_writes of buffers larger than 224 (512-byte) blocks but
smaller than 257 blocks (i.e. 225 to 256 blocks inclusive) fail
with EIO. I get the following messages when this happens:

bus_dmamap_load: Too many segs! buf_len = 0x3000
ciss0: invalid command, offense size 0 at 52, value 0x0

These writes succeed on various other hardware configurations
(Dell RAID, SCSI disk, IDE disk, etc.) so I suspect the ciss driver.

Synchronous (write(2)) writes in this size range to the raw device
succeed.

aio_writes to normal files succeed.

Glancing through the ciss source, I noticed that 224 * 512 = 28 * 4096
where 28 is CISS_COMMAND_SG_LENGTH (the max number of scatter/gather
elements per command??). So maybe the write fails if the s/g vector
doesn't fit in a single command?  (I am a non-expert in this area, so
this is speculative...)
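The arithmetic behind this guess can be checked with a small C sketch. It assumes, without confirmation from the ciss or bus_dma source, that each s/g entry maps at most one 4096-byte page (worst case: no physically contiguous pages are merged). The helper name `sg_entries_needed` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of s/g entries needed to map `len` bytes
 * that start `off` bytes into a page, ASSUMING one entry per page of
 * size `page` (worst case, no merging of contiguous pages).
 */
static size_t sg_entries_needed(size_t off, size_t len, size_t page)
{
    assert(page > 0);
    /* Round the span [off % page, off % page + len) up to whole pages. */
    return (off % page + len + page - 1) / page;
}
```

Under that assumption, a page-aligned 224-block buffer needs exactly 28 entries and a 225-block buffer needs 29, which lines up with the observed boundary; a buffer starting 512 bytes into a page would already need 29 entries at 224 blocks.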

I don't understand why writes larger than 256 blocks succeed.

The following patch seems to fix the problem:


*** /usr/src/sys/dev/ciss/cissvar.h.orig        Mon Jun 16 14:16:39 2003
--- /usr/src/sys/dev/ciss/cissvar.h     Mon Jun 16 14:21:40 2003
***************
*** 140,146 ****
    * too small.
    */

! #define CISS_COMMAND_ALLOC_SIZE               512     /* XXX tune to get sensible s/g list length */
   #define CISS_COMMAND_SG_LENGTH        ((CISS_COMMAND_ALLOC_SIZE - sizeof(struct ciss_command)) \
                                  / sizeof(struct ciss_sg_entry))

--- 140,153 ----
    * too small.
    */

! /*
!  * 6/16/03 bbawn - aio_write(2)s between 225 and 256 blocks (inclusive)
!  * fail with EIO with CISS_COMMAND_ALLOC_SIZE of 512. Fix (or actually kludge
!  * around this) by having room for enough scatter/gather entries to
!  * exceed 256 blocks (the max size for a SCSI WRITE(6) command??).
!  * #define CISS_COMMAND_ALLOC_SIZE            512
!  */
! #define CISS_COMMAND_ALLOC_SIZE               1024    /* XXX tune to get sensible s/g list length */
   #define CISS_COMMAND_SG_LENGTH        ((CISS_COMMAND_ALLOC_SIZE - sizeof(struct ciss_command)) \
                                  / sizeof(struct ciss_sg_entry))

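The effect of the patch can be estimated by evaluating the CISS_COMMAND_SG_LENGTH formula for both allocation sizes. The real sizeof(struct ciss_command) and sizeof(struct ciss_sg_entry) are not given in this message, so they are parameters here; the values used in the example (64 and 16 bytes) are hypothetical, chosen only so that a 512-byte allocation reproduces the 28 entries quoted above. The function names are also hypothetical, and the block count assumes one full 4096-byte page per entry with a page-aligned buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the CISS_COMMAND_SG_LENGTH macro: whatever remains of the
 * per-command allocation after the fixed command header is divided into
 * s/g entries. Header and entry sizes are caller-supplied assumptions. */
static size_t ciss_sg_length(size_t alloc_size, size_t cmd_size,
                             size_t sg_entry_size)
{
    assert(sg_entry_size > 0 && alloc_size >= cmd_size);
    return (alloc_size - cmd_size) / sg_entry_size;
}

/* Largest transfer, in 512-byte blocks, that fits in `sg_length` entries
 * if each entry maps one full 4096-byte page of a page-aligned buffer. */
static size_t max_blocks(size_t sg_length)
{
    return sg_length * (4096 / 512);
}
```

With those assumed sizes, a 512-byte allocation gives 28 entries (224 blocks) and a 1024-byte allocation gives 60 entries (480 blocks), comfortably above 256; the exact numbers depend on the real structure sizes.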

Any clues on what's going on here? (And, more importantly, is my "fix"
adequate?) If time permits, I hope to investigate in the debugger, but
any information would be appreciated.

I have a small program that illustrates the problem - let me know
if you want it.
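The program mentioned above is not included in this message. A minimal sketch of a similar test, using only the POSIX calls aio_write, aio_suspend, aio_error and aio_return, might look like the following. The function name `try_aio_write` is hypothetical, and on FreeBSD 4.x the kernel is assumed to have AIO support compiled in (e.g. options VFS_AIO). Warning: pointed at a raw device such as /dev/da0s1g it overwrites the first sectors of that partition.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512 /* sector size used in the report */

/*
 * Hypothetical helper: submit one aio_write of nblocks * 512 bytes at
 * offset 0 of `path`, wait for it, and return the byte count written,
 * or -errno on failure. DESTRUCTIVE when `path` is a raw device.
 */
ssize_t try_aio_write(const char *path, size_t nblocks)
{
    assert(nblocks > 0);
    size_t len = nblocks * BLOCK_SIZE;
    char *buf = malloc(len);
    if (buf == NULL)
        return -ENOMEM;
    memset(buf, 0xA5, len);

    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        int e = errno;
        free(buf);
        return -e;
    }

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    ssize_t result;
    if (aio_write(&cb) != 0) {
        result = -errno;            /* request was rejected at submission */
    } else {
        const struct aiocb *list[1] = { &cb };
        /* Sleep until the request is no longer in progress. */
        while (aio_error(&cb) == EINPROGRESS)
            aio_suspend(list, 1, NULL);
        int err = aio_error(&cb);
        ssize_t n = aio_return(&cb); /* reap the completed request once */
        result = (err == 0) ? n : -err;
    }

    close(fd);
    free(buf);
    return result;
}
```

On the configuration described above, a call such as try_aio_write("/dev/da0s1g", 240) falls in the 225 to 256 block range reported to fail with EIO, while the same call against a scratch regular file corresponds to the case reported to succeed.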

It seems possible that I have something misconfigured. Here are the
boot messages from ciss:

Jun  6 10:00:16 queso /kernel: pci0: <PCI bus> on pcib0
Jun  6 10:00:16 queso /kernel: ciss0: <Compaq Smart Array 5i> port 0x2000-0x20ff mem 0xf5ef0000-0xf5ef3fff,0xf7ec0000-0xf7efffff irq 3 at device 1.0 on pci0
Jun  6 10:00:16 queso /kernel: ciss0: using 256 of 1024 available commands
Jun  6 10:00:16 queso /kernel: ciss0:   3 logical drives configured
Jun  6 10:00:16 queso /kernel: ciss0:   firmware 1.92
Jun  6 10:00:16 queso /kernel: ciss0:   2 SCSI channels
Jun  6 10:00:16 queso /kernel: ciss0:   signature 'CISS'
Jun  6 10:00:16 queso /kernel: ciss0:   valence 1
Jun  6 10:00:16 queso /kernel: ciss0:   supported I/O methods 0xe<simple,performant,MEMQ>
Jun  6 10:00:16 queso /kernel: ciss0:   active I/O method 0x3<simple>
Jun  6 10:00:16 queso /kernel: ciss0:   4G page base 0x00000000
Jun  6 10:00:16 queso /kernel: ciss0:   interrupt coalesce delay 1000us
Jun  6 10:00:16 queso /kernel: ciss0:   interrupt coalesce count 16
Jun  6 10:00:16 queso /kernel: ciss0:   max outstanding commands 1024
Jun  6 10:00:16 queso /kernel: ciss0:   bus types 0x2<ultra3>
Jun  6 10:00:16 queso /kernel: ciss0:   server name ''
Jun  6 10:00:16 queso /kernel: ciss0:   heartbeat 0x10000033
Jun  6 10:00:16 queso /kernel: ciss0: 3 logical drives
Jun  6 10:00:16 queso /kernel: ciss0: logical drive 0: RAID 5, 92160MB online
Jun  6 10:00:16 queso /kernel: ciss0: logical drive 1: RAID 5, 92160MB online
Jun  6 10:00:16 queso /kernel: ciss0: logical drive 2: RAID 5, 92160MB online

Thanks,
Bob Bawn
bbawn at allocity.com




More information about the freebsd-scsi mailing list