aic7xxx / AHA2940 worries... anyone?

Doug Ledford dledford at dialnet.net
Sun Aug 10 23:26:17 PDT 1997


--------

> One thing I did not make clear was that the crashes occur more frequently
> when the file sizes are larger and also occur more frequently when the CPU
> speed increases.  At a 2GB file size, a single pass of Bonnie often won't
> complete.  This evening I have been running Bonnie in the background on
> 2GB files with Symbios 875 based controllers so I don't think it is my
> drives excepting that the 875 will not queue deeper than 12 commands.. 

This goes in line with what I was talking about.  With a larger file size, 
you increase the statistical probability that a command will get lost 
somewhere in that operation.  With faster CPU speeds, you can create/destroy 
SCSI blocks and commands faster (meaning we put stuff out to the cards 
faster, and they correspondingly respond with command completes faster).  
Note the various seemingly CPU bound operations in the bonnie tests.  These 
operations will run faster and have a higher likelihood of outstepping the 
CPU when you up the CPU speed.  This seems backwards, but when you consider 
the following, maybe not:

1: At a faster CPU speed, we can create more read/write requests in a given 
time slice.
2: Certain operations are nearly always fixed length, regardless of CPU 
speed (these include bus accesses on devices other than our own aic7xxx 
devices such as the regular timer interrupt accesses, etc.)
3: Windows exist in the code during which interrupts are turned off while 
the CPU reads and writes to other devices on the bus.  These windows change 
based on a combination of CPU speed and percent of operation that is CPU 
bound vs. bus bound.
4: For windows that have a high bus/cpu bounding ratio, the time we are at 
an interrupt off level can be held nearly constant with increasing CPU speed.
5: During those same windows, we have previously sent out commands to the 
drives at a somewhat faster rate due to CPU speed increases.  These commands 
can then be returning at a comparably faster rate depending on drive 
capability and the luck of the draw in regards to the drive cache and 
controller firmware.
6: It is entirely possible that during any such fixed length window, faster 
CPU speeds can result in more commands returning complete, increasing the 
risk that the aic7xxx controller's QOUTFIFO might overflow while the card 
cannot be serviced.
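
A rough sketch of points 1 through 6, in case the arithmetic helps.  It holds 
the interrupts-off window fixed (the bus bound case from point 4) and only 
shortens the time between command completions, which is what a faster CPU 
would do per point 5.  The window length, the completion intervals and the 
FIFO depth are all made-up numbers for illustration, not measured values or 
the real QOUTFIFO size of any particular chip.

```python
def completions_in_window(window_us, completion_interval_us):
    """Completions that pile up while interrupts are off for window_us."""
    return window_us // completion_interval_us


def fifo_overflows(window_us, completion_interval_us, fifo_depth):
    """True if more completions arrive in the window than the FIFO can hold."""
    return completions_in_window(window_us, completion_interval_us) > fifo_depth


# All numbers below are hypothetical, chosen only to show the effect.
WINDOW_US = 500   # bus bound interrupts-off window, same at either CPU speed
FIFO_DEPTH = 16   # assumed number of QOUTFIFO entries

for label, interval_us in (("slower CPU", 50), ("faster CPU", 25)):
    n = completions_in_window(WINDOW_US, interval_us)
    overflow = fifo_overflows(WINDOW_US, interval_us, FIFO_DEPTH)
    print(label, n, "completions, overflow:", overflow)
```

With these assumed numbers the window itself never changes, yet halving the 
time between completions is enough to push the count past the FIFO depth.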

>  
> > driver firmware, it may just be that under the kind of load 9 drives can 
> > create on a controller, we are losing commands and getting hosed.
> 
> I have three controllers with three drives per controller.  The mdadd is
> set up such that in sequential accesses each next card is accessed.  I
> did this thinking that it would distribute the load across the controllers
> a little better.  There are cases where it is faster.  The mdadd is: 
> 
> /sbin/mdadd /dev/md0 /dev/sdb2 /dev/sde2 /dev/sdh2 \
>                      /dev/sdc2 /dev/sdf2 /dev/sdi2 \
>                      /dev/sdd2 /dev/sdg2 /dev/sdj2
> /sbin/mdrun -p0 -c64k /dev/md0
> 
> disks sdb, sdc and sdd are on scsi2. 
> disks sde, sdf and sdg are on scsi3.
> disks sdh, sdi and sdj are on scsi4.
> 
> So in a sequential access the code hops from controller to controller.  It
> makes a measurable difference when I read from cylinders near the outer
> edge.  I don't have a direct comparison to a case with the interleaving at
> my fingertips, but check out the block read and block write rates in the
> attachment.  Not too shabby IMHO. 

In my experience, these types of things will help during testing, but once 
you get into a production environment with lots of file reads and writes and 
a reasonably used filesystem that isn't almost all free space, then these 
advantages quickly disappear as the reads and writes will start to disperse 
amongst the various drives on their own (a news server is a good example of 
this, where over time the filesystem has been written/read from enough that 
any given file is randomly located on the device, so any attempt at ordering 
the drives for optimization doesn't really gain anything, although the same 
is also true that any given ordering of the drives isn't going to hurt 
things any either).
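
To make the comparison concrete, here is a small sketch of how the member 
ordering maps accesses onto controllers.  It assumes plain round-robin 
striping in 64k chunks over equal-sized members, which is my reading of the 
mdrun line quoted above and not something checked against the md code.  The 
GROUPED ordering is a hypothetical alternative for comparison, and the random 
offsets merely stand in for a well-used filesystem.

```python
import random
from collections import Counter

CHUNK = 64 * 1024  # chunk size taken from "-c64k" in the quoted mdrun line

# Member order from the quoted mdadd command.
INTERLEAVED = ["sdb2", "sde2", "sdh2", "sdc2", "sdf2", "sdi2",
               "sdd2", "sdg2", "sdj2"]
# Hypothetical alternative: members grouped by controller.
GROUPED = ["sdb2", "sdc2", "sdd2", "sde2", "sdf2", "sdg2",
           "sdh2", "sdi2", "sdj2"]

# Disk to controller layout as described in the quoted message.
CONTROLLER = {"sdb": "scsi2", "sdc": "scsi2", "sdd": "scsi2",
              "sde": "scsi3", "sdf": "scsi3", "sdg": "scsi3",
              "sdh": "scsi4", "sdi": "scsi4", "sdj": "scsi4"}


def controller_for_offset(offset, members):
    """Controller serving a byte offset, assuming round-robin striping."""
    member = members[(offset // CHUNK) % len(members)]
    return CONTROLLER[member[:3]]


def sequential_controllers(members, n_chunks):
    """Controllers hit by the first n_chunks chunks of a sequential access."""
    return [controller_for_offset(i * CHUNK, members) for i in range(n_chunks)]


def controller_load(members, offsets):
    """Number of accesses landing on each controller for the given offsets."""
    return Counter(controller_for_offset(off, members) for off in offsets)


print(sequential_controllers(INTERLEAVED, 6))
print(sequential_controllers(GROUPED, 6))

rng = random.Random(0)
offsets = [rng.randrange(2 ** 31) for _ in range(9000)]
print(controller_load(INTERLEAVED, offsets))
print(controller_load(GROUPED, offsets))
```

Under these assumptions a sequential run hops controllers on every chunk with 
the interleaved order and sits on one controller for three chunks with the 
grouped order, while scattered accesses land on each controller about a third 
of the time with either one.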

-- 
*****************************************************************************
* Doug Ledford                      *   Unix, Novell, Dos, Windows 3.x,     *
* dledford at dialnet.net    873-DIAL  *     WfW, Windows 95 & NT Technician   *
*   PPP access $14.95/month         *****************************************
*   Springfield, MO and surrounding * Usenet news, e-mail and shell account.*
*   communities.  Sign-up online at * Web page creation and hosting, other  *
*   873-9000 V.34                   * services available, call for info.    *
*****************************************************************************




