aic7xxx / AHA2940 worries... anyone?
Doug Ledford
dledford at dialnet.net
Sun Aug 10 23:26:17 PDT 1997
--------
> One thing I did not make clear was that the crashes occur more frequently
> when the file sizes are larger and also occur more frequently when the CPU
> speed increases. At a 2GB file size, a single pass of Bonnie often won't
> complete. This evening I have been running Bonnie in the background on
> 2GB files with Symbios 875 based controllers, so I don't think it is my
> drives, except that the 875 will not queue deeper than 12 commands.
This goes in line with what I was talking about. With a larger file size,
you increase the statistical probability that a command will get lost
somewhere in that operation. With faster CPU speeds, you can create/destroy
SCSI blocks and commands faster (meaning we put stuff out to the cards
faster, and they correspondingly respond with command completes faster).
Note the various seemingly CPU-bound operations in the Bonnie tests. These
operations will run faster and have a higher likelihood of outstepping the
CPU when you up the CPU speed. This seems backwards, but when you consider
the following, maybe not:
1: At a faster CPU speed, we can create more read/write requests in a given
time slice.
2: Certain operations are nearly always fixed-length, regardless of CPU
speed (these include bus accesses on devices other than our own aic7xxx
devices, such as the regular timer interrupt accesses, etc.)
3: Windows exist in the code during which interrupts are turned off while
the CPU reads and writes to other devices on the bus. These windows change
based on a combination of CPU speed and the percentage of the operation that
is CPU-bound vs. bus-bound.
4: For windows that have a high bus-bound/CPU-bound ratio, the time we spend
with interrupts off can stay nearly constant as CPU speed increases.
5: During those same windows, we have previously sent out commands to the
drives at a somewhat faster rate due to the CPU speed increase. These
commands can then return at a comparably faster rate, depending on drive
capability and the luck of the draw with regard to the drive cache and
controller firmware.
6: It is entirely possible that during any such fixed-length window, faster
CPU speeds can result in more commands completing, increasing the risk that
the aic7xxx controller's QOUTFIFO might overflow while the card cannot be
serviced.
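Points 5 and 6, together with the file-size remark that opens this reply, can be made concrete with a back-of-the-envelope model. This is a hedged sketch, not anything taken from the aic7xxx driver: it assumes command completions arrive as a Poisson process whose rate scales with CPU speed, that each interrupts-off window has a fixed length, that windows are independent, and that the QOUTFIFO holds a fixed number of entries. The names `poisson_tail` and `overflow_risk`, and every number used with them, are hypothetical.

```python
import math


def poisson_tail(mean: float, depth: int) -> float:
    """P(N > depth) for N ~ Poisson(mean): the chance that more completions
    arrive during one interrupts-off window than the FIFO can hold.
    (Modelling assumption, not driver behaviour.)"""
    # Accumulate P(N = k) for k = 0..depth, then take the complement.
    term = math.exp(-mean)
    cdf = term
    for k in range(1, depth + 1):
        term *= mean / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def overflow_risk(window_us: float, completions_per_us: float,
                  fifo_depth: int, windows: int) -> tuple[float, float]:
    """Return (per-window overflow probability, probability that at least
    one of `windows` independent windows overflows).

    window_us          -- hypothetical fixed interrupts-off window length
    completions_per_us -- hypothetical completion rate (scales with CPU speed)
    fifo_depth         -- hypothetical number of QOUTFIFO entries
    windows            -- number of such windows in a run (grows with file size)
    """
    p = poisson_tail(window_us * completions_per_us, fifo_depth)
    return p, 1.0 - (1.0 - p) ** windows


# Same fixed window, completion rate doubled by a faster CPU.
slow = overflow_risk(window_us=200, completions_per_us=0.005, fifo_depth=4, windows=100_000)
fast = overflow_risk(window_us=200, completions_per_us=0.010, fifo_depth=4, windows=100_000)
```

With these made-up numbers (a mean of 1 vs. 2 completions per window and a depth of 4), doubling the completion rate raises the per-window overflow chance from roughly 0.4% to roughly 5%. A bigger Bonnie file only adds more windows, so the chance of at least one lost completion climbs toward certainty either way.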
>
> > driver firmware, it may just be that under the kind of load 9 drives can
> > create on a controller, we are losing commands and getting hosed.
>
> I have three controllers with three drives per controller. The mdadd is
> set up such that in sequential accesses each next card is accessed. I
> did this thinking that it would distribute the load across the controllers
> a little better. There are cases where it is faster. The mdadd is:
>
> /sbin/mdadd /dev/md0 /dev/sdb2 /dev/sde2 /dev/sdh2 \
> /dev/sdc2 /dev/sdf2 /dev/sdi2 \
> /dev/sdd2 /dev/sdg2 /dev/sdj2
> /sbin/mdrun -p0 -c64k /dev/md0
>
> disks sdb, sdc and sdd are on scsi2.
> disks sde, sdf and sdg are on scsi3.
> disks sdh, sdi and sdj are on scsi4.
>
> So in a sequential access the code hops from controller to controller. It
> makes a measurable difference when I read from cylinders near the outer
> edge. I don't have a direct comparison to a case with the interleaving at
> my fingertips, but check out the block read and block write rates in the
> attachment. Not too shabby IMHO.
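The controller hopping described in the quoted message follows from how a striped (RAID0, `-p0`) array maps byte offsets to members. Here is a minimal sketch, assuming the simple case of equal-sized members where chunk k lands on member k mod n; real md RAID0 also handles unequal member sizes, which this ignores. The helper `locate` is hypothetical; the member order, the 64k chunk size, and the disk-to-controller table come from the quoted message.

```python
# Member order exactly as passed to mdadd in the quoted message.
MEMBERS = ["sdb2", "sde2", "sdh2", "sdc2", "sdf2", "sdi2", "sdd2", "sdg2", "sdj2"]

# Disk-to-controller layout from the quoted message.
CONTROLLER = {"sdb": "scsi2", "sdc": "scsi2", "sdd": "scsi2",
              "sde": "scsi3", "sdf": "scsi3", "sdg": "scsi3",
              "sdh": "scsi4", "sdi": "scsi4", "sdj": "scsi4"}

CHUNK = 64 * 1024  # mdrun -c64k


def locate(offset: int, members=MEMBERS) -> tuple[str, str]:
    """Hypothetical helper: return (partition, controller) holding a byte
    offset, assuming plain round-robin striping over equal-sized members."""
    chunk = offset // CHUNK
    part = members[chunk % len(members)]
    return part, CONTROLLER[part[:3]]


# Six consecutive chunks walk scsi2 -> scsi3 -> scsi4 -> scsi2 -> ...
print([locate(i * CHUNK)[1] for i in range(6)])
```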
In my experience, these types of things will help during testing, but once
you get into a production environment with lots of file reads and writes and
a reasonably used filesystem that isn't almost all free space, these
advantages quickly disappear, as the reads and writes start to disperse
amongst the various drives on their own. A news server is a good example of
this: over time the filesystem has been written to and read from enough that
any given file is randomly located on the device, so any attempt at ordering
the drives for optimization doesn't really gain anything. By the same token,
any given ordering of the drives isn't going to hurt things either.