7.1-PRERELEASE: arcmsr write performance problem
Paul MacKenzie
bsdlist at cogeco.ca
Tue Dec 16 15:03:31 PST 2008
>> The next thing I am going to do is remove the QUOTA feature to see
>> whether it has any bearing on this problem. It does not appear to be
>> writing under heavy load at all, as you can see (almost nothing), but
>> the processes are mostly in UFS when it spirals out of control.
>
>
> What's strange is that the output from gstat shows the disks hardly
> active at all... Yet why is the syncer at 100%? Do you have write
> caching disabled on the array? What does the raw throughput to the
> disks look like, e.g. if you try a simple dd if=/dev/zero
> of=/var/tmp bs=1024k count=1000?
>
>> I moved the processing of amavisd-new onto a memory drive to at least
>> take that off the IO, and this seems to have helped a bit. There is
>> not a lot of mail going through the system, but every little bit
>> helps. I suspect this is another reason the problem is coming to the
>> forefront, as amavisd-new can use the disks a fair amount to process
>> each e-mail.
>
>
> Is the high load average simply a function of processes blocking on
> network IO? Our AV/spam scanners, for example, show a high load avg
> because there are many processes waiting on network IO to complete
> (e.g. talking to RBL lists, waiting for DCC servers to respond, etc.)
>
> Also, is it really related to the arcmsr driver? i.e. if you did the
> same tasks on a single IDE drive, would the performance profile be
> the same?
>
> ---Mike
>
Hi Mike,
Well, last night I removed both the USB COM port drivers and QUOTA from
the kernel. This has not solved it, but the system seems a bit more
stable today: the HTTP server only had a problem twice last night.
I am not sure whether it is specifically related to the arcmsr driver,
but unfortunately I am unable to try a single IDE setup at the moment.
If I can get to the bottom of why it is locking, that might point us in
the right direction. I was told that Jan downgraded to 6.4 because she
could never resolve her issue, despite working on it for a very long
time.
Write caching is enabled on the array; that was the first thing I
checked. I have the battery backup installed, and areca-cli confirms it
is 100% charged. Do you think it may be hardware related even though
there are no errors at all? I have checked the event log in the
S50000PAL (which, in my past experience, is very sensitive to errors) as
well as the event log in areca-cli. Both are error free.
As for e-mail scanning waiting on RBL lookups: the system usually
handles only about one e-mail per minute, which gives you an idea of the
mail load. So this is not really a reliable test, and I don't see it as
an overall contributing factor, since there seem to be many ways to
trigger the locking; in my experience, running a dump is one of them.
What I have found is that the more write load I take off the system,
the better it behaves, so I have been doing everything I can on that
front until I find a workable solution. I have recompiled all the ports
several times over the past month in case it was a port issue, and all
ports are as up to date as possible, using portupgrade and tracking the
ports tree.
The primary problem is what you pointed out above: gstat shows very
little activity, but the system seems to be "stuck". The syncer is not
always at 100%; it comes and goes. I happened to catch it at 100% once
while watching, but that capture did show how "little" activity was
being reported.
Here is the output of dd on the WORKING server:
dd if=/dev/zero of=/usr/test bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 17.874501 secs (58663232 bytes/sec)
Here is the output of dd on the server that is not working right,
though it is NOT "locked" right now. I need to wait for it to "lock"
again before I can test it in that state.
dd if=/dev/zero of=/usr/test bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 34.270080 secs (30597419 bytes/sec)
The numbers are pretty reasonable, albeit one is about half the other.
I did notice that the system numbers were much higher on the
non-working server when I ran the dd; in fact, they are always much
higher on the one with the problem. This could be a coincidence, but
when watching both servers with top -IS while the dd was running, the
syncer did not appear in the list on the working server, whereas it did
appear on the one with the problem.
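To put a number on "about half", here is a small Python sketch (not something used in this thread) that parses the two dd summary lines quoted above and converts them to MiB/s and a ratio. The regular expression assumes the summary line looks exactly like the FreeBSD dd output shown in this message; the function names are hypothetical helpers for illustration.

```python
import re

# Summary lines copied verbatim from the two dd runs in this message.
WORKING = "1048576000 bytes transferred in 17.874501 secs (58663232 bytes/sec)"
PROBLEM = "1048576000 bytes transferred in 34.270080 secs (30597419 bytes/sec)"

# Assumed format: "<bytes> bytes transferred in <secs> secs (<rate> bytes/sec)"
DD_SUMMARY = re.compile(r"(\d+) bytes transferred in ([\d.]+) secs \((\d+) bytes/sec\)")


def parse_dd_summary(line):
    """Return (bytes, seconds, reported bytes/sec) from a dd summary line."""
    match = DD_SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a dd summary line: {line!r}")
    return int(match.group(1)), float(match.group(2)), int(match.group(3))


def mib_per_sec(nbytes, secs):
    """Throughput in MiB/s (1 MiB = 1048576 bytes, the same unit as bs=1024k)."""
    return nbytes / secs / 1048576


if __name__ == "__main__":
    good_bytes, good_secs, good_rate = parse_dd_summary(WORKING)
    bad_bytes, bad_secs, bad_rate = parse_dd_summary(PROBLEM)
    print(f"working: {mib_per_sec(good_bytes, good_secs):.1f} MiB/s")
    print(f"problem: {mib_per_sec(bad_bytes, bad_secs):.1f} MiB/s")
    # Ratio of the rates dd itself reported
    print(f"ratio:   {good_rate / bad_rate:.2f}x")
```

With these inputs it works out to roughly 55.9 MiB/s versus 29.2 MiB/s, a ratio of about 1.92, so "half" is a fair description of the gap on a run where the problem server was not locked.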
I am waiting for the system to lock again to try and see what it shows
when it is locked.
I suppose I will work on getting 7.0 running (a downgrade) next, to see
whether I have the same problem on that version; that would be another
clue to the problem.
Thanks again for your help so far.
Paul
More information about the freebsd-stable mailing list