amrd disk performance drop after running under high load
Kris Kennaway
kris at FreeBSD.org
Wed Oct 31 14:38:22 PDT 2007
Alexey Popov wrote:
> Hi
>
> Kris Kennaway wrote:
>>>>>>> So I can conclude that FreeBSD has a long-standing bug in the VM
>>>>>>> that can be triggered when serving a large amount of static data
>>>>>>> (much bigger than memory size) at high rates. Possibly this only
>>>>>>> applies to large files like mp3 or video.
>>>>>> It is possible, we have further work to do to conclude this though.
>>>>> I forgot to mention I have pmc and kgmon profiling for good and bad
>>>>> times. But I don't have enough knowledge to interpret it correctly,
>>>>> and I'm not sure if it can help.
>>>> pmc would be useful.
>>> pmc profiling attached.
>> OK, the pmc traces do seem to show that it's not a lock contention
>> issue. That being the case I don't think the fact that different
>> servers perform better is directly related.
> But there was evidence of mbuf lock contention in the mutex profiling,
> wasn't there? As far as I understand, mutex problems can exist without
> increasing CPU load in pmc stats, right?
No, the lock functions will show up as using a lot of CPU. I guess the
lock profiling trace showed high numbers because you ran it for a long time.
>> There is also no evidence of a VM problem. What your vmstat and pmc
>> traces show is that your system really isn't doing much work at all,
>> relatively speaking.
>> There is also still no evidence of a disk problem. In fact your disk
>> seems to be almost idle in both cases you provided, only doing between
>> 1 and 10 operations per second, which is trivial.
> The vmstat and network output graphs show that the problem exists. If it
> is not a disk or network or VM problem, what else could be wrong?
The vmstat output you provided so far doesn't show anything specific.
>> In the "good" case you are getting a much higher interrupt rate but
>> with the data you provided I can't tell where from. You need to run
>> vmstat -i at regular intervals (e.g. every 10 seconds for a minute)
>> during the "good" and "bad" times, since it only provides counters and
>> an average rate over the uptime of the system.
> I'll try this, but AFAIR there was no strangeness with interrupts.
>
> I believe the reason for the high interrupt rate in the "good" cases is
> that the server is sending a lot of traffic.
>
>> What there is evidence of is an interrupt aliasing problem between em
>> and USB:
>>   irq16: uhci0   1464547796   1870
>>   irq64: em0     1463513610   1869
> I tried disabling USB in the kernel; this issue went away, but the main
> problem remained. Also, I see this interrupt aliasing on many servers
> that have no problems.
OK.
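
For the interval sampling suggested above, here is a rough sketch of how
two vmstat -i snapshots could be turned into per-interval rates. It is not
an existing tool. The function names are made up for illustration, and the
parser assumes each line looks like the two quoted in this thread (source,
device, cumulative total, lifetime average rate). The "after" counters in
the demo are invented; only the "before" counters come from the thread.

```python
import re

# Assumed line shape: "irq16: uhci0 1464547796 1870"
# (source, device, cumulative total, average rate since boot).
# Header and "Total" lines have no "name:" prefix and are skipped.
LINE_RE = re.compile(r"^(\S+:)\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Map "irq16: uhci0" -> cumulative interrupt count for each line."""
    totals = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            totals[f"{m.group(1)} {m.group(2)}"] = int(m.group(3))
    return totals


def interval_rates(before, after, seconds):
    """Interrupts per second between two snapshots `seconds` apart."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before
    }


if __name__ == "__main__":
    # Counters quoted in the thread.
    before = parse_vmstat_i(
        "irq16: uhci0 1464547796 1870\n"
        "irq64: em0 1463513610 1869\n"
    )
    # Invented second snapshot, pretending it was taken 10 seconds later.
    after = parse_vmstat_i(
        "irq16: uhci0 1464597796 1870\n"
        "irq64: em0 1463563610 1869\n"
    )
    for name, rate in interval_rates(before, after, 10).items():
        print(f"{name}: {rate:.0f}/s over the interval")
```

The rate column cannot stand in for this, because it is the total divided
by uptime. With the numbers above, 1464547796 / 1870 works out to roughly
nine days of uptime, so a change lasting a few minutes barely moves it.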
Kris
More information about the freebsd-stable mailing list