Odd performance problems after upgrade from 4.11 to 6.0-Stable
Kevin Oberman
oberman at es.net
Mon Jan 2 10:37:02 PST 2006
> Date: Wed, 14 Dec 2005 19:52:03 -0500
> From: Kris Kennaway <kris at obsecurity.org>
>
> On Wed, Dec 14, 2005 at 04:45:47PM -0800, Kevin Oberman wrote:
> > > Date: Wed, 14 Dec 2005 19:34:04 -0500
> > > From: Kris Kennaway <kris at obsecurity.org>
> > >
> > > On Wed, Dec 14, 2005 at 04:26:18PM -0800, Kevin Oberman wrote:
> > >
> > > > I am attaching a dmesg. I do have a few drivers (uhci, pcm, psm,
> > > > atkbd0 and ichsmb) that are still marked as GIANT-LOCKED, but I'm not
> > > > using the USB very often. And I'm not using pcm or ichsmb during the
> > > > dump, either. I think everyone has the mouse and keyboard under GIANT,
> > > > but I can't really see those as a problem, either.
> > >
> > > A bunch of things are sharing interrupts with USB. Disable it and see
> > > if that helps. Also check vmstat -i to see if some device is
> > > storming. If not, turn on MUTEX_PROFILING(9) in your kernel and run
> > > the dump (or something faster that also exhibits the problem), then
> > > look for what is contending with Giant.
> >
> > Yes, it may be time for MUTEX_PROFILING. I had already looked at
> > interrupts. My kernel is sans APIC so I didn't really think that
> > interrupts were a problem, and I see:
> > interrupt                  total   rate
> > irq0: clk              207037779   1000
> > irq1: atkbd0               50208      0
> > irq6: fdc0                     9      0
> > irq8: rtc               26498038    128
> > irq10: pcm0 ichsmb0            2      0
> > irq11: xl0 uhci0        18076067     87
> > irq12: psm0               869500      4
> > irq13: npx0                    1      0
> > irq14: ata0             10423468     50
> > irq15: ata1                  112      0
> > Total                  262955184   1270
> >
> > Clearly no storms and nothing looks obviously broken. USB and the
> > network card share an IRQ, but the USB is not connected to anything and
> > I would not think that it is generating many interrupts. The network
> > IS being used and I'm not seeing all that many interrupts on IRQ11.
>
> Whenever there is an interrupt on irq11 from the NIC, *both* drivers
> will wake up to process it. uhci0 will need to acquire Giant. If
> something else is also trying to acquire Giant (bufdaemon), then they
> will serialize, degrading performance. This may not be the cause
> since there are only a few interrupts, but MUTEX_PROFILING will tell
> you.
Well, with the holidays and such, this has taken a while, but here is an
update.
I have removed USB support. I hardly ever use it on this system, so that
was an obvious step. No improvement at all.
# vmstat -i
interrupt                  total   rate
irq0: clk              319818027   1000
irq1: atkbd0               15443      0
irq6: fdc0                    11      0
irq8: rtc               40932392    128
irq10: pcm0 ichsmb0       125545      0
irq11: xl0               3616426     11
irq12: psm0               281380      0
irq13: npx0                    1      0
irq14: ata0              8756176     27
irq15: ata1                  144      0
Total                  373545545   1168
Only one shared interrupt and both IRQ 10 devices should have been
totally quiescent during my test run.
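For anyone who wants to repeat this check on their own box, here is a rough Python sketch that reads `vmstat -i` text, lists IRQs shared by more than one device, and flags anything that looks like a storm. The function names and the 10000/s threshold are my own assumptions for illustration, not anything FreeBSD defines; the sample data is the output above.

```python
# Sample data: the vmstat -i output from this message.
SAMPLE = """\
interrupt total rate
irq0: clk 319818027 1000
irq1: atkbd0 15443 0
irq6: fdc0 11 0
irq8: rtc 40932392 128
irq10: pcm0 ichsmb0 125545 0
irq11: xl0 3616426 11
irq12: psm0 281380 0
irq13: npx0 1 0
irq14: ata0 8756176 27
irq15: ata1 144 0
Total 373545545 1168
"""


def parse_vmstat_i(text):
    """Return a list of (irq, devices, total, rate) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header, the Total line, and anything that is not an irq row.
        if len(parts) < 4 or not parts[0].startswith("irq"):
            continue
        irq = parts[0].rstrip(":")
        devices = parts[1:-2]  # everything between the irq name and the counters
        total, rate = int(parts[-2]), int(parts[-1])
        rows.append((irq, devices, total, rate))
    return rows


def shared_irqs(rows):
    """Map each IRQ that has more than one device to its device list."""
    return {irq: devs for irq, devs, _, _ in rows if len(devs) > 1}


def storming(rows, threshold=10000):
    """List IRQs whose rate exceeds a hypothetical storm threshold (per second)."""
    return [irq for irq, _, _, rate in rows if rate > threshold]


rows = parse_vmstat_i(SAMPLE)
print("shared:", shared_irqs(rows))
print("storming:", storming(rows))
```

On the data above this reports only irq10 as shared and nothing as storming, which matches what I see by eye.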
The test was building a glimpse index of my inbox. CPU at about
20%. System interactive response was terrible. Took about two minutes
just to log in. Starting Gnome takes roughly forever (about 10
minutes).
I collected mutex stats for just about 3 minutes and found nothing
surprising, but I may not know what to look for. Nothing shows a total
time of over 3.1 seconds. The total time for all of them is 28
seconds. The sum of all Giant lock times was only 4.65 seconds and the
largest of these was in kern_sysctl.c, so I expect it was the profiling
that ate 3.1 of those 4.65 seconds.
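To put those numbers in proportion, here is the arithmetic as a small Python sketch. It assumes the run was exactly 180 seconds (I only said "about 3 minutes") and that the totals can be compared directly against wall-clock time; the variable names are just for illustration.

```python
WALL_SECONDS = 180.0         # assumption: "about 3 minutes" of collection
ALL_LOCKS_SECONDS = 28.0     # total time summed over every profiled mutex
GIANT_SECONDS = 4.65         # sum of all Giant entries
SYSCTL_GIANT_SECONDS = 3.1   # largest Giant entry, in kern_sysctl.c


def share(seconds, wall=WALL_SECONDS):
    """Express a lock time as a percentage of the collection window."""
    return 100.0 * seconds / wall


# Giant time left over once the kern_sysctl.c entry is set aside.
giant_other = GIANT_SECONDS - SYSCTL_GIANT_SECONDS

print(f"all locks:           {share(ALL_LOCKS_SECONDS):.1f}% of the run")
print(f"Giant, all entries:  {share(GIANT_SECONDS):.1f}% of the run")
print(f"Giant, minus sysctl: {share(giant_other):.1f}% of the run")
```

Under those assumptions Giant accounts for well under 3% of the window, and under 1% once the kern_sysctl.c entry is excluded, which is why I do not think Giant explains the stalls.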
I am attaching a spreadsheet with the profile data in case anyone wants
to look at it. (Probably the mail system will strip it, so let me know if I
should post it.)
Still totally baffled and still feeling the pain.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman at es.net Phone: +1 510 486-8634