Interrupt routine usage not shown by top in 8.0
Barney Cordoba
barney_cordoba at yahoo.com
Sun Mar 22 15:06:54 PDT 2009
--- On Wed, 3/18/09, Scott Long <scottl at samsco.org> wrote:
> From: Scott Long <scottl at samsco.org>
> Subject: Re: Interrupt routine usage not shown by top in 8.0
> To: "Barney Cordoba" <barney_cordoba at yahoo.com>
> Cc: "Sam Leffler" <sam at freebsd.org>, current at freebsd.org
> Date: Wednesday, March 18, 2009, 5:25 PM
> On Wed, 18 Mar 2009, Barney Cordoba wrote:
> > --- On Wed, 3/18/09, Scott Long <scottl at samsco.org> wrote:
> >>
> >> Filters were introduced into the em driver to get around a problem in
> >> certain Intel chipsets that caused aliased interrupts.  That's a
> >> different topic of discussion that you are welcome to search the mail
> >> archives on.  The filter also solves performance and latency problems
> >> that are inherent to the ithread model when interrupts are shared
> >> between multiple devices.  This is especially bad when a high speed
> >> device like em shares an interrupt with a low speed device like usb.
> >> In the course of testing and validating the filter work, I found that
> >> filters caused no degradation in performance or excess context
> >> switches, while cleanly solving the above two problems that were
> >> common on workstation and server class machines of only a few years
> >> ago.
> >>
> >> However, both of these problems stemmed from using legacy PCI
> >> interrupts.  At the time, MSI was still very new and very unreliable.
> >> As the state of the art progressed and MSI became more reliable, its
> >> use has become more common and is the default in several drivers.  The
> >> igb and ixgbe drivers and hardware both prefer MSI over legacy
> >> interrupts, while the em driver and hardware still has a lot of legacy
> >> hardware to deal with.  So when MSI is the common/expected/default
> >> case, there is less of a need for the filter/taskqueue method.
> >>
> >> Filters rely on the driver being able to reliably control the
> >> interrupt enable state of the hardware.  This is possible with em
> >> hardware, but not as reliable with bge hardware, so the stock driver
> >> code does not have it implemented.  I am running a filter-enabled bge
> >> driver in large-scale production, but I also have precise control over
> >> the hardware being used.  I also have filter patches for the bce
> >> driver, but bce also tends to prefer MSI, so there isn't a compelling
> >> reason to continue to develop the patches.
> >>
> >>
> >> Scott
> >
> > Assuming the same technique is used within an ithread as with a fast
> > interrupt, that is:
> >
> > filtered_foo(){
> >         taskqueue_enqueue();
> >         return FILTER_HANDLED;
> > }
>
> This will give you two context switches, one for the actual interrupt,
> and one for the taskqueue.  It'll also encounter a spinlock in the
> taskqueue code, and a spinlock or two in the scheduler.
>
> >
> > ithread_foo(){
> >         taskqueue_enqueue();
> >         return;
> > }
> >
> > Is there any additional overhead/locking in the ithread method?  I'm
> > looking to get better control over cpu distribution.
> >
>
> This will give you 3 context switches.  First one will be for the
> actual interrupts.  Second one will be for the ithread (recall that
> ithreads are full process contexts and are scheduled as such).  Third
> one will be for the taskqueue.  Along with the spinlocks for the
> scheduler and taskqueue code mentioned above, there will also be
> spinlocks to protect the APIC registers, as well as extra bus cycles to
> service the APIC.
>
> So, that's 2 trips through the scheduler, plus the associated
> spinlocks, plus the overhead of going through the APIC code, whereas
> the first method only goes through the scheduler once.  Both will have
> a context switch to service the low-level interrupt.  The second method
> will definitely have more context switches, and will almost certainly
> have higher overall service latency and CPU usage.
>
> Scott
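
For concreteness, the filter/taskqueue arrangement described above would
look roughly like the sketch below.  bus_setup_intr(9), taskqueue(9) and
the FILTER_* return values are the real KPIs; the foo_* driver, its softc
layout and the foo_disable_intr()/foo_enable_intr()/foo_rxtx() helpers are
invented for illustration and are not taken from em or any other driver.

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/bus.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>

struct foo_softc {
        device_t          sc_dev;
        struct resource  *sc_irq_res;
        void             *sc_intr_cookie;
        struct taskqueue *sc_tq;
        struct task       sc_rxtx_task;
};

/* Hypothetical hardware helpers, assumed to exist elsewhere in the driver. */
static int  foo_intr_pending(struct foo_softc *);
static void foo_disable_intr(struct foo_softc *);
static void foo_enable_intr(struct foo_softc *);
static void foo_rxtx(struct foo_softc *);

static int
foo_intr_filter(void *arg)
{
        struct foo_softc *sc = arg;

        /* Primary interrupt context: no sleeping, keep it short. */
        if (!foo_intr_pending(sc))
                return (FILTER_STRAY);  /* shared/aliased interrupt, not ours */
        foo_disable_intr(sc);           /* mask the device until the task runs */
        taskqueue_enqueue(sc->sc_tq, &sc->sc_rxtx_task);
        return (FILTER_HANDLED);
}

static void
foo_rxtx_task(void *arg, int pending)
{
        struct foo_softc *sc = arg;

        foo_rxtx(sc);                   /* the actual rx/tx processing */
        foo_enable_intr(sc);            /* re-arm the hardware */
}

/* Attach-time setup: a private taskqueue thread plus a filter-only
 * interrupt hookup.  An ithread-style driver would instead pass NULL for
 * the filter and a driver_intr_t handler as the following argument. */
static int
foo_setup_intr(struct foo_softc *sc)
{
        TASK_INIT(&sc->sc_rxtx_task, 0, foo_rxtx_task, sc);
        sc->sc_tq = taskqueue_create_fast("foo_taskq", M_NOWAIT,
            taskqueue_thread_enqueue, &sc->sc_tq);
        taskqueue_start_threads(&sc->sc_tq, 1, PI_NET, "%s taskq",
            device_get_nameunit(sc->sc_dev));

        return (bus_setup_intr(sc->sc_dev, sc->sc_irq_res,
            INTR_TYPE_NET | INTR_MPSAFE, foo_intr_filter, NULL, sc,
            &sc->sc_intr_cookie));
}
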
Scott, I'm sure you're going to yell at me, but here I go anyway.
I set up a little task that basically does:
foo_task(){
        while (1) {
                foo_doreceive();
                pause("foo", 1);
        }
}
which wakes hz times per second in 7 and hz/2 times per second in 8. The
same accounting issue exists for this case: I have it bridging 400K pps and
usage shows 0 most of the time. I've added some firewall rules which should
substantially increase the load, but still no usage. If I really hammer it,
like 600K pps, it starts registering 30% usage, with no ramp-up in between.
I suppose it could just be falling out of the cache or something, but it
doesn't seem realistic.
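
For reference, one way a polling loop like this can be wired up is with a
dedicated kernel process; a minimal sketch, assuming kproc_create(9) and
pause(9) (the foo_* names are placeholders, not the actual code):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/kthread.h>
#include <sys/proc.h>

static struct proc *foo_poll_proc;

/* Placeholder for whatever actually drains the rx ring. */
static void foo_doreceive(void *);

static void
foo_poll_loop(void *arg)
{
        for (;;) {
                foo_doreceive(arg);
                pause("foo", 1);        /* sleep one tick, i.e. 1/hz seconds */
        }
}

static int
foo_start_poll(void *sc)
{
        /* The process shows up as "foo_poll" in ps and top output. */
        return (kproc_create(foo_poll_loop, sc, &foo_poll_proc, 0, 0,
            "foo_poll"));
}
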
Is there some hack I can implement to make sure a task is
accounted for, or some other way to monitor its usage?
Barney