cvs commit: src/sys/kern kern_mutex.c
Bruce Evans
brde at optusnet.com.au
Fri Jun 15 20:46:45 UTC 2007
On Mon, 11 Jun 2007, John Baldwin wrote:
> On Friday 08 June 2007 10:20:19 pm Bruce Evans wrote:
>> On Fri, 8 Jun 2007, John Baldwin wrote:
>>
>>> On Friday 08 June 2007 10:50:15 am Bruce Evans wrote:
>>>> On Thu, 7 Jun 2007, John Baldwin wrote:
["this" is in vm_page_zero_idle() where the vm mutex is dropped]
>> I think this is as good a place to preempt as any. In fact, the code
>> can be simplified by preempting here and not in the main loop. The
>> ...
>
> Preemption happens when a thread is made runnable, i.e. usually when an
> interrupt comes in, but also when releasing locks or doing
> wakeup/cv_broadcast, etc. The only thing the idle thread does other than
> interrupts is release the lock.
I'm not sure what you mean in the last sentence. This is a special idle
thread, and it doesn't do interrupts.
> One side effect of using a critical section
> here is that we are potentially adding interrupt latency, so this may be a
> rather bad thing. :-/ Probably you are seeing the number of context switches
> go down because we are "batching" up switches. Say you get two interrupts
> during an iteration of the main loop for this process, previously you'd get 4
> context switches out of that: zeroidle -> first ithread -> zeroidle, and then
> later zeroidle -> second ithread -> zeroidle. Now you block those switches
> until the critical exit, at which point both interrupt threads are pending,
> so you do: zeroidle -> first ithread -> second ithread -> zeroidle, i.e. 3
> context switches. However, the cost is increased interrupt latency and that
> may explain why the "real" time went up. Lots of context switches is fairly
> lame, but if the box is idle, then I'm less worried about wasting time in a
> context switch (well, I'm concerned because it means we can't put the CPU to
> sleep since we are executing something still, but extra overhead in an idle
> thread doing a background task is not near as worrisome as overhead
> in "important" work like ithreads).
I think I want to batch up switches a little in general, and only a
little batching occurs here.  pmap_zero_page_idle() takes about 1 us
on my main test system (Turion X2 with relatively slow DDR2 memory
which can nevertheless be zeroed at 4 or 5 GB/s).  An extra 1 us of
interrupt latency here and there won't make much difference.  It's
less than the extra latency for 1 ATPIC access if not using the APIC.
Also, for the makeworld benchmark, the interrupt handler runs for about
2% of the time, and pagezero runs for about 1% of the time. These
threads just don't run long enough to have much contention.
I think the main cause of extra context switches that I saw in some
misconfigurations is switching while holding the vm mutex. The thread
switched to is quite likely to block on this mutex and switch back.
> As to why preemption doesn't work for SMP, a thread only knows to preempt if
> it makes a higher priority thread runnable. This happens in mtx_unlock when
> we wakeup a thread waiting on the lock, in wakeup, or when an interrupt
> thread is scheduled (the interrupted thread "sees" the ithread being
> scheduled). If another thread on another CPU makes a thread runnable, the
> thread on the first CPU has no idea unless the second CPU explicitly sends a
> message (i.e. IPI) to the first CPU asking it to yield instead.
I believe SCHED_ULE does the IPI.
> Specifically, suppose you have threads A, B, C and with priorities A < B < C.
> Suppose A is running on CPU 0, and C is running on CPU 1. If C does a wakeup
> that awakens B, C isn't going to preempt to B because C is more important
> (assume > means more important, even though priority values are opposite
> that, which is annoying). If A had awakened B it would have preempted
> though, so in theory C should look at the other CPUs, notice that A < B, and
> send an IPI to CPU 0 to ask it to preempt from A to B. One thing is that
> IPI_PREEMPT should set td_owepreempt if the target A is in a critical
> section, I haven't checked to see if we do that currently.
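The decision described above can be modeled in user space.  This is only
a toy sketch, not kernel code: preempt_target(), cpu_pri[] and NCPU are
invented names, and it just captures the "find a CPU running something
less important and ask it to yield" logic:

```c
#include <assert.h>

/*
 * Toy model of the SMP preemption decision.  In FreeBSD a LOWER
 * numeric priority value means a MORE important thread, so in the
 * A < B < C example above, A has the largest numeric priority.
 */
#define NCPU 2

/* Numeric priority of the thread currently running on each CPU. */
static int cpu_pri[NCPU];

/*
 * After making a thread with priority 'new_pri' runnable, decide
 * which CPU (if any) should be asked to preempt, e.g. by IPI.
 * Returns the target CPU, or -1 if no preemption is warranted.
 */
static int
preempt_target(int self, int new_pri)
{
	int best = -1, cpu;

	for (cpu = 0; cpu < NCPU; cpu++) {
		/* Skip CPUs already running something as important. */
		if (cpu_pri[cpu] <= new_pri)
			continue;
		/* Prefer the CPU running the least important thread. */
		if (best == -1 || cpu_pri[cpu] > cpu_pri[best])
			best = cpu;
	}
	return (best);	/* best == self would be a local switch, no IPI */
}
```

With C (pri 10) on CPU 1 waking B (pri 50) while A (pri 100) runs on
CPU 0, the model picks CPU 0 as the IPI target, matching the scenario
John describes.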
Would it be worth checking a preemption flag in mtx_unlock()? This
would bloat the macro a bit. However, we already have the check and the
bloat for spinlocks, at least on i386's, by checking in critical_exit()
via spinlock_exit().
Using critical sections should have the side effect of getting the flag
checked in critical_exit(). This doesn't seem to work (for SCHED_4BSD),
so there seems to be a problem setting the flag.
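The intended interaction, as I understand it, looks roughly like the
following single-threaded model.  All names here are illustrative
(crit_nesting standing in for td_critnest, owe_preempt for
td_owepreempt); it is not the actual kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of deferred preemption across a critical section. */
static int crit_nesting;	/* like td_critnest */
static bool owe_preempt;	/* like td_owepreempt */
static int switches;		/* context switches performed */

static void
crit_enter(void)
{
	crit_nesting++;
}

/* A wakeup that wants to preempt us: defer if in a critical section. */
static void
maybe_preempt(void)
{
	if (crit_nesting > 0)
		owe_preempt = true;	/* remember; switch at exit */
	else
		switches++;		/* would call mi_switch() here */
}

static void
crit_exit(void)
{
	if (--crit_nesting == 0 && owe_preempt) {
		owe_preempt = false;
		switches++;		/* deferred mi_switch() runs now */
	}
}
```

If the flag were being set correctly, the deferred switch would happen
in crit_exit(); the symptom I'm seeing suggests maybe_preempt() (or its
kernel equivalent) never sets the flag in the first place.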
>> Next I will try moving the PREEMPTION code to where the vm mutex is dropped
>> naturally. I will try the following order:
>
> I like this idea a lot.
> Actually, I would keep the sched_runnable(), etc. as #ifndef PREEMPTION, the
> critical_exit() already does that check (modulo concerns above). Also, I
Testing shows critical_exit() doesn't seem to be doing the preemption. On
UP, depending on PREEMPTION makes little difference, but on 2-way SMP with
no voluntary yielding, pagezero is too active. The critical sections don't
seem to be doing much.
> originally wanted to not hold the critical section while zeroing the page to
> not impede interrupts during that operation.  I was actually trying to just
> avoid preempting while holding the lock. However, given my comments about
> how this harms interrupt latency, maybe this is a bad idea and we should just
> let priority propagation handle that for us. Moving the context switch is
> probably a good idea though.
The 1 us extra latency on my main test machine wouldn't matter, but go back
to a 486 with 10 MB/s memory and the latency would be a problem -- the 1 us
becomes 400 us, which is a lot even for a 486.
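The figures are just page size over memory bandwidth; a rough
back-of-envelope check, assuming a 4 KB page (zero_latency_us() is a
made-up helper, not a kernel function):

```c
#include <assert.h>

/*
 * Time to zero one page, in microseconds.  Since 1 MB/s is 1 byte/us,
 * dividing bytes by MB/s gives microseconds directly.
 */
static int
zero_latency_us(int page_bytes, int mbytes_per_sec)
{
	return (page_bytes / mbytes_per_sec);
}
```

A 4096-byte page at 4 GB/s (4000 MB/s) comes out to about 1 us, and at
10 MB/s to about 400 us, matching the numbers above.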
> the reason I wanted to avoid preempting while holding the lock is you can get
> this case:
>
> zeroidle -> some ithread -> some top-half thread in kernel which needs the
> vm page queue mtx -> zeroidle (with top-half thread's priority; until
> mtx_unlock) -> top-half thread in kernel -> zeroidle
>
> which can be many context switches. By not switching while holding the lock,
> one can reduce this to:
>
> zeroidle -> some ithread -> some top-half thread -> zeroidle
>
> but at the cost of adding latency to "some ithread" and "some top-half thread"
Maybe preemption should be inhibited a bit when any mutex is held.
Here is my current version. I got tired of using a dynamic enable for
the PREEMPTION ifdef code and removed all the conditionals after the
most recent test showed that the voluntary switch is still needed.
% Index: vm_zeroidle.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/vm/vm_zeroidle.c,v
% retrieving revision 1.47
% diff -u -2 -r1.47 vm_zeroidle.c
% --- vm_zeroidle.c 5 Jun 2007 00:00:57 -0000 1.47
% +++ vm_zeroidle.c 15 Jun 2007 19:30:13 -0000
% @@ -111,5 +111,13 @@
% mtx_unlock(&vm_page_queue_free_mtx);
% pmap_zero_page_idle(m);
% + if (sched_runnable()) {
% + thread_lock(curthread);
% + critical_exit();
% + mi_switch(SW_VOL, NULL);
% + thread_unlock(curthread);
% + } else
% + critical_exit();
% mtx_lock(&vm_page_queue_free_mtx);
% + critical_enter();
% m->flags |= PG_ZERO;
% vm_pageq_enqueue(PQ_FREE + m->pc, m);
% @@ -141,18 +149,14 @@
%
% mtx_lock(&vm_page_queue_free_mtx);
% + critical_enter();
% for (;;) {
% if (vm_page_zero_check()) {
% vm_page_zero_idle();
% -#ifndef PREEMPTION
% - if (sched_runnable()) {
% - thread_lock(curthread);
% - mi_switch(SW_VOL, NULL);
% - thread_unlock(curthread);
% - }
% -#endif
% } else {
% wakeup_needed = TRUE;
% + critical_exit();
% msleep(&zero_state, &vm_page_queue_free_mtx, 0,
% "pgzero", hz * 300);
% + critical_enter();
% }
% }
Bruce