cvs commit: src/sys/kern kern_mutex.c
Kip Macy
kip.macy at gmail.com
Thu Jun 7 06:24:06 UTC 2007
Bruce -
Can you also say how many runs you do and how much variance there
is between runs?
Thanks.
-Kip
On 6/6/07, Bruce Evans <brde at optusnet.com.au> wrote:
> On Wed, 6 Jun 2007, Bruce Evans wrote:
>
> > On Tue, 5 Jun 2007, Jeff Roberson wrote:
>
> >> You should try with kern.sched.pick_pri = 0. I have changed this to be the
> >> default recently. This weakens the preemption and speeds up some
> >> workloads.
> >
> > I haven't tried a new SCHED_ULE kernel yet.
>
> Tried now.  In my makeworld benchmark, SCHED_ULE is now only 4% slower
> than SCHED_4BSD, down from about 7% slower (after SCHED_4BSD itself
> lost 2%).  The difference still comes from CPUs idling too much.
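>
> For anyone reproducing this: flipping Jeff's knob from userland is an
> ordinary sysctl write, equivalent to "sysctl kern.sched.pick_pri=0" as
> root.  A minimal C sketch, assuming only that the running SCHED_ULE
> kernel exports the kern.sched.pick_pri name quoted above:
>
> 	#include <sys/types.h>
> 	#include <sys/sysctl.h>
> 	#include <err.h>
>
> 	int
> 	main(void)
> 	{
> 		int newval = 0;	/* 0 = the new default; weakens preemption */
>
> 		/* Write-only sysctl access: no old-value buffer. */
> 		if (sysctlbyname("kern.sched.pick_pri", NULL, NULL,
> 		    &newval, sizeof(newval)) == -1)
> 			err(1, "sysctlbyname(kern.sched.pick_pri)");
> 		return (0);
> 	}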
>
> Best result ever (SCHED_4BSD, June 4 kernel, no PREEMPTION):
> ---
> 827.48 real 1309.26 user 186.86 sys
> 1332122 voluntary context switches
> 1535129 involuntary context switches
> pagezero time 6 seconds
> ---
>
> After thread lock changes (SCHED_4BSD, no PREEMPTION):
> ---
> 847.70 real 1309.83 user 169.39 sys
> 2933415 voluntary context switches
> 1501808 involuntary context switches
> pagezero time 30 seconds.
>
> Unlike what I wrote before, there is a scheduling bug that affects
> pagezero directly. The bug from last month involving pagezero losing
> its priority of PRI_MAX_IDLE and running at priority PUSER is back.
> This bug seemed to be gone in the June 4 kernel, but it actually just
> happens less often there.  It seems to cost 0.5-1.0% of the real time.
> ---
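>
> For context, pagezero is supposed to park itself at the very bottom of
> the idle class and stay there.  A minimal sketch of that setup in C,
> modelled loosely on vm_zeroidle.c and the post-thread-lock scheduler
> interfaces (the exact calls here are an assumption, not a quote of the
> real code):
>
> 	#include <sys/param.h>
> 	#include <sys/proc.h>
> 	#include <sys/sched.h>
>
> 	static void
> 	pagezero_set_idle_prio(struct thread *td)
> 	{
> 		thread_lock(td);
> 		sched_class(td, PRI_IDLE);	/* idle scheduling class */
> 		sched_prio(td, PRI_MAX_IDLE);	/* lowest priority in it */
> 		thread_unlock(td);
> 	}
>
> The bug amounts to the scheduler later clobbering this priority up to
> PUSER, so pagezero competes with user threads instead of only soaking
> up otherwise-idle cycles.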
>
> After thread lock changes (SCHED_4BSD, now with PREEMPTION):
> ---
> 843.34 real 1304.00 user 168.87 sys
> 1651011 voluntary context switches
> 1630988 involuntary context switches
> pagezero time 27 seconds
>
> The problem with the extra context switches is gone (these context switch
> counts are like the ones in old kernels with PREEMPTION). This result is
> affected by pagezero getting its priority clobbered. The best result for
> an old kernel with PREEMPTION was about 840 seconds, before various
> optimizations reduced this to 827 seconds (-0+4 seconds).
> ---
>
> Old run with SCHED_ULE (Mar 18):
> 899.50 real 1311.00 user 187.47 sys
> 1566366 voluntary context switches
> 1959436 involuntary context switches
> pagezero time 19 seconds
> ---
>
> Today with SCHED_ULE:
> ---
> 883.65 real 1290.92 user 188.21 sys
> 1658109 voluntary context switches
> 1708148 involuntary context switches
> pagezero time 7 seconds.
> ---
>
> In all of these, the user + sys decomposition is very inaccurate, but the
> (user + sys + pagezero_time) total is fairly accurate. It is 1500+-2 for
> SCHED_4BSD and 1500+-17 for SCHED_ULE (old ULE larger, current ULE smaller).
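> For example, the best SCHED_4BSD run works out to 1309.26 + 186.86 +
> 6 = 1502.12, and today's SCHED_ULE run to 1290.92 + 188.21 + 7 =
> 1486.13.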
>
> SCHED_ULE now shows interesting behaviour for non-parallel kernel
> builds on a 2-way SMP machine.  It is now slightly faster than
> SCHED_4BSD for this, but still much slower for parallel kernel builds.
> This might be because it likes to leave 1 CPU idle while waiting to
> find a better CPU to run on, which is actually an optimization when
> there is >= 1 CPU to spare:
>
> RELENG_4 kernel build on nfs, non-parallel make.
> Best ever with SCHED_ULE (~June 4 kernel):
> 62.55 real 55.30 user 3.65 sys
> Current with SCHED_ULE:
> 62.18 real 54.91 user 3.51 sys
>
> RELENG_4 kernel build on nfs, make -j4.
> Best ever for SCHED_ULE (~June 4 kernel):
> 32.00 real 56.98 user 3.90 sys
> Current with SCHED_ULE:
> 33.11 real 56.01 user 4.12 sys
> ULE has been about 1 second slower for this since at least last November.
> It presumably reduces user+sys time by running pagezero more.
>
> The slowdown is much larger for a build on ffs:
>
> Non-parallel results not shown (little difference from above).
>
> RELENG_4 kernel build on ffs, make -j4.
> Best ever for SCHED_ULE (~June 4 kernel):
> 29.94 real 56.03 user 3.12 sys
> Current with SCHED_ULE:
> 32.63 real 55.13 user 3.53 sys
> Now 9% of the real time (= 18% of the cycles on one CPU = almost the
> sys overhead; arithmetic below) is apparently wasted by leaving one
> CPU idle.  This
> benchmark is of course dominated by many instances of 2 gcc hogs which
> should be scheduled to run in parallel with no idle cycles. (In all
> these kernel benchmarks, everything except disk writes is cached before
> starting. In other makeworld benchmarks, everything is cached before
> starting on the nfs server, while on the client nothing is cached.)
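>
> The arithmetic for the 9%: (32.63 - 29.94) / 29.94 = 0.09.  Each
> second of extra real time on the 2-way machine corresponds to about
> two seconds of one CPU sitting idle, so the waste is roughly 5.4
> CPU-seconds, i.e. 5.38 / 29.94 = 18% of one CPU over the ideal run.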
>
> I don't have context switch counts or pagezero times for the kernel builds.
> stathz is 100, the same as hz.  Maybe SCHED_ULE doesn't like this.  hz = 100 is
> about 1% faster than hz = 1000 for the makeworld benchmark.
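>
> The running kernel's clock rates can be read back via kern.clockrate
> to confirm the hz/stathz configuration; a minimal C sketch using the
> standard sysctl interface:
>
> 	#include <sys/types.h>
> 	#include <sys/sysctl.h>
> 	#include <sys/time.h>	/* struct clockinfo */
> 	#include <stdio.h>
>
> 	int
> 	main(void)
> 	{
> 		struct clockinfo ci;
> 		size_t len = sizeof(ci);
>
> 		if (sysctlbyname("kern.clockrate", &ci, &len, NULL, 0) == -1)
> 			return (1);
> 		printf("hz=%d stathz=%d profhz=%d\n",
> 		    ci.hz, ci.stathz, ci.profhz);
> 		return (0);
> 	}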
>
> Bruce
>