My wish list for 6.1

Sat Dec 31 08:25:18 PST 2005

Robert Watson wrote on Sat, Dec 31, 2005 at 07:12:23AM +0000: 
> 
> On Fri, 16 Dec 2005, Avleen Vig wrote:
> 
> > On Fri, Dec 16, 2005 at 10:40:22AM -0500, Martin Cracauer wrote:
> >>> 2.  SMP kernels for install.  Right now we only install a UP kernel, for
> >>> performance reasons.  We should be able to package both a UP and SMP
> >>> kernel into the release bits, and have sysinstall install both.  It
> >>> should also select the correct one for the target system and make that
> >>> the default on boot.
> >>
> >> If people are concerned about performance, I benchmarked a 6-beta kernel 
> >> SMP versus UP on a socket 939 Opteron.
> >
> > If those results are accurate, there's no real reason not to just use an SMP 
> > kernel on default install?
> 
> This is an old thread that I'm just catching up on, but I figured I'd chime in 
> anyway: you have to be really careful benchmarking across CPU types and 
> configurations, as the performance characteristics of important instructions 
> differ a lot across hardware variations.  For example, the performance of 
> atomic operations, used to synchronize between CPUs, varies significantly by 
> CPU, bus configuration, etc.  On modern Opteron hardware, the performance of 
> inter-CPU synchronization instructions is blindingly fast.  On modern Xeon P4 
> hardware, it is incredibly slow. 

Well, my runs included P4s and P4-based Xeons, and hyperthreading,
too.

The core of the problem here is that while my parallel benchmarks
partly exercise system calls, I use Apache over localhost and
zero-spaced files to get the disk and network out of the equation.
I think I have a solid framework in place to run parallel benchmarks
and see the tradeoffs involved, but I need to fill it with activity
that exercises what we want to see.

Still, I bet that my measurements are good enough to label the SMP
kernel "defaultable" for FreeBSD installations, from a performance
standpoint.  After all, I *do* test parallel activity, both
CPU-intensive and system-call-intensive, and various mixes thereof.

Remember that those people who do a lot of parallel activity and hence
would suffer from the additional locks in the SMP kernel are very
likely to have an SMP system, dual cores or at least hyperthreading in
the first place.  On the other hand, people who use very low-end
hardware to do demanding tasks are very likely to build their own
kernel anyway.

> Software optimized for the Opteron will 
> often perform much slower on Xeon P4 hardware as a result.  P3 hardware tends 
> to behave a lot more like Opteron in terms of speed of instructions relating 
> to disabling interrupts, where on P4 Xeon they are proportionally much slower. 
> The critical section optimizations made by John Baldwin, and the movement to 
> critical sections in UMA and kernel malloc that I made, made a big performance 
> difference on Xeon P4 hardware, but relatively little difference on
> Opteron. 

One thing I noticed is that anything P4-based is very sensitive to
spinlocks being placed on the same cache line as the data they protect.
Putting a lock into a struct without padding it out to a cache-line
boundary means doom for the P4-based/NetBurst CPUs (I'm sure it's not a
good thing for Opterons either but they don't seem to mind that much).
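
To make the padding point concrete, here is a minimal C11 sketch, not
code from any kernel.  It assumes a 64-byte cache line (an assumption;
check the actual line size of the target CPU), and the names
CACHE_LINE, counter_packed, counter_padded and counter_increment are
made up for illustration.  The spinlock is a plain C11 atomic_flag,
standing in for whatever lock type is really used.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed cache-line size in bytes; not measured on any particular CPU. */
#define CACHE_LINE 64

/* Lock and data share one cache line: CPUs spinning on the lock keep
 * pulling the line away from the CPU that is updating the data. */
struct counter_packed {
    atomic_flag lock;
    long value;
};

/* Lock and data each start on their own cache line. */
struct counter_padded {
    alignas(CACHE_LINE) atomic_flag lock;
    alignas(CACHE_LINE) long value;
};

/* Check the layout at compile time instead of trusting the comments. */
_Static_assert(offsetof(struct counter_padded, value) % CACHE_LINE == 0,
               "value must start on its own cache line");

/* Spin until the flag is acquired. */
static inline void counter_lock(struct counter_padded *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ; /* busy-wait */
}

/* Release the flag so another CPU can take it. */
static inline void counter_unlock(struct counter_padded *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Increment under the lock and return the new value. */
static inline long counter_increment(struct counter_padded *c)
{
    counter_lock(c);
    long v = ++c->value;
    counter_unlock(c);
    return v;
}
```

The padded struct costs two full cache lines per counter, so this is a
trade of memory for less cache-line ping-pong; whether it pays off
would have to be measured on the hardware in question.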

Martin
-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cracauer at cons.org>   http://www.cons.org/cracauer/
FreeBSD - where you want to go, today.      http://www.freebsd.org/