Patch to optimize "bare" critical sections
Robert Watson
rwatson at FreeBSD.org
Thu Nov 25 07:08:38 PST 2004
On Tue, 23 Nov 2004, John Baldwin wrote:
> On Tuesday 23 November 2004 03:00 pm, John Baldwin wrote:
> > Basically, I have a patch to divorce the interrupt disable/deferring to
> > only happen inside of spinlocks using a new spinlock_enter/exit() API
> > (where a spinlock_enter/exit includes a critical section as well) but that
> > plain critical sections won't have to do such a thing. I've tested it on
> > i386, alpha, and sparc64 already, and it has also been tested on arm. I'm
> > unable to get a cross-built powerpc kernel to link (linker dies with a
> > signal 6), but the compile did finish. I have cross-compiled ia64 and
> > amd64 successfully, but have not run tested due to ENOHARDWARE. So, I would
> > appreciate it if a few folks could try the patch out on ppc, ia64, and
> > amd64 to make sure it works ok. Thanks.
> >
> > http://www.FreeBSD.org/~jhb/spinlock.patch
>
> *cough* Ahem, http://www.FreeBSD.org/~jhb/patches/spinlock.patch
FYI, I'm seeing a fairly solid wedge occurring under stress with the i386
patch in place on a dual Xeon test box in the Netperf cluster. I thought
at first it was a property of the UMA optimizations I have that use the
critical sections, but it also happens with just the critical section
changes, so... :-)
The reproduction mode I'm using is to run the syscall_timing tool on the
box over a serial console repeatedly:
http://www.watson.org/~robert/freebsd/syscall_timing.c
In particular, I'm running 10,000 iterations of the socket create/free
test. Under normal circumstances it looks like this:
tiger-2# while (1)
while? ./syscall_timing 10000 socket | grep per
while? end
0.000006708 per/iteration
0.000006642 per/iteration
0.000006658 per/iteration
0.000006660 per/iteration
...
^C
When I get the wedge it does this:
tiger-2# while (1)
while? ./syscall_timing 10000 socket | grep per
while? end
0.000006735 per/iteration
0.000006772 per/iteration
0.000006721 per/iteration
0.000006744 per/iteration
...
0.000006716 per/iteration
0.000006710 per/iteration
0.000006745 per/ <-- hung
It could well be associated with poor timing involving a clock or serial
interrupt.
I haven't made much headway at investigating it yet, and it looks like
serial break is of no help, but will attempt to see what I can do this
afternoon. I suspect without NMI on the box in question it will be
difficult. I haven't yet tried with a UP kernel, however, only SMP. That
said, with the critical section optimization in place and moving UMA to
using critical sections rather than mutexes for the per-CPU cache on SMP,
I see a small but healthy performance improvement in the socket
create/destroy micro-benchmark:
x netperf-socket-smp
+ percpu-socket-smp
+--------------------------------------------------------------------------+
| + x |
| ++ x xxx|
|+ + ++++ + xxxxx|
| |____A____| |AM||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x  10      6.64e-06     6.676e-06     6.666e-06    6.6601e-06 1.2359881e-08
+  10     6.078e-06     6.236e-06     6.172e-06     6.165e-06 4.0734915e-08
Difference at 95.0% confidence
        -4.951e-07 +/- 2.82825e-08
        -7.43382% +/- 0.424655%
        (Student's t, pooled s = 3.01007e-08)
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
robert at fledge.watson.org Principal Research Scientist, McAfee Research