cvs commit: src/sys/i386/i386 pmap.c
Stephan Uphoff
ups at tree.com
Tue Nov 9 11:12:02 PST 2004
On Tue, 2004-11-09 at 08:03, Robert Watson wrote:
> On Tue, 9 Nov 2004, Robert Watson wrote:
>
> > > I've tried changing the store_rel() to just do a simple store since writes are
> > > ordered on x86, but benchmarks on SMP showed that it actually hurt. However,
> > > it would probably be good to at least do that for UP. The current patch to
> > > do it for all kernels is:
>
> Interestingly, I've now run through some more "macro" benchmarks. I saw a
> couple of percent improvement on UP from the change, but indeed, I saw a
> slight decrease in performance for the rapid packet send benchmark on SMP.
>
> So I guess my recommendation is to get this in the tree for UP, and see if
> we can figure out why it's having the slow-down effect on SMP.

We are probably looking at cache line effects here.
My guess is that we should:
1) Make sure that important spin mutexes are alone in a cache line.
2) Take care not to dirty the cache line unnecessarily.

For 2) I think we need to change the spin mutex slightly (for SMP) so
that it never issues LOCK cmpxchgl until a simple load has found
m->mtx_lock == MTX_UNOWNED, since LOCK cmpxchgl always seems to dirty
the cache line, even when the compare fails.
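
A minimal sketch of both points in C11, not the actual mutex code: the
sketch_* names, the 64-byte line size and the use of <stdatomic.h> are
assumptions made for illustration only. The lock word sits alone in its
(assumed) cache line, and the compare-and-swap (typically a LOCK cmpxchg
on x86) is only attempted after a plain load has seen the lock unowned:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed cache line size; the real value depends on the CPU. */
#define SKETCH_CACHE_LINE 64
/* Hypothetical stand-in for MTX_UNOWNED. */
#define SKETCH_UNOWNED ((uintptr_t)0)

/* Point 1: the alignment pads the struct so the lock word is alone
 * in its cache line. */
struct sketch_spin {
	_Alignas(SKETCH_CACHE_LINE) _Atomic uintptr_t lock;
};
_Static_assert(sizeof(struct sketch_spin) == SKETCH_CACHE_LINE,
    "one lock per cache line");

static inline void
sketch_spin_init(struct sketch_spin *m)
{
	atomic_init(&m->lock, SKETCH_UNOWNED);
}

/* Point 2: returns 1 if the lock was taken, 0 otherwise. The atomic
 * read-modify-write is skipped while a plain load sees an owner, so a
 * waiter only reads the line instead of dirtying it. */
static inline int
sketch_spin_try(struct sketch_spin *m, uintptr_t tid)
{
	uintptr_t expected = SKETCH_UNOWNED;

	if (atomic_load_explicit(&m->lock, memory_order_relaxed) !=
	    SKETCH_UNOWNED)
		return (0);
	return (atomic_compare_exchange_strong_explicit(&m->lock, &expected,
	    tid, memory_order_acquire, memory_order_relaxed));
}

static inline void
sketch_spin_lock(struct sketch_spin *m, uintptr_t tid)
{
	while (!sketch_spin_try(m, tid))
		;	/* spin, mostly on the read-only load */
}

static inline void
sketch_spin_unlock(struct sketch_spin *m)
{
	/* Debug check: unlocking an unowned lock is a caller bug. */
	assert(atomic_load_explicit(&m->lock, memory_order_relaxed) !=
	    SKETCH_UNOWNED);
	atomic_store_explicit(&m->lock, SKETCH_UNOWNED, memory_order_release);
}
```

Here tid is any non-zero owner value. Whether the extra load helps or
hurts under contention is exactly what would have to be measured.
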

I have a dual Xeon (P4) on which I can run some tests. Please let me
know if there are any tests that you can recommend; I don't want to
reinvent the wheel here.

Interestingly enough, the Linux spin lock implementation mentions
some PPro errata that seem to require a locked operation.
I guess that means we should take a look at the errata of all
SMP-capable processors out there :-(

Intel also recommends a locked operation (or SFENCE) for future
processors.
I guess this means either non-optimal code, lots of compile options, or
self-modifying code.

Stephan