cvs commit: src/sys/vm uma_core.c uma_int.h

Sat Apr 30 01:25:07 PDT 2005

On Fri, 29 Apr 2005, Jeff Roberson wrote:

>>   Modify UMA to use critical sections to protect per-CPU caches, rather than
>>   mutexes, which offers lower overhead on both UP and SMP.  When allocating
>>   from or freeing to the per-cpu cache, without INVARIANTS enabled, we now
>>   no longer perform any mutex operations, which offers a 1%-3% performance
>>   improvement in a variety of micro-benchmarks.  We rely on critical
>>   sections to prevent (a) preemption resulting in reentrant access to UMA on
>>   a single CPU, and (b) migration of the thread during access.  In the event
>>   we need to go back to the zone for a new bucket, we release the critical
>>   section to acquire the global zone mutex, and must re-acquire the critical
>>   section and re-evaluate which cache we are accessing in case migration has
>>   occurred, or circumstances have changed in the current cache.
>
> Excellent work.  Thanks.  You could also use sched_pin() in uma_zalloc 
> to prevent migration so you can be certain that you're still accessing 
> the same cache.  You still won't be able to trust the state of that cache. 
> I'm not sure whether or not this would make a difference, but it could 
> be beneficial if we decide to do per-cpu slab lists for locality on 
> NUMA machines.

In my first pass, I did use sched_pin, but I found that since I had to 
revalidate the state of the cache anyway in the event the critical section 
was released so we could acquire a mutex, pinning added complexity without 
immediate measurable benefit.  I'm also a bit wary of over-pinning, as 
that prevents the scheduler from balancing the load well, such as 
migrating the thread in the event a higher priority thread wants to run on 
the current CPU (such as a pinned ithread that preempts).  I don't have 
any measurements showing how often this occurs in practice at present, 
but I think these are issues we should explore.  A case like that 
isn't quite priority inversion, since the thread with the right priority 
will take precedence, but it might result in less effective utilization if 
the pinned thread uses quite a bit of CPU.
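
As a rough illustration of the revalidation step described above, here is a 
user-space C sketch.  It is not the committed UMA code.  The sim_* helpers, 
the struct layouts, and sketch_zalloc() are hypothetical stand-ins for the 
critical section, the zone mutex, and the current CPU, and exist only so the 
control flow can be followed outside the kernel.

```c
#include <assert.h>

#define NCPU        2
#define BUCKET_SIZE 4

/* Hypothetical simulation state; none of these names exist in the kernel. */
static int sim_curcpu;           /* CPU the simulated thread runs on */
static int sim_crit_depth;       /* critical section nesting depth */
static int sim_zone_locked;      /* 1 while the zone mutex is held */
static int sim_migrate_to = -1;  /* if >= 0, migrate once preemptible */

struct cache { int items[BUCKET_SIZE]; int count; };
struct zone  { struct cache cpu[NCPU]; int next_item; };

static void sim_critical_enter(void) { sim_crit_depth++; }

static void sim_critical_exit(void)
{
    sim_crit_depth--;
    /* Outside a critical section the scheduler is free to move us. */
    if (sim_crit_depth == 0 && sim_migrate_to >= 0) {
        sim_curcpu = sim_migrate_to;
        sim_migrate_to = -1;
    }
}

static void sim_zone_lock(void)
{
    /* Model the rule from the commit message: drop the critical
     * section before acquiring the zone mutex. */
    assert(sim_crit_depth == 0);
    sim_zone_locked = 1;
}

static void sim_zone_unlock(void) { sim_zone_locked = 0; }

/* Refill a per-CPU cache from the zone; requires the zone mutex. */
static void zone_refill(struct zone *z, struct cache *c)
{
    assert(sim_zone_locked);
    while (c->count < BUCKET_SIZE)
        c->items[c->count++] = z->next_item++;
}

static int sketch_zalloc(struct zone *z)
{
    struct cache *c;
    int item;

    sim_critical_enter();
    for (;;) {
        /* Fast path: no mutex, only the critical section. */
        c = &z->cpu[sim_curcpu];
        if (c->count > 0) {
            item = c->items[--c->count];
            sim_critical_exit();
            return item;
        }
        /* Slow path: leave the critical section, take the zone mutex,
         * then re-enter and look up the cache again, because we may
         * now be on a different CPU or the cache may have changed. */
        sim_critical_exit();
        sim_zone_lock();
        sim_critical_enter();
        c = &z->cpu[sim_curcpu];
        if (c->count == 0)
            zone_refill(z, c);
        sim_zone_unlock();
    }
}
```

Setting sim_migrate_to = 1 before calling sketch_zalloc() on an empty zone 
with sim_curcpu = 0 exercises the interesting case: the thread migrates 
while the critical section is released, so the refill lands in CPU 1's cache 
and CPU 0's cache is left untouched.  Pinning would remove the second cache 
lookup, but not the need to re-check the cache contents.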

Robert N M Watson