rwlocks: poor performance with adaptive spinning

Mon Sep 24 09:29:34 PDT 2007

On Saturday 22 September 2007 10:32:06 pm Attilio Rao wrote:
> Recently several people have reported starvation problems with rwlocks.
> In particular, users who tried to use rwlocks in big SMP environments
> (16+ CPUs) found them prone to poor performance and to starvation of
> waiters.
> 
> Inspecting the code, something strange about adaptive spinning popped
> up: basically, for rwlocks, the adaptive spinning stubs seem to be
> placed too far down in the decision loop.
> The placement of the stub lets the thread that would adaptively
> spin set the respective (read or write) waiters flag first,
> which means that the owner of the lock will go down the hard path
> of the locking functions and will perform a full wakeup even if the
> waiters queues turn out to be empty. This is a big penalty for adaptive
> spinning which can make it completely useless.
> In addition to this, adaptive spinning only runs in the turnstile
> spinlock path, which is not ideal.
> This patch ports the approach already used for adaptive spinning in sx
> locks to rwlocks:
> http://users.gufi.org/~rookie/works/patches/kern_rwlock.diff
> 
> In sx it is unlikely to see big benefits because they are held for too
> long, but for rwlocks the situation is rather different.
> I would like to see if people can do benchmarks with this patch (maybe
> in private environments?) as I'm not able to do them any time soon.
> 
> Adaptive spinning in rwlocks can be improved further with other tricks
> (like adding a backoff counter, for example, or trying to spin with
> the lock held in read mode too), but we first should be sure to start
> with a solid base.

I did this for mutexes and rwlocks over a year ago and Kris found it was 
slower in benchmarks.  www.freebsd.org/~jhb/patches/lock_adapt.patch is the 
last thing I sent kris@ to test (it only has the mutex changes).  This might 
be more optimal post-thread_lock since thread_lock seems to have heavily 
pessimized adaptive spinning because it now enqueues the thread and then 
dequeues it again before doing the adaptive spin.  I liked the approach 
originally because it simplifies the code a lot.  A separate issue is that 
writers don't spin at all if a reader holds the lock, and I think one thing 
to test for that would be an adaptive spin with a static timeout.
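
To make that last idea concrete, a rough userland sketch of a writer
spinning with a static timeout could look like the following.  This is
only an illustration using C11 atomics, not the kernel's rwlock code:
the toy_rwlock type, the TOY_* values, SPIN_LIMIT and toy_wlock_spin()
are all made-up names, and the spin count is an arbitrary number that
would have to be tuned by benchmarking.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: 0 = unlocked, small values = reader count. */
#define TOY_UNLOCKED ((uintptr_t)0)
#define TOY_WRITER   (~(uintptr_t)0)   /* made-up "write locked" value */
#define SPIN_LIMIT   10000             /* static timeout, arbitrary */

struct toy_rwlock {
	atomic_uintptr_t word;
};

/*
 * Try to take the lock for writing, spinning for at most SPIN_LIMIT
 * iterations while someone else (e.g. a reader) holds it.
 * Returns true if the lock was acquired.  Returns false when the
 * timeout expires; a real implementation would only then set the
 * waiters flag and block, so the owner is not forced into the slow
 * unlock path while we are merely spinning.
 */
static bool
toy_wlock_spin(struct toy_rwlock *rw)
{
	for (int i = 0; i < SPIN_LIMIT; i++) {
		uintptr_t expected = TOY_UNLOCKED;

		if (atomic_compare_exchange_strong(&rw->word, &expected,
		    TOY_WRITER))
			return (true);
		/* A kernel version would issue a CPU pause hint here. */
	}
	return (false);
}
```

With an unlocked word the first compare-and-swap succeeds; with a
reader count in the word and nobody releasing it, the loop gives up
after SPIN_LIMIT tries and the caller falls back to blocking.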

-- 
John Baldwin