rwlocks: poor performance with adaptive spinning
Jeff Roberson
jroberson at chesapeake.net
Mon Sep 24 13:54:27 PDT 2007
On Mon, 24 Sep 2007, John Baldwin wrote:
> On Saturday 22 September 2007 10:32:06 pm Attilio Rao wrote:
>> Recently several people have reported starvation problems with rwlocks.
>> In particular, users who tried rwlocks on large SMP systems
>> (16+ CPUs) found them prone to poor performance and to
>> starvation of waiters.
>>
>> Inspecting the code, something strange about adaptive spinning popped
>> up: for rwlocks, the adaptive spinning stubs seem to be placed
>> too far down in the decision loop.
>> With the stub in that position, a thread that is about to spin
>> adaptively first sets the respective (read or write) waiters flag,
>> which means that the owner of the lock will go down the hard path
>> of the locking functions and perform a full wakeup even if the
>> waiter queues turn out to be empty. This is a big penalty for adaptive
>> spinning and can make it completely useless.
>> In addition, adaptive spinning only runs in the turnstile
>> spinlock path, which is not ideal.
>> This patch ports the approach already used for adaptive spinning in sx
>> locks to rwlocks:
>> http://users.gufi.org/~rookie/works/patches/kern_rwlock.diff
>>
>> For sx locks big benefits are unlikely because they are held for too
>> long, but for rwlocks the situation is rather different.
>> I would like people to benchmark this patch (maybe
>> in private environments?) as I'm not able to do so any time soon.
>>
>> Adaptive spinning in rwlocks can be improved further with other tricks
>> (like adding a backoff counter, for example, or trying to spin with
>> the lock held in read mode too), but we first should be sure to start
>> with a solid base.
>
> I did this for mutexes and rwlocks over a year ago and Kris found it was
> slower in benchmarks. www.freebsd.org/~jhb/patches/lock_adapt.patch is the
> last thing I sent kris@ to test (it only has the mutex changes). This might
> work better post-thread_lock, since thread_lock seems to have heavily
> pessimized adaptive spinning because it now enqueues the thread and then
> dequeues it again before doing the adaptive spin. I liked the approach
> originally because it simplifies the code a lot. A separate issue is that
> writers don't spin at all if a reader holds the lock, and I think one thing
> to test for that would be an adaptive spin with a static timeout.
We still don't enqueue the thread any earlier than before. We just
acquire an extra spinlock. The thread is not enqueued until
turnstile_wait(), as before.
Jeff
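
To make the ordering issue from Attilio's quoted text concrete (the
waiters flag being set before the thread spins), here is a rough
user-space sketch in C11. It is not the kernel code and not the patch:
the toy_* names, the lock-word layout, and the "running" field are all
made up for illustration. The only point is that a contender leaves the
lock word alone while the owner is on a CPU, so the owner's unlock sees
no waiters flag and can skip the wakeup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lock-word layout: owner pointer with a flag in the low bit. */
#define TOY_UNLOCKED ((uintptr_t)0)
#define TOY_WAITERS  ((uintptr_t)1)

/* Stand-in for a thread; "running" models "owner is currently on a CPU". */
struct toy_thread {
	_Alignas(8) atomic_bool running;	/* alignment keeps the low bit free */
};

struct toy_lock {
	_Atomic uintptr_t word;
};

enum toy_action { TOY_ACQUIRED, TOY_SPIN, TOY_BLOCK };

static struct toy_thread *
toy_owner(uintptr_t w)
{
	return ((struct toy_thread *)(w & ~TOY_WAITERS));
}

/*
 * One pass of the write-lock decision loop. The caller repeats the call
 * while it returns TOY_SPIN and goes to sleep on TOY_BLOCK.
 */
static enum toy_action
toy_wlock_step(struct toy_lock *lk, struct toy_thread *self)
{
	uintptr_t w = atomic_load(&lk->word);

	assert(self != NULL);
	if (w == TOY_UNLOCKED) {
		if (atomic_compare_exchange_strong(&lk->word, &w,
		    (uintptr_t)self))
			return (TOY_ACQUIRED);
		return (TOY_SPIN);	/* lost a race, look again */
	}
	/* Spin first: while the owner runs, do not touch the lock word. */
	if (atomic_load(&toy_owner(w)->running))
		return (TOY_SPIN);
	/* Owner is off CPU: only now advertise a waiter and block. */
	if (atomic_compare_exchange_strong(&lk->word, &w, w | TOY_WAITERS))
		return (TOY_BLOCK);
	return (TOY_SPIN);		/* word changed under us, look again */
}

/* Release the lock; returns true if the slow wakeup path would be needed. */
static bool
toy_wunlock(struct toy_lock *lk)
{
	uintptr_t w = atomic_exchange(&lk->word, TOY_UNLOCKED);

	return ((w & TOY_WAITERS) != 0);
}
```

In this toy model, a contender that only ever spun leaves
toy_wunlock() returning false, i.e. the owner stays on the fast path.
Setting TOY_WAITERS before the spin would force the wakeup path on
every contended release, which is the penalty described in the quote.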
>
> --
> John Baldwin
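
On the separate issue John raises in the quoted text (writers not
spinning at all while a reader holds the lock), a spin with a static
timeout could look roughly like the sketch below. Again this is a
user-space C11 toy under assumed names: demo_rw, the state encoding and
the iteration limit are hypothetical and not taken from either patch.
A reader-held lock has no single owner to watch, so the writer simply
retries a bounded number of times and then gives up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_WRITER (-1)	/* state value meaning "write-locked" */

/* Hypothetical encoding: 0 = free, > 0 = reader count, -1 = write-locked. */
struct demo_rw {
	atomic_int state;
};

/* Single attempt: succeeds only if there are no readers and no writer. */
static bool
demo_try_wlock(struct demo_rw *rw)
{
	int expected = 0;

	return (atomic_compare_exchange_strong(&rw->state, &expected,
	    DEMO_WRITER));
}

/*
 * Spin for at most "limit" attempts. Returns true if the write lock was
 * taken; false means the static timeout expired and the caller should
 * fall back to blocking.
 */
static bool
demo_wlock_bounded(struct demo_rw *rw, unsigned limit)
{
	assert(limit > 0);
	for (unsigned i = 0; i < limit; i++) {
		if (demo_try_wlock(rw))
			return (true);
	}
	return (false);
}
```

The limit is a tuning knob: too small and the writer blocks even though
the readers were about to drain, too large and CPU time is wasted when
readers hold the lock for long. Whether this helps in practice is
something only benchmarks can answer.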