adaptive rwlock deadlock

Philippe Jalaber pjalaber at gmail.com
Tue Jul 7 10:10:21 UTC 2015


Hi,

I am facing a strange problem using the network stack and adaptive rwlocks
on FreeBSD 9.3.
I can reproduce the problem with 3 threads:

1) Thread 1 has taken the rwlock of the inpcb structure in exclusive mode in
tcp_input.c. This thread also runs my own code, which repeatedly takes a
rwlock (called g_rwlock) in shared mode and releases it until a shared
object is no longer marked "busy":

wlock(inp_lock); // inpcb rwlock, taken in exclusive mode
....
do { // thread 1 busy-waits in this loop
    rlock(g_rwlock);
    o = find();
    if ( o == NULL )
        break; // g_rwlock is still held in shared mode
    busy = o.busy;
    if ( busy )
        runlock(g_rwlock); // release before retrying
} while ( busy );

if ( o != NULL )
{
    // do something with o
    ....
}
runlock(g_rwlock);
....

2) Thread 2 wants to mark the shared object as "ready" (no longer busy), so
it tries to take g_rwlock in exclusive mode. It blocks in _rw_wlock_hard at
kern_rwlock.c:815, "turnstile_wait(ts, rw_owner(rw), TS_EXCLUSIVE_QUEUE)",
because thread 1 already holds the lock in shared mode:

wlock(g_rwlock);
o = find();
if ( o != NULL )
    o.busy = 1;
wunlock(g_rwlock);

// o is busy so work on it without any lock
....

wlock(g_rwlock); // thread is blocked here
o.busy = 0;
maybe_delete(o);
wunlock(g_rwlock);

3) Thread 3 spins on the same inpcb rwlock that thread 1 holds, in
_rw_wlock_hard at kern_rwlock.c:721:
"while ((struct thread*)RW_OWNER(rw->rw_lock) == owner && TD_IS_RUNNING(owner))"
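
To make the spin condition quoted above concrete, here is a minimal userland
model of it. It is only a sketch, not the kernel code: the types model_thread
and model_rwlock, the function adaptive_wait, and the max_spins cap are all
hypothetical names introduced for illustration. The waiter keeps spinning
while the lock still has the same owner and that owner is marked as running,
and it would fall back to blocking only once the owner stops running.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a kernel thread and a
 * write-locked rwlock. These are NOT the FreeBSD structures. */
struct model_thread { atomic_bool running; };
struct model_rwlock { _Atomic(struct model_thread *) owner; };

enum wait_result {
    LOCK_RELEASED, /* owner changed or lock freed: retry the acquire */
    OWNER_SLEPT,   /* owner is off-CPU: a real waiter would block now */
    SPIN_LIMIT     /* model-only cap reached while the owner kept running */
};

/* Spin while the lock keeps the same owner and that owner is running.
 * max_spins exists only so the model terminates; the loop quoted in the
 * post has no such bound. */
static enum wait_result
adaptive_wait(struct model_rwlock *rw, unsigned long max_spins)
{
    assert(rw != NULL);
    struct model_thread *owner = atomic_load(&rw->owner);

    if (owner == NULL)
        return LOCK_RELEASED;
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load(&rw->owner) != owner)
            return LOCK_RELEASED;
        if (!atomic_load(&owner->running))
            return OWNER_SLEPT;
    }
    return SPIN_LIMIT;
}
```

In this model, an owner that never leaves the CPU (like thread 1
busy-waiting) makes adaptive_wait run until the cap, which corresponds to
thread 3 spinning indefinitely in the scenario described here.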


My target machine has two CPUs.
Thread 1 is pinned to CPU 0.
Thread 2 and thread 3 are pinned to CPU 1.
Thread 1 and thread 2 have a priority of 28.
Thread 3 has a priority of 127.

Now what seems to happen is that when thread 1 calls runlock(g_rwlock), it
calls turnstile_broadcast at kern_rwlock.c:650, but thread 2 never regains
control because thread 3 is spinning on the inpcb rwlock on the same CPU.
Also, the condition TD_IS_RUNNING(owner) is always true because thread 1 is
busy-waiting in its loop, so thread 3 never stops spinning. As a result, the
3 threads deadlock.
Note that if I compile the kernel without adaptive rwlocks, it works without
any problem.
A workaround is to add a call to "sched_relinquish(curthread)" in thread 1's
loop, just after the call to runlock.

I am also wondering about the code in _rw_runlock after
"turnstile_broadcast(ts, queue)". Isn't the RW_LOCK_WRITE_WAITERS flag
permanently lost if the other thread, which is blocked in turnstile_wait,
never regains control?

Thank you for your time,
Regards,
Philippe


More information about the freebsd-hackers mailing list