duplicate read/write locks in net/pfil.c and netinet/ip_fw2.c

Max Laier max at love2party.net
Thu Aug 18 01:32:42 GMT 2005


On Thursday 18 August 2005 02:02, Luigi Rizzo wrote:
> On Wed, Aug 17, 2005 at 04:35:19AM +0200, Max Laier wrote:
> > On Wednesday 17 August 2005 02:05, Luigi Rizzo wrote:
> > > [apologies for the cross post but it belongs both to arch and net.]
> > >
> > > I notice that net/pfil.c and netinet/ip_fw2.c have two copies of
> > > aisimilar but slightly different implementation of
> > > multiple-reader/single-writer locks, which brings up the question(s):
> > >
> > > 1. should we rather put this code in the generic kernel code so that
> > > other subsystems could make use of it ? E.g. the routing table is
> > > certainly a candidate,
> >
> > I have asked this several time on -arch and IRC, but never found anyone
> > willing to pursue it.  However, the problem is ...
> >
> > > and especially
> > >
> > > 2. should we implement it right ?
> > >
> > >    Both implementations are subject to starvation for the writers
> > >    (which is indeed a problem here, because we might want to modify
> > >    a ruleset and be prevented from doing it because of incoming traffic
> > >    that keeps readers active).
> > >    Also the PFIL_TRY_WLOCK will in fact be blocking if a writer
> > >    is already in - i have no idea how problematic is this in the
> > >    way it is actually used.
> >
> >... really this.  I didn't find a clean way out of the starvation issue. 
> > What I do for pfil is that I set a flag and simply stop serving[2] shared
> > requests once a writer waits for the lock.  If a writer can't sleep[1]
> > then we return EBUSY and don't.  However, for pfil it's almost ever safe
> > to assume that a write may sleep (as it is for most instances of this
> > kind of sx-lock where you have BIGNUMxreads:1xwrite).
>
> could you guys look at the following code and see if it makes sense,
> or tell me where i am wrong ?
>
> It should solve the starvation and blocking trylock problems,
> because the active reader does not hold the mutex in the critical
> section anymore. The lock could well be a spinlock.

The reader doesn't hold the lock over the critical section with the current 
implementation either.  The write holds the lock over the critical section, 
which was a conscious decision.  For some cases it does make sense to get rid 
of this, however.  See inline comments why this doesn't work for pfil.

> 	cheers
> 	luigi
>
> /*
>  * Implementation of multiple reader-single writer that prevents
> starvation. * Luigi Rizzo 2005.08.19
>  *
>  * The mutex m only protects accesses to the struct rwlock.
>  * We can have the following states:
>  * IDLE: readers = 0, writers = 0, br = 0;
>  *      any request will be granted immediately.
>  *
>  * READ: readers > 0, writers = 0, br = 0. Read in progress.
>  *      Grant read requests immediately, queue write requests and
>  *      move to READ1.
>  *      When last reader terminates, move to IDLE.
>  *
>  * READ1: readers > 0, writers > 0, br >= 0.
>  *      Read in progress, but writers are queued.
>  *      Queue read and write requests to qr and wr, respectively.
>  *      When the last reader terminates, wakeup the next queued writer
>  *      and move to WRITE
>  *
>  * WRITE: readers = 0, writers > 0, br >= 0.
>  *      Write in progress, possibly queued readers/writers.
>  *      Queue read and write requests to qr and wr, respectively.
>  *      When the writer terminates, wake up all readers if any,
>  *      otherwise wake up the next writer if any.
>  *      Move to READ, READ1, IDLE accordingly.
>  */
>
> struct rwlock {
>         mtx     m;      /* protects access to the rwlock */
>         int     readers;        /* active readers */
>         int     br;     /* blocked readers */
>         int     writers;        /* active + blocked writers */
>         cv      qr;     /* queued readers */
>         cv      qw;     /* queued writers */
> }
>
> int
> RLOCK(struct rwlock *rwl, int try)
> {
>         if (!try)
>                 mtx_lock(&rwl->m);
>         else if (!mtx_trylock(&rwl->m))
>                 return EBUSY;
>         if (rwl->writers == 0)  /* no writer, pass */
>                 rwl->readers++;
>         else {
>                 rwl->br++;
>                 cv_wait(&rwl->qr, &rwl->m);
                  ^^^^^^^

That we can't do.  That's exactly the thing the existing sx(9) implementation 
does and where it breaks.  The problem is that cv_wait() is an implicit sleep 
which breaks when we try to RLOCK() with other mutex already acquired.  
Moreover will this break for recursive reads e.g.:

Thread 1: RLOCK() ...      RLOCK() -> cv_wait ...
Thread 2:          WLOCK() -> cv_wait ...

This is exactly what pfil_hooks must be able to do as the packet filter may 
want to call back to the stack in order to send rejects etc.

It would be an idea to use cv_wait() depending on the value in the try 
argument and return EBUSY for that as well.

This is a textbook implementation, that sure will work (given the above 
change).  The problem is, that we still have to invest 4 mutex operations for 
every access.  The current implementation has the same basic problem (though 
it only uses 2 mutex operations for the WLOCK/UNLOCK).  Ideally the RLOCK/ 
UNLOCK should be free unless there is a writer waiting for the lock.  In 
order to do this I am thinking of a "spl-like" value in the PCPU section.  A 
write would IPI other CPUs with a request to "raise" the bar once all have 
confirmed it would do the write and IPI again.  This gives a serious 
disadvantage to writes, but would allow us to block on failed read requests 
instead of erroring out or sleeping.  Also, an uncongested read would be free 
when compared to the existing solution.

I have to read some more to see if this actually works.  Comments appreciated!

>         }
>         mtx_unlock(&rwl->m);
>         return 0;
> }
>
> int
> WLOCK(struct rwlock *rwl, int try)
> {
>         if (!try)
>                 mtx_lock(&rwl->m);
>         else if (!mtx_trylock(&rwl->m))
>                 return EBUSY;
>         rwl->writers++;
>         if (rwl->readers > 0)   /* have readers, must wait */
>                 cv_wait(&rwl->qw, &rwl->m);
>         mtx_unlock(&rwl->m);
>         return 0;
> }
>
> void
> RUNLOCK(struct rwlock *rwl)
> {
>         mtx_lock(&rwl->m);
>         rwl->readers--;
>         if (rwl->readers == 0 && rwl->writers > 0)
>                 cv_signal(&rwl->qw);
>         mtx_unlock(&rwl->m);
> }
>
> void
> WUNLOCK(struct rwlock *rwl)
> {
>         mtx_lock(&rwl->m);
>         rwl->writers--;
>         if (rwl->br > 0) {      /* priority to readers */
>                 rwl->readers = rwl->br;
>                 rwl->br = 0;
>                 cv_broadcast(&rwl->qr);
>         } else if (rwl->writers > 0)
>                 cv_signal(&rwl->qw);
>         mtx_unlock(&rwl->m);
> }

-- 
/"\  Best regards,                      | mlaier at freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier at EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 187 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-net/attachments/20050818/03b2d647/attachment.bin


More information about the freebsd-net mailing list