Special schedulers, one CPU only kernel, one only userland

Luigi Rizzo rizzo at icir.org
Wed Aug 17 16:33:34 GMT 2005


On Wed, Aug 17, 2005 at 08:36:04AM -0400, John Baldwin wrote:
> On Tuesday 16 August 2005 10:13 pm, gnn at freebsd.org wrote:
> > At Tue, 16 Aug 2005 16:22:41 -0700,
> >
> > luigi wrote:
> > > ok just for the records, the issue i had in mind is the release/acquire
> > > the mutex around some code that might cause a deadlock, not for the
> > > mutex ops per se, but for the need to make sure that the data
> > > structure is consistent before releasing the lock, and rechecking
...
> > Correct me if I'm wrong but I suspect you're thinking of cases such
> > as:
> >
> > 		RT_UNLOCK(rt);		/* XXX workaround LOR */
> > 		gwrt = rtalloc1(gate, 1, 0);
> > 		RT_LOCK(rt);
> >
> > in the routing/networking code.  A quick, and by no means exhaustive,
> > check of CURRENT with cscope and Emacs looking for these turned up 3.

actually most network device drivers have a similar thing in
the receive interrupt processing. The  code below is from if_fxp.c
but many other drivers have that as well:

                        /*
                         * Drop locks before calling if_input() since it
                         * may re-enter fxp_start() in the netisr case.  
                         * This would result in a lock reversal.  Better  
                         * performance might be obtained by chaining all
                         * packets received, dropping the lock, and then 
                         * calling if_input() on each one.
                         */
                        FXP_UNLOCK(sc);
                        (*ifp->if_input)(ifp, m);
                        FXP_LOCK(sc);
			...

here (and everywhere else)  there is no check that the device is still
in a consistent state when reacquiring the lock.

How safe is this in the face of, say, device removal, i have no idea,
because i cannot find a place that comments what the sc->sc_mtx lock is
supposed to protect.

But in fact, reading fxp_detach() is not very encouraging....
and the leftover spl*() calls are not making it any easier to understand
the code.

Actually, if someone can point me to something that documents how
network locking (device driver and above) is designed, i'd be
very grateful.

	cheers
	luigi

> > It may still be something to look at, but perhaps mroe from a point of
> > view of cleaning up our APIs to not do this kind of jiggery pokery,
> > rather than inventing a new locking paradigm, which is fraught with
> > peril.
> >
> > This isn't exhaustive though, if others can show this kind of thing
> > being done a lot, or becoming a idiom in our code, then there is more
> > cause for concern.
> >
> > Just my 2 yen :-)
> 
> Agreed.  I'd rather that we fix our APIs and try to organize the workflow to 
> minimize the number of times that a lock has to be dropped and then 
> reacquired.
> 
> -- 
> John Baldwin <jhb at FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
> "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


More information about the freebsd-arch mailing list