debugging frequent kernel panics on 8.2-RELEASE
Steven Hartland
killing at multiplay.co.uk
Sat Aug 20 20:23:42 UTC 2011
----- Original Message -----
From: "Steven Hartland" <killing at multiplay.co.uk>
> Looking through the code I believe I may have noticed a scenario which could
> trigger the problem.
>
> Given the following code:-
>
> static void
> prison_deref(struct prison *pr, int flags)
> {
> struct prison *ppr, *tpr;
> int vfslocked;
>
> if (!(flags & PD_LOCKED))
> mtx_lock(&pr->pr_mtx);
> /* Decrement the user references in a separate loop. */
> if (flags & PD_DEUREF) {
> for (tpr = pr;; tpr = tpr->pr_parent) {
> if (tpr != pr)
> mtx_lock(&tpr->pr_mtx);
> if (--tpr->pr_uref > 0)
> break;
> KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
> mtx_unlock(&tpr->pr_mtx);
> }
> /* Done if there were only user references to remove. */
> if (!(flags & PD_DEREF)) {
> mtx_unlock(&tpr->pr_mtx);
> if (flags & PD_LIST_SLOCKED)
> sx_sunlock(&allprison_lock);
> else if (flags & PD_LIST_XLOCKED)
> sx_xunlock(&allprison_lock);
> return;
> }
> if (tpr != pr) {
> mtx_unlock(&tpr->pr_mtx);
> mtx_lock(&pr->pr_mtx);
> }
> }
>
> If you take a scenario of a simple one level prison setup running a single process
> where a prison has just been stopped.
>
> In the above code pr_uref of the processes prison is decremented. As this is the
> last process then pr_uref will hit 0 and the loop continues instead of breaking
> early.
>
> Now at the end of the loop iteration the mtx is unlocked so other process can
> now manipulate the jail, this is where I think the problem may be.
>
> If we now have another process come in and attach to the jail but then instantly
> exit, this process may allow another kernel thread to hit this same bit of code
> and so two process for the same prison get into the section which decrements
> prison0's pr_uref, instead of only one.
>
> In essence I think we can get the following flow where 1# = process1
> and 2# = process2
> 1#1. prison1.pr_uref = 1 (single process jail)
> 1#2. prison_deref( prison1,...
> 1#3. prison1.pr_uref-- (prison1.pr_uref = 0)
> 1#3. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
> 1#3. prison0.pr_uref--
> 2#1. process1.attach( prison1 ) (prison1.pr_uref = 1)
> 2#2. process1.exit
> 2#3. prison_deref( prison1,...
> 2#4. prison1.pr_uref-- (prison1.pr_uref = 0)
> 2#5. prison1.mtx_unlock <-- this now allows others to change prison1.pr_uref
> 2#5. prison0.pr_uref-- (prison1.pr_ref has now been decremented twice by prison1)
>
> It seems like the action on the parent prison to decrement the pr_uref is
> happening too early, while the jail can still be used and without the lock on
> the child jails mtx, so causing a race condition.
>
> I think the fix is to the move the decrement of parent prison pr_uref's down
> so it only takes place if the jail is "really" being removed. Either that or
> to change the locking semantics so that once the lock is aquired in this
> prison_deref its not unlocked until the function completes.
>
> What do people think?
After reviewing the changes to prison_deref in commit which added hierarchical
jails, the removal of the lock by the inital loop on the passed in prison may
be unintentional.
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_jail.c.diff?r1=1.101;r2=1.102;f=h
If so the following may be all that's needed to fix this issue:-
diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c
--- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100
+++ sys/kern/kern_jail.c 2011-08-20 21:18:35.307201425 +0100
@@ -2455,7 +2455,8 @@
if (--tpr->pr_uref > 0)
break;
KASSERT(tpr != &prison0, ("prison0 pr_uref=0"));
- mtx_unlock(&tpr->pr_mtx);
+ if (tpr != pr)
+ mtx_unlock(&tpr->pr_mtx);
}
/* Done if there were only user references to remove. */
if (!(flags & PD_DEREF)) {
Regards
Steve
================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster at multiplay.co.uk.
More information about the freebsd-stable
mailing list