[PATCH 2/2] fork: plug a use after free of the returned process pointer
Mateusz Guzik
mjguzik at gmail.com
Wed Feb 3 08:05:19 UTC 2016
On Wed, Feb 03, 2016 at 02:04:13AM +0100, Mateusz Guzik wrote:
> On Tue, Feb 02, 2016 at 10:44:27PM +0100, Mateusz Guzik wrote:
> > On Tue, Feb 02, 2016 at 08:16:35PM +0200, Konstantin Belousov wrote:
> > > On Tue, Feb 02, 2016 at 06:56:52PM +0100, Mateusz Guzik wrote:
> > > > On Tue, Feb 02, 2016 at 03:23:22PM +0200, Konstantin Belousov wrote:
> > > > > On Tue, Feb 02, 2016 at 05:07:49AM +0100, Mateusz Guzik wrote:
> > > > > > + flags = fr->fr_flags;
> > > > > Why not use fr->fr_flags directly ? It is slightly more churn, but IMO
> > > > > it is worth it.
> > > > >
> > > >
> > > > I'm indifferent on this one, can change it, no problem.
> > > >
> > > > > > + /*
> > > > > > + * Hold the process so that it cannot exit after we make it runnable,
> > > > > > + * but before we wait for the debugger.
> > > > > Is this possible ? The forked child must execute through fork_return(),
> > > > > and there we do ptracestop() before the child has a chance to ever return
> > > > > to usermode.
> > > > >
> > > > > Do you mean a scenario where the debugger detaches before child executes
> > > > > fork_return() and TDB_STOPATFORK is cleared in advance ?
> > > > >
> > > >
> > > > The comment is somewhat misworded and I forgot to update it; how about
> > > > just stating that we hold the process so that we can mark the thread runnable
> > > > and not have it disappear until we are done.
> > > This means that the reader has to guess too much, IMHO.
> > >
> > > At least, add a note that despite the fact that fork_return() stops when
> > > the child is traced, it is not enough because ... .
> > > >
> > > > While I have not tested this particular bug, prior to the patch the
> > > > following should be possible: p2 is untraced and td2 is marked as runnable,
> > > > after which it exits and p2 is automatically reaped. If the code reaches
> > > > the TDB_STOPATFORK check after that, PROC_LOCK(p2) succeeds due to
> > > I.e. td2 is reused and the TDB_STOPATFORK is set by unrelated activity ?
> > > You reference the do_fork() code checking TDB_STOPATFORK, and not
> > > fork_return(), I guess.
> > >
> > >
> > > > processes being type stable. Dereferencing td2 can cause no issues due to
> > > > threads being type stable as well. But the thread could have been reused
> > > > in a traced process, thus inducing cv_wait(&p2->p_dbgwait, ..) even
> > > > though td2 is not linked in p2 anymore and p2 is not even a valid
> > > > process, making curthread wait indefinitely since there is nobody to
> > > > wake it up.
> > > >
> > > Well, if the TDB_STOPATFORK bit is set, it has the same meaning due to type
> > > stability, and eventually the wakeup would be performed. It is just the
> > > unintended sleep waiting for the condvar which is problematic, and which I
> > > agree with.
> >
> >
> > CPU0 is executing fork1. p2 is not traced.
> >
> > CPU0                                 CPU1
> > p2 and td2 created
> > td2 is marked runnable
> >                                      td2 is scheduled here
> >                                      td2 does not have TDB_STOPATFORK set
> >                                      td2 exits
> >                                      p2 is autoreaped
> >                                      td2's space is reused
> >                                      td2 gets linked into p3
> >                                      td2 gets TDB_STOPATFORK
> > PROC_LOCK(p2);
> > TDB_STOPATFORK test on td2
> > cv_wait(&p2->p_dbgwait, ..);
> >
> > So at this point p2 has no linked threads and is free, td2 belongs to
> > p3, and the forking thread is sleeping on p2 waiting for a wakeup which
> > can't happen.
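
Schematically, the window being described in the pre-patch do_fork() looks
roughly like this (heavily condensed, details elided):

    /* p2 is unlocked and not held at this point */
    if ((flags & RFSTOPPED) == 0) {
            /* Make td2 runnable and put it on a run queue; from here on
               it can run, exit and have p2 autoreaped at any time. */
            ...
    }
    ...
    PROC_LOCK(p2);  /* "succeeds" even on a reaped p2: processes are type stable */
    while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)  /* td2 may belong to p3 by now */
            cv_wait(&p2->p_dbgwait, &p2->p_mtx);      /* nothing will ever wake this up */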
> >
> > Now that I look at it, this may be broken in an additional way, which is
> > not fixed by the patch: what if td2 spawns a new thread and calls thr_exit?
> > In this case testing td2 is still invalid. Maybe I'm just getting
> > paranoid here; I don't have time to properly test this right now. In the
> > worst case it should be fixable well enough with FIRST_THREAD_IN_PROC.
> >
> > How about the following comment around _PHOLD:
> > We are going to make the main thread runnable. It can quickly exit,
> > causing the process to be reaped and possibly reused, thus invalidating
> > our p2 pointer. Protect against this by holding the process, which
> > postpones the exit.
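
I.e. placed right above the hold, it would read roughly:

    /*
     * We are going to make the main thread runnable. It can quickly exit,
     * causing the process to be reaped and possibly reused, thus
     * invalidating our p2 pointer. Protect against this by holding the
     * process, which postpones the exit.
     */
    _PHOLD(p2);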
> >
> > And if the suspicion gets confirmed, the following would be added:
> > diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
> > index d0c3837..2a076ed 100644
> > --- a/sys/kern/kern_fork.c
> > +++ b/sys/kern/kern_fork.c
> > @@ -773,6 +773,12 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> >
> > PROC_LOCK(p2);
> > /*
> > + * By the time we got here the thread could have created a new thread
> > + * and exited. Reload the main thread to ensure we got the right
> > + * pointer.
> > + */
> > + td2 = FIRST_THREAD_IN_PROC(p2);
> > + /*
> > * Wait until debugger is attached to child.
> > */
> > while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)
> >
> >
>
> To end the speculation and hackery I decided to reorganize the function a
> little bit instead. Namely, it can be trivially modified to not drop the
> proc lock, which gets rid of all the aforementioned races.
Here is a trivial change which retains performing knote_fork before waiting
for the debugger. The problem is worked around by making td2 runnable after
knote_fork is performed, without dropping and relocking p2 between that point
and the wait for the debugger.
diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
index d0c3837..366262f 100644
--- a/sys/kern/kern_fork.c
+++ b/sys/kern/kern_fork.c
@@ -715,11 +715,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
if ((flags & RFMEM) == 0 && dtrace_fasttrap_fork)
dtrace_fasttrap_fork(p1, p2);
#endif
- /*
- * Hold the process so that it cannot exit after we make it runnable,
- * but before we wait for the debugger.
- */
- _PHOLD(p2);
if ((p1->p_flag & (P_TRACED | P_FOLLOWFORK)) == (P_TRACED |
P_FOLLOWFORK)) {
/*
@@ -756,6 +751,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
fdrop(fp_procdesc, td);
}
+ PROC_LOCK(p2);
if ((flags & RFSTOPPED) == 0) {
/*
* If RFSTOPPED not requested, make child runnable and
@@ -771,13 +767,11 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
*fr->fr_procp = p2;
}
- PROC_LOCK(p2);
/*
* Wait until debugger is attached to child.
*/
while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)
cv_wait(&p2->p_dbgwait, &p2->p_mtx);
- _PRELE(p2);
racct_proc_fork_done(p2);
PROC_UNLOCK(p2);
}
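
With this applied the tail of do_fork() ends up roughly as follows (condensed;
knote_fork keeps its current spot, before the child is made runnable), so the
entire window between making td2 runnable and the TDB_STOPATFORK check is
covered by the proc lock:

    knote_fork(&p1->p_klist, p2->p_pid);    /* still done before td2 can run */
    ...
    PROC_LOCK(p2);
    if ((flags & RFSTOPPED) == 0) {
            /* child made runnable with the proc lock held */
            ...
    }
    ...
    while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)
            cv_wait(&p2->p_dbgwait, &p2->p_mtx);
    racct_proc_fork_done(p2);
    PROC_UNLOCK(p2);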
>
> I got 2 variants. For brevity, both patches are applied on top of the
> current patchset. In the final version, the change would be folded into
> the second patch of the patchset.
>
> Both currently and with the first patch below, knote_fork can get a
> now-freed or even recycled pid. I find that odd. This variant only saves
> the pid and calls knote_fork with it. This is likely fine enough.
>
> Just in case, the second variant adds a primitive,
> knlist_empty_lockless, to perform a racy check to see if there are any
> knotes. If there are, the process is held and released after knote_fork.
> Note that this is no more racy than the current code with respect to
> spotting knotes. What does improve is that the process is guaranteed to
> be around during knote_fork, although I don't know how important
> this is.
>
> With both variants we also save one lock/unlock round of p2, and with the
> second variant also a lock/unlock of the knote list in the common case.
>
> Variant 1:
> diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
> index d0c3837..fae4eaf 100644
> --- a/sys/kern/kern_fork.c
> +++ b/sys/kern/kern_fork.c
> @@ -378,7 +378,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> struct vmspace *vm2, struct file *fp_procdesc)
> {
> struct proc *p1, *pptr;
> - int trypid;
> + int p2_pid, trypid;
> struct filedesc *fd;
> struct filedesc_to_leader *fdtol;
> struct sigacts *newsigacts;
> @@ -715,11 +715,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> if ((flags & RFMEM) == 0 && dtrace_fasttrap_fork)
> dtrace_fasttrap_fork(p1, p2);
> #endif
> - /*
> - * Hold the process so that it cannot exit after we make it runnable,
> - * but before we wait for the debugger.
> - */
> - _PHOLD(p2);
> if ((p1->p_flag & (P_TRACED | P_FOLLOWFORK)) == (P_TRACED |
> P_FOLLOWFORK)) {
> /*
> @@ -737,7 +732,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> td->td_pflags |= TDP_RFPPWAIT;
> td->td_rfppwait_p = p2;
> }
> - PROC_UNLOCK(p2);
>
> /*
> * Now can be swapped.
> @@ -745,16 +739,10 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> _PRELE(p1);
> PROC_UNLOCK(p1);
>
> - /*
> - * Tell any interested parties about the new process.
> - */
> - knote_fork(&p1->p_klist, p2->p_pid);
> SDT_PROBE3(proc, , , create, p2, p1, flags);
>
> - if (flags & RFPROCDESC) {
> + if (flags & RFPROCDESC)
> procdesc_finit(p2->p_procdesc, fp_procdesc);
> - fdrop(fp_procdesc, td);
> - }
>
> if ((flags & RFSTOPPED) == 0) {
> /*
> @@ -771,15 +759,26 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> *fr->fr_procp = p2;
> }
>
> - PROC_LOCK(p2);
> /*
> * Wait until debugger is attached to child.
> */
> while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)
> cv_wait(&p2->p_dbgwait, &p2->p_mtx);
> - _PRELE(p2);
> racct_proc_fork_done(p2);
> + /*
> + * The process can exit and be waited on after we drop the lock. Save
> + * the pid so that it can be used for knotes.
> + */
> + p2_pid = p2->p_pid;
> PROC_UNLOCK(p2);
> +
> + /*
> + * Tell any interested parties about the new process.
> + */
> + knote_fork(&p1->p_klist, p2_pid);
> +
> + if (flags & RFPROCDESC)
> + fdrop(fp_procdesc, td);
> }
>
> int
> ========================
>
> Variant 2:
> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> index d41ac96..3610d8a 100644
> --- a/sys/kern/kern_event.c
> +++ b/sys/kern/kern_event.c
> @@ -2038,12 +2038,22 @@ knlist_remove_inevent(struct knlist *knl, struct knote *kn)
> (kn->kn_status & KN_HASKQLOCK) == KN_HASKQLOCK);
> }
>
> +/*
> + * For when the caller accepts that the check is inherently racy.
> + */
> +int
> +knlist_empty_lockless(struct knlist *knl)
> +{
> +
> + return SLIST_EMPTY(&knl->kl_list);
> +}
> +
> int
> knlist_empty(struct knlist *knl)
> {
>
> KNL_ASSERT_LOCKED(knl);
> - return SLIST_EMPTY(&knl->kl_list);
> + return knlist_empty_lockless(knl);
> }
>
> static struct mtx knlist_lock;
> diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
> index d0c3837..70490ef 100644
> --- a/sys/kern/kern_fork.c
> +++ b/sys/kern/kern_fork.c
> @@ -378,7 +378,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> struct vmspace *vm2, struct file *fp_procdesc)
> {
> struct proc *p1, *pptr;
> - int trypid;
> + int p2_held, trypid;
> struct filedesc *fd;
> struct filedesc_to_leader *fdtol;
> struct sigacts *newsigacts;
> @@ -387,6 +387,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> sx_assert(&proctree_lock, SX_SLOCKED);
> sx_assert(&allproc_lock, SX_XLOCKED);
>
> + p2_held = 0;
> p1 = td->td_proc;
> flags = fr->fr_flags;
>
> @@ -715,11 +716,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> if ((flags & RFMEM) == 0 && dtrace_fasttrap_fork)
> dtrace_fasttrap_fork(p1, p2);
> #endif
> - /*
> - * Hold the process so that it cannot exit after we make it runnable,
> - * but before we wait for the debugger.
> - */
> - _PHOLD(p2);
> if ((p1->p_flag & (P_TRACED | P_FOLLOWFORK)) == (P_TRACED |
> P_FOLLOWFORK)) {
> /*
> @@ -737,7 +733,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> td->td_pflags |= TDP_RFPPWAIT;
> td->td_rfppwait_p = p2;
> }
> - PROC_UNLOCK(p2);
>
> /*
> * Now can be swapped.
> @@ -745,16 +740,10 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> _PRELE(p1);
> PROC_UNLOCK(p1);
>
> - /*
> - * Tell any interested parties about the new process.
> - */
> - knote_fork(&p1->p_klist, p2->p_pid);
> SDT_PROBE3(proc, , , create, p2, p1, flags);
>
> - if (flags & RFPROCDESC) {
> + if (flags & RFPROCDESC)
> procdesc_finit(p2->p_procdesc, fp_procdesc);
> - fdrop(fp_procdesc, td);
> - }
>
> if ((flags & RFSTOPPED) == 0) {
> /*
> @@ -771,15 +760,32 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> *fr->fr_procp = p2;
> }
>
> - PROC_LOCK(p2);
> /*
> * Wait until debugger is attached to child.
> */
> while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)
> cv_wait(&p2->p_dbgwait, &p2->p_mtx);
> - _PRELE(p2);
> racct_proc_fork_done(p2);
> + if (!knlist_empty_lockless(&p1->p_klist)) {
> + /*
> + * Hold the process so that it does not exit until we call
> + * knote_fork.
> + */
> + _PHOLD(p2);
> + p2_held = 1;
> + }
> PROC_UNLOCK(p2);
> +
> + if (p2_held) {
> + /*
> + * Tell any interested parties about the new process.
> + */
> + knote_fork(&p1->p_klist, p2->p_pid);
> + PRELE(p2);
> + }
> +
> + if (flags & RFPROCDESC)
> + fdrop(fp_procdesc, td);
> }
>
> int
> diff --git a/sys/sys/event.h b/sys/sys/event.h
> index 0f13231..771b3bb 100644
> --- a/sys/sys/event.h
> +++ b/sys/sys/event.h
> @@ -254,6 +254,7 @@ extern void knote_fork(struct knlist *list, int pid);
> extern void knlist_add(struct knlist *knl, struct knote *kn, int islocked);
> extern void knlist_remove(struct knlist *knl, struct knote *kn, int islocked);
> extern void knlist_remove_inevent(struct knlist *knl, struct knote *kn);
> +extern int knlist_empty_lockless(struct knlist *knl);
> extern int knlist_empty(struct knlist *knl);
> extern void knlist_init(struct knlist *knl, void *lock,
> void (*kl_lock)(void *), void (*kl_unlock)(void *),
>
> --
> Mateusz Guzik <mjguzik gmail.com>
--
Mateusz Guzik <mjguzik gmail.com>