[PATCH 2/2] fork: plug a use after free of the returned process pointer
Konstantin Belousov
kostikbel at gmail.com
Wed Feb 3 14:13:44 UTC 2016
On Wed, Feb 03, 2016 at 09:05:15AM +0100, Mateusz Guzik wrote:
> On Wed, Feb 03, 2016 at 02:04:13AM +0100, Mateusz Guzik wrote:
> > On Tue, Feb 02, 2016 at 10:44:27PM +0100, Mateusz Guzik wrote:
> > > On Tue, Feb 02, 2016 at 08:16:35PM +0200, Konstantin Belousov wrote:
> > > > On Tue, Feb 02, 2016 at 06:56:52PM +0100, Mateusz Guzik wrote:
> > > > > On Tue, Feb 02, 2016 at 03:23:22PM +0200, Konstantin Belousov wrote:
> > > > > > On Tue, Feb 02, 2016 at 05:07:49AM +0100, Mateusz Guzik wrote:
> > > > > > > + flags = fr->fr_flags;
> > > > > > Why not use fr->fr_flags directly ? It is slightly more churn, but IMO
> > > > > > it is worth it.
> > > > > >
> > > > >
> > > > > I'm indifferent on this one, can change it no problem.
> > > > >
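To be concrete, by using it directly I mean e.g.
	if ((fr->fr_flags & RFMEM) == 0 && dtrace_fasttrap_fork)
instead of caching the value in the local 'flags' (the RFMEM test from the
patch is just an example).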
> > > > > > > + /*
> > > > > > > + * Hold the process so that it cannot exit after we make it runnable,
> > > > > > > + * but before we wait for the debugger.
> > > > > > Is this possible ? The forked child must execute through fork_return(),
> > > > > > and there we do ptracestop() before the child has a chance to ever return
> > > > > > to usermode.
> > > > > >
> > > > > > Do you mean a scenario where the debugger detaches before child executes
> > > > > > fork_return() and TDB_STOPATFORK is cleared in advance ?
> > > > > >
> > > > >
> > > > > The comment is somewhat misworded and I forgot to update it. How about
> > > > > just stating that we hold the process so that we can mark the thread
> > > > > runnable and not have it disappear until we are done?
> > > > This means that the reader has to guess too much, IMHO.
> > > >
> > > > At least, add a note that even though fork_return() stops when the child
> > > > is traced, it is not enough because ... .
> > > > >
> > > > > While I have not tested this particular bug, prior to the patch the
> > > > > following should be possible: p2 is untraced and td2 is marked as runnable,
> > > > > after which it exits and p2 is automatically reaped. If the code reaches
> > > > > the TDB_STOPATFORK check after that, PROC_LOCK(p2) succeeds due to
> > > > I.e. td2 is reused and the TDB_STOPATFORK is set by unrelated activity ?
> > > > You reference the do_fork() code checking TDB_STOPATFORK, and not
> > > > fork_return(), I guess.
> > > >
> > > >
> > > > > processes being type stable. td2 dereference can cause no issues due to
> > > > > threads being type stable as well. But the thread could have been reused
> > > > > in a traced process, thus inducing cv_wait(&p2->p_dbgwait, ..) even
> > > > > though td2 is not linked in p2 anymore and p2 is not even a valid
> > > > > process, making curthread wait indefinitely since there is nobody to
> > > > > wake it up.
> > > > >
> > > > Well, if the TDB_STOPATFORK bit is set, it has the same meaning due to type
> > > > stability, and eventually the wake up would be performed. It is just the
> > > > unintended sleep waiting for the condvar which is problematic and which I
> > > > agree with.
> > >
> > >
> > > CPU0 is executing fork1. p2 is not traced.
> > >
> > > CPU0                                CPU1
> > > p2 and td2 created
> > > td2 is marked runnable
> > >                                     td2 is scheduled here
> > >                                     td2 does not have TDB_STOPATFORK set
> > >                                     td2 exits
> > >                                     p2 is autoreaped
> > >                                     td2's space is reused
> > >                                     td2 gets linked into p3
> > >                                     td2 gets TDB_STOPATFORK
> > > PROC_LOCK(p2);
> > > TDB_STOPATFORK test on td2
> > > cv_wait(&p2->p_dbgwait, ..);
> > >
> > > So at this point p2 has no linked threads and is free. td2 belongs to
> > > p3, and the thread doing the fork is sleeping on p2's condvar waiting
> > > for a wakeup which can't happen.
I am convinced about this. I thought that the fork_return() guarantee
that the child cannot exit while TDB_STOPATFORK is set was enough, but the
issue is the other way around.
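For the archives, the pre-patch ordering in do_fork() which opens this
window is roughly the following (condensed, unrelated steps elided):

	PROC_UNLOCK(p2);
	/* ... _PRELE(p1), PROC_UNLOCK(p1), knote_fork(), procdesc ... */
	if ((flags & RFSTOPPED) == 0) {
		/* Make child runnable and add to run queue; from here
		   on p2 may exit and be autoreaped. */
	}
	/* ... */
	PROC_LOCK(p2);	/* Succeeds even for a reaped p2, type stability. */
	while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)
		/* May sleep forever if td2 was reused, as in the table above. */
		cv_wait(&p2->p_dbgwait, &p2->p_mtx);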
> > >
> > > Now that I look at it, this may be broken in an additional way, which is
> > > not fixed by the patch: what if td2 spawns a new thread and thr_exits?
> > > In this case testing td2 is still invalid. Maybe I'm just getting
> > > paranoid here; I don't have time to properly test this right now. In the
> > > worst case it should be fixable well enough with FIRST_THREAD_IN_PROC.
> > >
> > > How about the following comment around _PHOLD:
> > > We are going to make the main thread runnable. It can quickly exit,
> > > causing the process to be reaped and possibly reused, thus invalidating
> > > our p2 pointer. Protect against this by holding the process, which
> > > postpones the exit.
> > >
> > > And if the suspicion gets confirmed the following would be added:
> > > diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
> > > index d0c3837..2a076ed 100644
> > > --- a/sys/kern/kern_fork.c
> > > +++ b/sys/kern/kern_fork.c
> > > @@ -773,6 +773,12 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > >
> > > PROC_LOCK(p2);
> > > /*
> > > + * By the time we get here the thread could have created a new thread
> > > + * and exited. Reload the main thread to ensure we got the right
> > > + * pointer.
> > > + */
> > > + td2 = FIRST_THREAD_IN_PROC(p2);
> > > + /*
> > > * Wait until debugger is attached to child.
> > > */
> > > while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)
> > >
> > >
> >
> > To end the speculation and hackery I decided to reorganize the function a
> > little bit instead. Namely, it can be trivially modified to not drop the
> > proc lock, which gets rid of all the aforementioned races.
>
> Here is a trivial change which retains doing knote_fork before waiting for
> the debugger. The problem is worked around by making td2 runnable only
> after knote_fork is performed and without relocking p2 in between.
>
> diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
> index d0c3837..366262f 100644
> --- a/sys/kern/kern_fork.c
> +++ b/sys/kern/kern_fork.c
> @@ -715,11 +715,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> if ((flags & RFMEM) == 0 && dtrace_fasttrap_fork)
> dtrace_fasttrap_fork(p1, p2);
> #endif
> - /*
> - * Hold the process so that it cannot exit after we make it runnable,
> - * but before we wait for the debugger.
> - */
> - _PHOLD(p2);
> if ((p1->p_flag & (P_TRACED | P_FOLLOWFORK)) == (P_TRACED |
> P_FOLLOWFORK)) {
> /*
> @@ -756,6 +751,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> fdrop(fp_procdesc, td);
> }
>
> + PROC_LOCK(p2);
It is hard to read a patch over a patch. Is the proc lock for p1 owned here ?
Note that the order for the proc locks is child->parent. It is not caught
by witness.
> if ((flags & RFSTOPPED) == 0) {
> /*
> * If RFSTOPPED not requested, make child runnable and
> @@ -771,13 +767,11 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> *fr->fr_procp = p2;
> }
>
> - PROC_LOCK(p2);
> /*
> * Wait until debugger is attached to child.
> */
> while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)
> cv_wait(&p2->p_dbgwait, &p2->p_mtx);
> - _PRELE(p2);
> racct_proc_fork_done(p2);
> PROC_UNLOCK(p2);
> }
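If I read the patch over the patch correctly, the resulting tail of
do_fork() would be roughly this (condensed, please correct me if I misread):

	_PRELE(p1);
	PROC_UNLOCK(p1);
	/* ... knote_fork(), SDT probe, procdesc handling ... */
	PROC_LOCK(p2);
	if ((flags & RFSTOPPED) == 0) {
		/* Make child runnable and add to run queue; the p2 lock
		   is held, so the child cannot be reaped under us. */
	}
	/* ... *fr->fr_procp = p2; ... */
	while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)
		cv_wait(&p2->p_dbgwait, &p2->p_mtx);
	racct_proc_fork_done(p2);
	PROC_UNLOCK(p2);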
>
> >
> > I got 2 variants. For brevity both patches are applied on top of the
> > current patchset. When committing, the change would be folded into the
> > second patch in the patchset.
Can you commit the two current patches to ease the conversation ?
> >
> > Both currently and with the first patch below, knote_fork can get a
> > now-freed or even recycled pid. I find that odd. This variant only saves
> > the pid and calls knote_fork with it. This is likely fine enough.
> >
> > Just in case, the second variant adds a primitive - knlist_empty_lockless -
> > to perform a racy check to see if there are any knotes. If there are, the
> > process is held and only released after knote_fork. Note that this is no
> > more racy than the current code with respect to spotting knotes. What does
> > improve is that the process is guaranteed to be around during knote_fork,
> > although I don't know how important this is.
> >
> > With both variants we also save one lock/unlock round, and the second
> > variant additionally saves a lock/unlock of the knote list in the common
> > case.
> >
> > Variant 1:
> > diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
> > index d0c3837..fae4eaf 100644
> > --- a/sys/kern/kern_fork.c
> > +++ b/sys/kern/kern_fork.c
> > @@ -378,7 +378,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > struct vmspace *vm2, struct file *fp_procdesc)
> > {
> > struct proc *p1, *pptr;
> > - int trypid;
> > + int p2_pid, trypid;
> > struct filedesc *fd;
> > struct filedesc_to_leader *fdtol;
> > struct sigacts *newsigacts;
> > @@ -715,11 +715,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > if ((flags & RFMEM) == 0 && dtrace_fasttrap_fork)
> > dtrace_fasttrap_fork(p1, p2);
> > #endif
> > - /*
> > - * Hold the process so that it cannot exit after we make it runnable,
> > - * but before we wait for the debugger.
> > - */
> > - _PHOLD(p2);
> > if ((p1->p_flag & (P_TRACED | P_FOLLOWFORK)) == (P_TRACED |
> > P_FOLLOWFORK)) {
> > /*
> > @@ -737,7 +732,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > td->td_pflags |= TDP_RFPPWAIT;
> > td->td_rfppwait_p = p2;
> > }
> > - PROC_UNLOCK(p2);
> >
> > /*
> > * Now can be swapped.
> > @@ -745,16 +739,10 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > _PRELE(p1);
> > PROC_UNLOCK(p1);
> >
> > - /*
> > - * Tell any interested parties about the new process.
> > - */
> > - knote_fork(&p1->p_klist, p2->p_pid);
> > SDT_PROBE3(proc, , , create, p2, p1, flags);
> >
> > - if (flags & RFPROCDESC) {
> > + if (flags & RFPROCDESC)
> > procdesc_finit(p2->p_procdesc, fp_procdesc);
> > - fdrop(fp_procdesc, td);
> > - }
> >
> > if ((flags & RFSTOPPED) == 0) {
> > /*
> > @@ -771,15 +759,26 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > *fr->fr_procp = p2;
> > }
> >
> > - PROC_LOCK(p2);
> > /*
> > * Wait until debugger is attached to child.
> > */
> > while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)
> > cv_wait(&p2->p_dbgwait, &p2->p_mtx);
> > - _PRELE(p2);
> > racct_proc_fork_done(p2);
> > + /*
> > + * The process can exit and be waited on after we drop the lock. Save
> > + * the pid so that it can be used for knotes.
> > + */
> > + p2_pid = p2->p_pid;
> > PROC_UNLOCK(p2);
> > +
> > + /*
> > + * Tell any interested parties about the new process.
> > + */
> > + knote_fork(&p1->p_klist, p2_pid);
> > +
> > + if (flags & RFPROCDESC)
> > + fdrop(fp_procdesc, td);
> > }
> >
> > int
> > ========================
> >
> > Variant 2:
> > diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> > index d41ac96..3610d8a 100644
> > --- a/sys/kern/kern_event.c
> > +++ b/sys/kern/kern_event.c
> > @@ -2038,12 +2038,22 @@ knlist_remove_inevent(struct knlist *knl, struct knote *kn)
> > (kn->kn_status & KN_HASKQLOCK) == KN_HASKQLOCK);
> > }
> >
> > +/*
> > + * For when the caller accepts that the check is inherently racy.
> > + */
> > +int
> > +knlist_empty_lockless(struct knlist *knl)
> > +{
> > +
> > + return SLIST_EMPTY(&knl->kl_list);
> > +}
> > +
> > int
> > knlist_empty(struct knlist *knl)
> > {
> >
> > KNL_ASSERT_LOCKED(knl);
> > - return SLIST_EMPTY(&knl->kl_list);
> > + return knlist_empty_lockless(knl);
> > }
> >
> > static struct mtx knlist_lock;
> > diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
> > index d0c3837..70490ef 100644
> > --- a/sys/kern/kern_fork.c
> > +++ b/sys/kern/kern_fork.c
> > @@ -378,7 +378,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > struct vmspace *vm2, struct file *fp_procdesc)
> > {
> > struct proc *p1, *pptr;
> > - int trypid;
> > + int p2_held, trypid;
> > struct filedesc *fd;
> > struct filedesc_to_leader *fdtol;
> > struct sigacts *newsigacts;
> > @@ -387,6 +387,7 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > sx_assert(&proctree_lock, SX_SLOCKED);
> > sx_assert(&allproc_lock, SX_XLOCKED);
> >
> > + p2_held = 0;
> > p1 = td->td_proc;
> > flags = fr->fr_flags;
> >
> > @@ -715,11 +716,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > if ((flags & RFMEM) == 0 && dtrace_fasttrap_fork)
> > dtrace_fasttrap_fork(p1, p2);
> > #endif
> > - /*
> > - * Hold the process so that it cannot exit after we make it runnable,
> > - * but before we wait for the debugger.
> > - */
> > - _PHOLD(p2);
> > if ((p1->p_flag & (P_TRACED | P_FOLLOWFORK)) == (P_TRACED |
> > P_FOLLOWFORK)) {
> > /*
> > @@ -737,7 +733,6 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > td->td_pflags |= TDP_RFPPWAIT;
> > td->td_rfppwait_p = p2;
> > }
> > - PROC_UNLOCK(p2);
> >
> > /*
> > * Now can be swapped.
> > @@ -745,16 +740,10 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > _PRELE(p1);
> > PROC_UNLOCK(p1);
> >
> > - /*
> > - * Tell any interested parties about the new process.
> > - */
> > - knote_fork(&p1->p_klist, p2->p_pid);
> > SDT_PROBE3(proc, , , create, p2, p1, flags);
> >
> > - if (flags & RFPROCDESC) {
> > + if (flags & RFPROCDESC)
> > procdesc_finit(p2->p_procdesc, fp_procdesc);
> > - fdrop(fp_procdesc, td);
> > - }
> >
> > if ((flags & RFSTOPPED) == 0) {
> > /*
> > @@ -771,15 +760,32 @@ do_fork(struct thread *td, struct fork_req *fr, struct proc *p2, struct thread *
> > *fr->fr_procp = p2;
> > }
> >
> > - PROC_LOCK(p2);
> > /*
> > * Wait until debugger is attached to child.
> > */
> > while ((td2->td_dbgflags & TDB_STOPATFORK) != 0)
> > cv_wait(&p2->p_dbgwait, &p2->p_mtx);
> > - _PRELE(p2);
> > racct_proc_fork_done(p2);
> > + if (!knlist_empty_lockless(&p1->p_klist)) {
> > + /*
> > + * Hold the process so that it does not exit until we call
> > + * knote_fork.
> > + */
> > + _PHOLD(p2);
> > + p2_held = 1;
> > + }
> > PROC_UNLOCK(p2);
> > +
> > + if (p2_held) {
> > + /*
> > + * Tell any interested parties about the new process.
> > + */
> > + knote_fork(&p1->p_klist, p2->p_pid);
> > + PRELE(p2);
> > + }
> > +
> > + if (flags & RFPROCDESC)
> > + fdrop(fp_procdesc, td);
> > }
> >
> > int
> > diff --git a/sys/sys/event.h b/sys/sys/event.h
> > index 0f13231..771b3bb 100644
> > --- a/sys/sys/event.h
> > +++ b/sys/sys/event.h
> > @@ -254,6 +254,7 @@ extern void knote_fork(struct knlist *list, int pid);
> > extern void knlist_add(struct knlist *knl, struct knote *kn, int islocked);
> > extern void knlist_remove(struct knlist *knl, struct knote *kn, int islocked);
> > extern void knlist_remove_inevent(struct knlist *knl, struct knote *kn);
> > +extern int knlist_empty_lockless(struct knlist *knl);
> > extern int knlist_empty(struct knlist *knl);
> > extern void knlist_init(struct knlist *knl, void *lock,
> > void (*kl_lock)(void *), void (*kl_unlock)(void *),
> >
> > --
> > Mateusz Guzik <mjguzik gmail.com>
>
> --
> Mateusz Guzik <mjguzik gmail.com>