git: 8c75c15d43e4 - main - jail: Avoid a potential use-after-free when destroying jails

From: Mark Johnston <markj_at_FreeBSD.org>
Date: Mon, 06 Jan 2025 22:56:58 UTC
The branch main has been updated by markj:

URL: https://cgit.FreeBSD.org/src/commit/?id=8c75c15d43e4123bc51f24f5bf99319289c45a6c

commit 8c75c15d43e4123bc51f24f5bf99319289c45a6c
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2025-01-06 22:53:38 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2025-01-06 22:53:38 +0000

    jail: Avoid a potential use-after-free when destroying jails
    
    prison_deref() and prison_deref_kill() have to handle the case where
    destruction of a jail will release the final reference on the jail's
    parent, resulting in destruction of the parent jail.  They thus maintain
    a list of jails whose references have gone away; the loop at the end of
    prison_deref() then goes through the list and deallocates resources
    associated with each jail.  In particular, if a jail's VNET is not the
    same as that of its parent, this loop destroys the VNET.
    
    Suppose prison_deref() removes the last reference on a jail, releasing a
    reference to its parent and causing the jail to be placed in the
    "freeprison" list.  Suppose then that the parent jail is destroyed
    before the "freeprison" list is processed.  When destroying the
    now-orphaned child jail, prison_deref() derefences its parent to see
    whether the child jail's VNET needs to be freed, but if this race
    occurs, this is a use-after-free.
    
    Fix the problem by using PR_VNET to decide whether the jail's VNET is to
    be destroyed, rather than dereferencing the parent jail pointer.  Set it
    earlier so that a subsequent failure in kern_jail_set() cleans up the
    nascent VNET.
    
    Reviewed by:    zlei (previous version), jamie
    MFC after:      2 weeks
    Differential Revision:  https://reviews.freebsd.org/D47992
---
 sys/kern/kern_jail.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c
index ad6483ed374d..6ffeab59112b 100644
--- a/sys/kern/kern_jail.c
+++ b/sys/kern/kern_jail.c
@@ -1687,9 +1687,18 @@ kern_jail_set(struct thread *td, struct uio *optuio, int flags)
 			    sizeof(pr->pr_osrelease));
 
 #ifdef VIMAGE
-		/* Allocate a new vnet if specified. */
-		pr->pr_vnet = (pr_flags & PR_VNET)
-		    ? vnet_alloc() : ppr->pr_vnet;
+		/*
+		 * Allocate a new vnet if specified.
+		 *
+		 * Set PR_VNET now if so, so that the vnet is disposed of
+		 * properly when the jail is destroyed.
+		 */
+		if (pr_flags & PR_VNET) {
+			pr->pr_flags |= PR_VNET;
+			pr->pr_vnet = vnet_alloc();
+		} else {
+			pr->pr_vnet = ppr->pr_vnet;
+		}
 #endif
 		/*
 		 * Allocate a dedicated cpuset for each jail.
@@ -3207,9 +3216,12 @@ prison_deref(struct prison *pr, int flags)
 					 * Removing a prison frees references
 					 * from its parent.
 					 */
+					ppr = pr->pr_parent;
+					pr->pr_parent = NULL;
 					mtx_unlock(&pr->pr_mtx);
+
+					pr = ppr;
 					flags &= ~PD_LOCKED;
-					pr = pr->pr_parent;
 					flags |= PD_DEREF | PD_DEUREF;
 					continue;
 				}
@@ -3236,7 +3248,7 @@ prison_deref(struct prison *pr, int flags)
 	 */
 	TAILQ_FOREACH_SAFE(rpr, &freeprison, pr_list, tpr) {
 #ifdef VIMAGE
-		if (rpr->pr_vnet != rpr->pr_parent->pr_vnet)
+		if (rpr->pr_flags & PR_VNET)
 			vnet_destroy(rpr->pr_vnet);
 #endif
 		if (rpr->pr_root != NULL)