git: c98367641991 - main - vm_fault: Defer marking COW pages valid

From: Mark Johnston <markj_at_FreeBSD.org>
Date: Sun, 13 Apr 2025 16:12:35 UTC
The branch main has been updated by markj:

URL: https://cgit.FreeBSD.org/src/commit/?id=c98367641991019bac0e8cd55b70682171820534

commit c98367641991019bac0e8cd55b70682171820534
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2025-04-13 16:09:31 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2025-04-13 16:09:31 +0000

    vm_fault: Defer marking COW pages valid
    
    Suppose an object O has two shadow objects S1 and S2, mapped into
    processes P1 and P2, respectively.  Suppose a page resident in O is
    mapped read-only into P1.  Now suppose that P1 writes to the page,
    triggering a COW fault: it allocates a new page in S1, copies the
    old page's contents into it, and marks the new page valid.  If the
    page in O was busy when it was first looked up, P1 had to release
    the map lock and sleep, so after handling the COW fault it must
    re-check the map lookup, since locks were dropped.  Suppose the map
    did change in the meantime, forcing P1 to retry the fault.
    
    At this point, the mapped page in O is shadowed by a valid page in
    S1.  If P2 now exits, S2 is deallocated, which triggers a collapse
    of O into S1.  Because the page in O is shadowed, the collapse
    (running in P2's context) frees it, even though it is still mapped
    into P1; freeing a mapped page is illegal and triggers the "freeing
    mapped page" assertion on INVARIANTS kernels.
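    
    For reference, the assertion that fires lives in vm_page_free_prep()
    (sys/vm/vm_page.c); it reads approximately as follows:
    
        KASSERT(!pmap_page_is_mapped(m),
            ("vm_page_free_prep: freeing mapped page %p", m));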
    
    Fix the problem by deferring the vm_page_valid() call that marks
    the COW copy valid: only mark the copy valid once we know that the
    fault handler will succeed.  It is okay to leave an invalid page in
    the top-level object; it will be freed when the fault is retried,
    and vm_object_collapse_scan() will similarly free invalid pages in
    the shadow object.
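    
    A sketch of the resulting ordering (function names are from
    sys/vm/vm_fault.c; error paths and most of the fault handler are
    omitted):
    
        /* vm_fault_cow(): copy the page, but leave the copy invalid. */
        pmap_copy_page(fs->m, fs->first_m);
        ...
        /* vm_fault(), once the fault can no longer fail: */
        if (fs.m_cow != NULL)
                vm_page_valid(fs.m);
        /* ... pmap_enter() then maps fs.m. */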
    
    Reviewed by:    kib
    MFC after:      1 month
    Sponsored by:   Innovate UK
    Differential Revision:  https://reviews.freebsd.org/D49758
---
 sys/vm/vm_fault.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index 81631b672040..0bd3a8207c4a 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -1064,14 +1064,14 @@ vm_fault_cow(struct faultstate *fs)
 		 * Oh, well, lets copy it.
 		 */
 		pmap_copy_page(fs->m, fs->first_m);
-		vm_page_valid(fs->first_m);
 		if (fs->wired && (fs->fault_flags & VM_FAULT_WIRE) == 0) {
 			vm_page_wire(fs->first_m);
 			vm_page_unwire(fs->m, PQ_INACTIVE);
 		}
 		/*
-		 * Save the cow page to be released after
-		 * pmap_enter is complete.
+		 * Save the COW page to be released after pmap_enter is
+		 * complete.  The new copy will be marked valid when we're ready
+		 * to map it.
 		 */
 		fs->m_cow = fs->m;
 		fs->m = NULL;
@@ -1759,6 +1759,19 @@ found:
 	if (hardfault)
 		fs.entry->next_read = vaddr + ptoa(ahead) + PAGE_SIZE;
 
+	/*
+	 * If the page to be mapped was copied from a backing object, we defer
+	 * marking it valid until here, where the fault handler is guaranteed to
+	 * succeed.  Otherwise we can end up with a shadowed, mapped page in the
+	 * backing object, which violates an invariant of vm_object_collapse()
+	 * that shadowed pages are not mapped.
+	 */
+	if (fs.m_cow != NULL) {
+		KASSERT(vm_page_none_valid(fs.m),
+		    ("vm_fault: page %p is already valid", fs.m));
+		vm_page_valid(fs.m);
+	}
+
 	/*
 	 * Page must be completely valid or it is not fit to
 	 * map into user space.  vm_pager_get_pages() ensures this.