Memory modified after free in "MAP ENTRY" zone (vm_map_entry_t->read_ahead)

Thu Feb 18 15:13:32 UTC 2016

On Mon, Feb 15, 2016 at 11:07:11AM +0200, Andriy Gapon wrote:
> On 15/02/2016 00:27, Alan Cox wrote:
> > 
> > On Sun, Feb 14, 2016 at 8:09 AM, Andriy Gapon <avg at freebsd.org
> > <mailto:avg at freebsd.org>> wrote:
> > 
> >     On 10/02/2016 23:28, Andriy Gapon wrote:
> >     >
> >     > Over a span of approximately 3 weeks I have got two slightly different panics of
> >     > the same kind.   The affected system is a several months old amd64 head.
> > 
> >     I added a small assertion and it got tripped:
> [snip]
> > 
> >     So, it seems that the code allows modification of read_ahead field of an entry
> >     without holding a map lock.  I'd guess that that should be mostly harmless as
> >     read_ahead value is not critical, but it is still not nice.
> > 
> > Not intentionally.  The nearby code to the read_ahead assignment expects the map
> > to be locked, so I wouldn't categorize this as mostly harmless.
> > 
> > In general, you shouldn't get to the read_ahead assignment without the map being
> > locked, and almost all of the code paths that unlock the map jump to RetryFault
> > shortly thereafter, so it's hard to imagine how the map lock wouldn't be
> > reacquired.  I speculate that the root cause of your panic is a case where
> > vm_pager_get_pages() fails, and in such a case we might loop around, descending
> > to the next backing object without reacquiring the map lock.  In other words,
> > your above assertion failure is happening on the next iteration after an initial
> > vm_pager_get_pages() failure, and we're about to do another vm_pager_get_pages()
> > call on a different object.
> > 
> > In summary, I have no trouble believing that the code that handles a failure by
> > vm_pager_get_pages() in vm_fault() is not quite right, and I think that's what's
> > biting you.
> 
> Alan,
> 
> thank you very much for the very insightful analysis.
> Indeed, I see that in this case the object chain is default -> swap -> swap.  I
> am not sure how such chain was created.  It seems that going default -> swap is
> not a problem as the map lock is not dropped in this case.  But going swap ->
> swap the way you described (pager error, e.g. the page is just not found) has
> exactly the problem that you suggested.
> 

So this is arguably a fallout from r188331.
The following is somewhat non-insistent attempt to fix the problem.

diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index a7e3d37..cddf1eb 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -291,7 +291,8 @@ vm_fault_hold(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
 	struct faultstate fs;
 	struct vnode *vp;
 	vm_page_t m;
-	int ahead, behind, cluster_offset, error, locked;
+	int ahead, behind, cluster_offset, error, locked, rv;
+	u_char behavior;
 
 	hardfault = 0;
 	growstack = TRUE;
@@ -550,9 +551,18 @@ readrest:
 		 * zero-filled pages.
 		 */
 		if (fs.object->type != OBJT_DEFAULT) {
-			int rv;
-			u_char behavior = vm_map_entry_behavior(fs.entry);
-
+			if (!fs.lookup_still_valid) {
+				locked = vm_map_trylock_read(fs.map);
+				if (locked)
+					fs.lookup_still_valid = TRUE;
+				if (!locked || fs.map->timestamp !=
+				    map_generation) {
+					release_page(&fs);
+					unlock_and_deallocate(&fs);
+					goto RetryFault;
+				}
+			}
+			behavior = vm_map_entry_behavior(fs.entry);
 			era = fs.entry->read_ahead;
 			if (behavior == MAP_ENTRY_BEHAV_RANDOM ||
 			    P_KILLED(curproc)) {
@@ -603,6 +613,7 @@ readrest:
 			}
 			ahead = ulmin(ahead, atop(fs.entry->end - vaddr) - 1);
 			if (era != nera)
+				/* XXX only read-lock on map */
 				fs.entry->read_ahead = nera;
 
 			/*