mmap MAP_NOSYNC regression in 10.x
Konstantin Belousov
kostikbel at gmail.com
Fri Sep 5 08:06:49 UTC 2014
On Fri, Sep 05, 2014 at 02:02:51AM -0500, Alan Cox wrote:
> On Thu, Sep 4, 2014 at 7:29 PM, Pieter de Goeje <pieter at degoeje.nl> wrote:
>
> > After upgrading my month-old 10-stable installation today (to r271093),
> > I've noticed that the kernel no longer honors the MAP_NOSYNC flag.
> > Attached is a demonstration program that highlights the issue (also
> > available here: http://pastebin.com/y0kvdn0r ).
> >
> > The program creates and mmap()s a 200MiB file and repeatedly writes zeros
> > to it. The expected behavior is that under normal circumstances (no memory
> > pressure), the dirtied pages are not flushed to disk. What I observe, however,
> > is that every ~30 seconds the syncer kicks in and basically halts the program
> > while it does its job. The program prints a line every time the throughput
> > drops below 500MBps, well below memory bandwidth.
> >
> > mmap() is called like this:
> >
> > void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
> > MAP_SHARED | MAP_NOSYNC | MAP_ALIGNED_SUPER, fd, 0);
> >
> > Sample output:
> >
> > write...
> > zeroing: 209.6 MB
> > ...write: 5.839s
> > mmap...
> > ...mmap: 0.000s
> > 20.1s: memset #259: 34.7MBps - stalled
> > 55.7s: memset #810: 34.7MBps - stalled
> > 91.3s: memset #1359: 34.6MBps - stalled
> > 100.0s: memset #1522: 3938.5MBps
> > overall bandwidth: 3190.6MBps
> > munmap...
> > ...munmap: 5.796s
> > done
> >
> > (this is a rather old system)
> >
> > If necessary I'm willing to find out the exact commit that caused the
> > problem.
> >
> >
>
> That's not necessary. This is a bug in the page fault handler's new fast
> path.
The following patch fixed the issue for me.
diff --git a/sys/vm/vm_fault.c b/sys/vm/vm_fault.c
index 30b0456..803bf59 100644
--- a/sys/vm/vm_fault.c
+++ b/sys/vm/vm_fault.c
@@ -174,6 +174,49 @@ unlock_and_deallocate(struct faultstate *fs)
 	}
 }
+static void
+vm_fault_dirty(vm_map_entry_t entry, vm_page_t m, vm_prot_t prot,
+    vm_prot_t fault_type, int fault_flags, boolean_t set_wd)
+{
+
+	if (((prot & VM_PROT_WRITE) != 0 ||
+	    (fault_flags & VM_FAULT_DIRTY) != 0) &&
+	    (m->oflags & VPO_UNMANAGED) == 0) {
+		if (set_wd)
+			vm_object_set_writeable_dirty(m->object);
+
+		/*
+		 * If this is a NOSYNC mmap we do not want to set VPO_NOSYNC
+		 * if the page is already dirty to prevent data written with
+		 * the expectation of being synced from not being synced.
+		 * Likewise if this entry does not request NOSYNC then make
+		 * sure the page isn't marked NOSYNC.  Applications sharing
+		 * data should use the same flags to avoid ping ponging.
+		 */
+		if (entry->eflags & MAP_ENTRY_NOSYNC) {
+			if (m->dirty == 0)
+				m->oflags |= VPO_NOSYNC;
+		} else {
+			m->oflags &= ~VPO_NOSYNC;
+		}
+
+		/*
+		 * If the fault is a write, we know that this page is being
+		 * written NOW so dirty it explicitly to save on
+		 * pmap_is_modified() calls later.
+		 *
+		 * Also tell the backing pager, if any, that it should remove
+		 * any swap backing since the page is now dirty.
+		 */
+		if (((fault_type & VM_PROT_WRITE) != 0 &&
+		    (fault_flags & VM_FAULT_CHANGE_WIRING) == 0) ||
+		    (fault_flags & VM_FAULT_DIRTY) != 0) {
+			vm_page_dirty(m);
+			vm_pager_page_unswapped(m);
+		}
+	}
+}
+
 /*
  * TRYPAGER - used by vm_fault to calculate whether the pager for the
  * current object *might* contain the page.
@@ -321,11 +364,8 @@ RetryFault:;
 			vm_page_hold(m);
 			vm_page_unlock(m);
 		}
-		if ((fault_type & VM_PROT_WRITE) != 0 &&
-		    (m->oflags & VPO_UNMANAGED) == 0) {
-			vm_page_dirty(m);
-			vm_pager_page_unswapped(m);
-		}
+		vm_fault_dirty(fs.entry, m, prot, fault_type, fault_flags,
+		    FALSE);
 		VM_OBJECT_RUNLOCK(fs.first_object);
 		if (!wired)
 			vm_fault_prefault(&fs, vaddr, 0, 0);
@@ -898,42 +938,7 @@ vnode_locked:
 	if (hardfault)
 		fs.entry->next_read = fs.pindex + faultcount - reqpage;
-	if (((prot & VM_PROT_WRITE) != 0 ||
-	    (fault_flags & VM_FAULT_DIRTY) != 0) &&
-	    (fs.m->oflags & VPO_UNMANAGED) == 0) {
-		vm_object_set_writeable_dirty(fs.object);
-
-		/*
-		 * If this is a NOSYNC mmap we do not want to set VPO_NOSYNC
-		 * if the page is already dirty to prevent data written with
-		 * the expectation of being synced from not being synced.
-		 * Likewise if this entry does not request NOSYNC then make
-		 * sure the page isn't marked NOSYNC.  Applications sharing
-		 * data should use the same flags to avoid ping ponging.
-		 */
-		if (fs.entry->eflags & MAP_ENTRY_NOSYNC) {
-			if (fs.m->dirty == 0)
-				fs.m->oflags |= VPO_NOSYNC;
-		} else {
-			fs.m->oflags &= ~VPO_NOSYNC;
-		}
-
-		/*
-		 * If the fault is a write, we know that this page is being
-		 * written NOW so dirty it explicitly to save on
-		 * pmap_is_modified() calls later.
-		 *
-		 * Also tell the backing pager, if any, that it should remove
-		 * any swap backing since the page is now dirty.
-		 */
-		if (((fault_type & VM_PROT_WRITE) != 0 &&
-		    (fault_flags & VM_FAULT_CHANGE_WIRING) == 0) ||
-		    (fault_flags & VM_FAULT_DIRTY) != 0) {
-			vm_page_dirty(fs.m);
-			vm_pager_page_unswapped(fs.m);
-		}
-	}
-
+	vm_fault_dirty(fs.entry, fs.m, prot, fault_type, fault_flags, TRUE);
 	vm_page_assert_xbusied(fs.m);
 	/*
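To illustrate what the patch changes, the flag handling can be modelled in
userland. The sketch below is not kernel code: struct model_page,
MODEL_VPO_NOSYNC and the model_*() functions are hypothetical stand-ins, and
the model ignores locking, unmanaged pages, VM_FAULT_DIRTY and wiring
changes. It only shows that the old fast path dirtied the page without
looking at the map entry's NOSYNC request, while the patched path applies
the same NOSYNC rules as the slow path before dirtying.

```c
#include <stdbool.h>

/* Hypothetical stand-in for VPO_NOSYNC; the value is illustrative only. */
#define MODEL_VPO_NOSYNC	0x01u

/* Hypothetical, heavily reduced model of the fields touched by the patch. */
struct model_page {
	unsigned int oflags;	/* models m->oflags */
	unsigned int dirty;	/* models m->dirty, 0 means clean */
};

/* Model of the fast path before the patch: dirty the page, nothing else. */
static void
model_fast_path_old(struct model_page *m, bool entry_nosync, bool fault_write)
{
	(void)entry_nosync;	/* the entry's NOSYNC request was never read */
	if (fault_write)
		m->dirty = 0xff;
}

/* Model of the patched path, in the same order as vm_fault_dirty(). */
static void
model_fault_dirty(struct model_page *m, bool entry_nosync, bool prot_write,
    bool fault_write)
{
	if (!prot_write)
		return;

	if (entry_nosync) {
		/* Only a page that is still clean may become NOSYNC. */
		if (m->dirty == 0)
			m->oflags |= MODEL_VPO_NOSYNC;
	} else {
		m->oflags &= ~MODEL_VPO_NOSYNC;
	}

	if (fault_write)
		m->dirty = 0xff;
}
```

For a clean page written through a NOSYNC mapping, model_fast_path_old()
leaves it dirty with MODEL_VPO_NOSYNC clear, which as far as I can tell is
the state in which the syncer will write it out. model_fault_dirty() sets
the flag first and then dirties the page.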