vm_page_array and VM_PHYSSEG_SPARSE

Alan Cox alc at rice.edu
Mon Sep 29 01:25:05 UTC 2014


On 09/27/2014 03:51, Svatopluk Kraus wrote:
>
> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox at gmail.com
> <mailto:alan.l.cox at gmail.com>> wrote:
>
>
>
>     On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe at gmail.com
>     <mailto:onwahe at gmail.com>> wrote:
>
>         Hi,
>
>         Michal and I are finishing the new ARM pmap-v6 code. There is
>         one problem we've dealt with somehow, but now we would like to
>         do it better. It concerns physical pages that are allocated
>         before the vm subsystem is initialized. While these pages can
>         later be found in vm_page_array when the VM_PHYSSEG_DENSE
>         memory model is used, that is not true for the
>         VM_PHYSSEG_SPARSE memory model. And the ARM world uses the
>         VM_PHYSSEG_SPARSE model.
>
>         It really would be nice to utilize vm_page_array for such
>         preallocated physical pages even when the VM_PHYSSEG_SPARSE
>         memory model is used. Things could be much easier then. In our
>         case, it's about pages that are used for level 2 page tables.
>         In the VM_PHYSSEG_SPARSE model, we have two sets of such
>         pages. The first ones are preallocated and the second ones are
>         allocated after the vm subsystem has been initialized. We must
>         deal with each set differently, so the code is more complex
>         and so is the debugging.
>
>         Thus we need some method to say that some part of physical
>         memory should be included in vm_page_array, but that the pages
>         from that region should not be put on the free list during
>         initialization. We think such a possibility could be utilized
>         in general. There could be a need for some physical space
>         which:
>
>         (1) is needed only during boot and later on can be freed and
>         given to the vm subsystem,
>
>         (2) is needed for something else, and the vm_page_array code
>         could be used without some kind of duplication of it.
>
>         There is already some code that deals with blacklisted pages
>         in the vm_page.c file. So the easiest way to deal with the
>         presented situation is to add some callback to this part of
>         the code which will be able to exclude either a whole
>         phys_avail[i], phys_avail[i+1] region or single pages. As the
>         biggest phys_avail region is used for vm subsystem
>         allocations, some more coding will be needed. (However,
>         blacklisted pages are not dealt with on that part of the
>         region.)
>
>         We would like to know if there is any objection:
>
>         (1) to dealing with the presented problem,
>         (2) to dealing with the problem in the presented way.
>         Any help is very much appreciated. Thanks.
>
>
>
>     As an experiment, try modifying vm_phys.c to use dump_avail
>     instead of phys_avail when sizing vm_page_array.  On amd64, where
>     the same problem exists, this allowed me to use
>     VM_PHYSSEG_SPARSE.  Right now, this is probably my preferred
>     solution.  The catch is that not all architectures implement
>     dump_avail, but my recollection is that arm does.
>
>  
> Frankly, I would prefer this too, but there is one big open question:
>  
> What is dump_avail for?


dump_avail[] solves a similar problem in the minidump code, hence the
prefix "dump_" in its name.  In other words, the minidump code
couldn't use phys_avail[] either because it didn't describe the full
range of physical addresses that might be included in a minidump, so
dump_avail[] was created.
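
For reference, dump_avail[] follows the same layout convention as
phys_avail[]: pairs of start/end physical addresses terminated by a zero
entry.  A minimal sketch of the idiom (just an illustration with a
made-up helper name, not the actual minidump code):

static vm_paddr_t
dump_avail_size(void)
{
	vm_paddr_t total;
	int i;

	/* Sum up every range of existing physical memory. */
	total = 0;
	for (i = 0; dump_avail[i + 1] != 0; i += 2)
		total += dump_avail[i + 1] - dump_avail[i];
	return (total);
}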

There is already precedent for what I'm suggesting.  dump_avail[] is
already (ab)used outside of the minidump code on x86 to solve this same
problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.


> Using it for vm_page_array initialization and segmentation means that
> phys_avail must be a subset of it. And this must be stated and be
> visible enough. Maybe it should even be checked in code. I like the
> idea of thinking about dump_avail as something that describes all
> memory in a system, but it's not how dump_avail is defined in archs now.


When you say "it's not how dump_avail is defined in archs now", I'm not
sure whether you're talking about the code or the comments.  In terms of
code, dump_avail[] is a superset of phys_avail[], and I'm not aware of
any code that would have to change.  In terms of comments, I did a grep
looking for comments defining what dump_avail[] is, because I couldn't
remember any.  I found one ... on arm.  So, I don't think it's an
onerous task to change the definition of dump_avail[].  :-)

Already, as things stand today with dump_avail[] being used outside of
the minidump code, one could reasonably argue that it should be renamed
to something like phys_exists[].
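
If we go that route, the invariant you mention, that phys_avail[] must
be a subset of dump_avail[], could also be asserted once at startup.  A
rough sketch (the function name is made up, and it assumes both arrays
hold non-overlapping start/end pairs terminated by a zero entry):

static void
vm_check_phys_avail_subset(void)
{
	int i, j;

	for (i = 0; phys_avail[i + 1] != 0; i += 2) {
		/* Find a dump_avail[] range covering this phys_avail[] range. */
		for (j = 0; dump_avail[j + 1] != 0; j += 2)
			if (phys_avail[i] >= dump_avail[j] &&
			    phys_avail[i + 1] <= dump_avail[j + 1])
				break;
		KASSERT(dump_avail[j + 1] != 0,
		    ("phys_avail range %d not covered by dump_avail", i / 2));
	}
}

Something like that would make the strengthened definition
self-enforcing on every architecture that uses it this way.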


>  
> I will experiment with it on Monday then. However, it's not only about
> how memory segments are created in vm_phys.c, but also about how the
> vm_page_array size is computed in vm_page.c.


Yes, and there is also a place in vm_reserv.c that needs to change.  
I've attached the patch that I developed and tested a long time ago.  It
still applies cleanly and runs ok on amd64.
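
The payoff for your case is that PHYS_TO_VM_PAGE() then works for any
page that dump_avail[] covers, even one that was carved out of
phys_avail[] before vm_page_startup() and never put on the free lists.
Roughly, as a sketch (the helper and its argument are made up):

static vm_page_t
pmap_pt2_page(vm_paddr_t pt_pa)
{
	vm_page_t m;

	/*
	 * pt_pa is the physical address of a preallocated L2 page table
	 * page.  With the attached patch, the sparse vm_page_array still
	 * has an entry for it, so the lookup no longer returns NULL.
	 */
	m = PHYS_TO_VM_PAGE(pt_pa);
	KASSERT(m != NULL,
	    ("%s: no vm_page for page table page at %#jx", __func__,
	    (uintmax_t)pt_pa));
	return (m);
}

So both sets of your page table pages could then be handled the same
way.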


-------------- next part --------------
Index: vm/vm_phys.c
===================================================================
--- vm/vm_phys.c	(revision 220102)
+++ vm/vm_phys.c	(working copy)
@@ -289,38 +289,38 @@ vm_phys_init(void)
 	int ndomains, j;
 #endif
 
-	for (i = 0; phys_avail[i + 1] != 0; i += 2) {
+	for (i = 0; dump_avail[i + 1] != 0; i += 2) {
 #ifdef	VM_FREELIST_ISADMA
-		if (phys_avail[i] < 16777216) {
-			if (phys_avail[i + 1] > 16777216) {
-				vm_phys_create_seg(phys_avail[i], 16777216,
+		if (dump_avail[i] < 16777216) {
+			if (dump_avail[i + 1] > 16777216) {
+				vm_phys_create_seg(dump_avail[i], 16777216,
 				    VM_FREELIST_ISADMA);
-				vm_phys_create_seg(16777216, phys_avail[i + 1],
+				vm_phys_create_seg(16777216, dump_avail[i + 1],
 				    VM_FREELIST_DEFAULT);
 			} else {
-				vm_phys_create_seg(phys_avail[i],
-				    phys_avail[i + 1], VM_FREELIST_ISADMA);
+				vm_phys_create_seg(dump_avail[i],
+				    dump_avail[i + 1], VM_FREELIST_ISADMA);
 			}
 			if (VM_FREELIST_ISADMA >= vm_nfreelists)
 				vm_nfreelists = VM_FREELIST_ISADMA + 1;
 		} else
 #endif
 #ifdef	VM_FREELIST_HIGHMEM
-		if (phys_avail[i + 1] > VM_HIGHMEM_ADDRESS) {
-			if (phys_avail[i] < VM_HIGHMEM_ADDRESS) {
-				vm_phys_create_seg(phys_avail[i],
+		if (dump_avail[i + 1] > VM_HIGHMEM_ADDRESS) {
+			if (dump_avail[i] < VM_HIGHMEM_ADDRESS) {
+				vm_phys_create_seg(dump_avail[i],
 				    VM_HIGHMEM_ADDRESS, VM_FREELIST_DEFAULT);
 				vm_phys_create_seg(VM_HIGHMEM_ADDRESS,
-				    phys_avail[i + 1], VM_FREELIST_HIGHMEM);
+				    dump_avail[i + 1], VM_FREELIST_HIGHMEM);
 			} else {
-				vm_phys_create_seg(phys_avail[i],
-				    phys_avail[i + 1], VM_FREELIST_HIGHMEM);
+				vm_phys_create_seg(dump_avail[i],
+				    dump_avail[i + 1], VM_FREELIST_HIGHMEM);
 			}
 			if (VM_FREELIST_HIGHMEM >= vm_nfreelists)
 				vm_nfreelists = VM_FREELIST_HIGHMEM + 1;
 		} else
 #endif
-		vm_phys_create_seg(phys_avail[i], phys_avail[i + 1],
+		vm_phys_create_seg(dump_avail[i], dump_avail[i + 1],
 		    VM_FREELIST_DEFAULT);
 	}
 	for (flind = 0; flind < vm_nfreelists; flind++) {
Index: vm/vm_reserv.c
===================================================================
--- vm/vm_reserv.c	(revision 220102)
+++ vm/vm_reserv.c	(working copy)
@@ -487,9 +487,9 @@ vm_reserv_init(void)
 	 * Initialize the reservation array.  Specifically, initialize the
 	 * "pages" field for every element that has an underlying superpage.
 	 */
-	for (i = 0; phys_avail[i + 1] != 0; i += 2) {
-		paddr = roundup2(phys_avail[i], VM_LEVEL_0_SIZE);
-		while (paddr + VM_LEVEL_0_SIZE <= phys_avail[i + 1]) {
+	for (i = 0; dump_avail[i + 1] != 0; i += 2) {
+		paddr = roundup2(dump_avail[i], VM_LEVEL_0_SIZE);
+		while (paddr + VM_LEVEL_0_SIZE <= dump_avail[i + 1]) {
 			vm_reserv_array[paddr >> VM_LEVEL_0_SHIFT].pages =
 			    PHYS_TO_VM_PAGE(paddr);
 			paddr += VM_LEVEL_0_SIZE;
Index: vm/vm_page.c
===================================================================
--- vm/vm_page.c	(revision 220102)
+++ vm/vm_page.c	(working copy)
@@ -389,8 +389,8 @@ vm_page_startup(vm_offset_t vaddr)
 	first_page = low_water / PAGE_SIZE;
 #ifdef VM_PHYSSEG_SPARSE
 	page_range = 0;
-	for (i = 0; phys_avail[i + 1] != 0; i += 2)
-		page_range += atop(phys_avail[i + 1] - phys_avail[i]);
+	for (i = 0; dump_avail[i + 1] != 0; i += 2)
+		page_range += atop(dump_avail[i + 1] - dump_avail[i]);
 #elif defined(VM_PHYSSEG_DENSE)
 	page_range = high_water / PAGE_SIZE - first_page;
 #else
Index: amd64/include/vmparam.h
===================================================================
--- amd64/include/vmparam.h	(revision 220102)
+++ amd64/include/vmparam.h	(working copy)
@@ -79,7 +79,7 @@
 /*
  * The physical address space is densely populated.
  */
-#define	VM_PHYSSEG_DENSE
+#define	VM_PHYSSEG_SPARSE
 
 /*
  * The number of PHYSSEG entries must be one greater than the number

