vm_page_array and VM_PHYSSEG_SPARSE

Alan Cox alc at rice.edu
Thu Oct 23 22:14:36 UTC 2014


On 10/08/2014 10:38, Svatopluk Kraus wrote:
> On Mon, Sep 29, 2014 at 3:00 AM, Alan Cox <alc at rice.edu> wrote:
>
>>   On 09/27/2014 03:51, Svatopluk Kraus wrote:
>>
>>
>> On Fri, Sep 26, 2014 at 8:08 PM, Alan Cox <alan.l.cox at gmail.com> wrote:
>>
>>>
>>>  On Wed, Sep 24, 2014 at 7:27 AM, Svatopluk Kraus <onwahe at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Michal and I are finishing the new ARM pmap-v6 code. There is one
>>>> problem we've dealt with somehow, but we would now like to solve it
>>>> better. It concerns physical pages that are allocated before the vm
>>>> subsystem is initialized. While these pages can later be found in
>>>> vm_page_array when the VM_PHYSSEG_DENSE memory model is used, that is
>>>> not true for the VM_PHYSSEG_SPARSE memory model. And the ARM world
>>>> uses the VM_PHYSSEG_SPARSE model.
>>>>
>>>> It really would be nice to utilize vm_page_array for such preallocated
>>>> physical pages even when the VM_PHYSSEG_SPARSE memory model is used.
>>>> Things would be much easier then. In our case, it's about pages used
>>>> for level 2 page tables. In the VM_PHYSSEG_SPARSE model, we have two
>>>> sets of such pages: the first are preallocated, and the second are
>>>> allocated after the vm subsystem has been initialized. We must deal
>>>> with each set differently, so the code is more complex and so is the
>>>> debugging.
>>>>
>>>> Thus we need some way to say that a part of physical memory should be
>>>> included in vm_page_array, but that the pages from that region should
>>>> not be put on the free list during initialization. We think such a
>>>> facility could be useful in general. There could be a need for some
>>>> physical space which:
>>>>
>>>> (1) is needed only during boot and can later be freed and handed over
>>>> to the vm subsystem,
>>>>
>>>> (2) is needed for something else, so that the vm_page_array code can
>>>> be used without some kind of duplication.
>>>>
>>>> There is already some code in vm_page.c that deals with blacklisted
>>>> pages. So the easiest way to handle the presented situation is to add
>>>> a callback to that part of the code which can exclude either a whole
>>>> phys_avail[i], phys_avail[i+1] region or single pages. As the biggest
>>>> phys_avail region is used for vm subsystem allocations, some more
>>>> coding is needed there. (However, blacklisted pages are not dealt with
>>>> on that part of the region.)
>>>>
>>>> We would like to know if there is any objection:
>>>>
>>>> (1) to dealing with the presented problem,
>>>> (2) to dealing with the problem in the presented way.
>>>> Any help is much appreciated. Thanks.
>>>>
>>>>
>>> As an experiment, try modifying vm_phys.c to use dump_avail instead of
>>> phys_avail when sizing vm_page_array.  On amd64, where the same problem
>>> exists, this allowed me to use VM_PHYSSEG_SPARSE.  Right now, this is
>>> probably my preferred solution.  The catch being that not all architectures
>>> implement dump_avail, but my recollection is that arm does.
>>>
>> Frankly, I would prefer this too, but there is one big open question:
>>
>> What is dump_avail for?
>>
>>
>>
>> dump_avail[] is solving a similar problem in the minidump code, hence, the
>> prefix "dump_" in its name.  In other words, the minidump code couldn't use
>> phys_avail[] either because it didn't describe the full range of physical
>> addresses that might be included in a minidump, so dump_avail[] was created.
>>
>> There is already precedent for what I'm suggesting.  dump_avail[] is
>> already (ab)used outside of the minidump code on x86 to solve this same
>> problem in x86/x86/nexus.c, and on arm in arm/arm/mem.c.
>>
>>
>>  Using it for vm_page_array initialization and segmentation means that
>> phys_avail must be a subset of it. And this must be stated and made
>> visible enough. Maybe it should even be checked in code. I like the idea
>> of thinking about dump_avail as something that describes all memory in a
>> system, but that's not how dump_avail is defined in the archs now.
>>
>>
>>
>> When you say "it's not how dump_avail is defined in archs now", I'm not
>> sure whether you're talking about the code or the comments.  In terms of
>> code, dump_avail[] is a superset of phys_avail[], and I'm not aware of any
>> code that would have to change.  In terms of comments, I did a grep looking
>> for comments defining what dump_avail[] is, because I couldn't remember
>> any.  I found one ... on arm.  So, I don't think it's an onerous task
>> to change the definition of dump_avail[].  :-)
>>
>> Already, as things stand today with dump_avail[] being used outside of the
>> minidump code, one could reasonably argue that it should be renamed to
>> something like phys_exists[].
>>
>>
>>
>> I will experiment with it on Monday, then. However, it's not only about
>> how memory segments are created in vm_phys.c; it's also about how the
>> vm_page_array size is computed in vm_page.c.
>>
>>
>>
>> Yes, and there is also a place in vm_reserv.c that needs to change.   I've
>> attached the patch that I developed and tested a long time ago.  It still
>> applies cleanly and runs ok on amd64.
>>
>>
>>
>
>
> Well, I've created and tested a minimalistic patch which - I hope - is
> committable. It runs fine on a Pandaboard (armv6) and solves the presented
> problem. I would really appreciate it if this could be committed. Thanks.


Sorry for the slow reply.  I've just been swamped with work lately.  I
finally had some time to look at this in the last day or so.

The first thing that I propose to do is commit the attached patch.  This
patch changes pmap_init() on amd64, armv6, and i386 so that it no longer
consults phys_avail[] to determine the end of memory.  Instead, it calls
a new function provided by vm_phys.c to obtain the same information from
vm_phys_segs[].

With this change, the new variable phys_managed in your patch wouldn't
need to be a global.  It could be a local variable in vm_page_startup()
that we pass as a parameter to vm_phys_init() and vm_reserv_init().

More generally, the long-term vision that I have is that we would stop
using phys_avail[] after vm_page_startup() had completed.  It would only
be used during initialization.  After that we would use vm_phys_segs[]
and functions provided by vm_phys.c.

 
>
> BTW, while I was inspecting all the archs, I think it may be time to do
> what was done for busdma not long ago. There is much similar code across
> the archs that deals with physical memory, and it could be generalized
> and moved to kern/subr_physmem.c for shared use. All work with physical
> memory could be simplified to two arrays of regions:
>
> phys_present[] ... describes all present physical memory regions
> phys_exclude[] ... describes various exclusions from phys_present[]
>
> Each excluded region would be labeled with flags saying what kind of
> exclusion it is. Flags like NODUMP, NOALLOC, NOMANAGE, NOBOUNCE, and
> NOMEMRW could be combined. This idea is taken from sys/arm/arm/physmem.c.
>
> All other arrays, like phys_managed[], phys_avail[], and dump_avail[],
> would be created from these phys_present[] and phys_exclude[] arrays.
> This way the bootstrap code in the archs could be simplified and unified.
> For example, dealing with either hw.physmem or the page at PA 0x00000000
> could be made transparent.
>
> I'm prepared to volunteer if the idea is ripe. However, a mentor will be
> needed.


I've never really looked at arm/arm/physmem.c before.  Let me do that
before I comment on this.


-------------- next part --------------
Index: amd64/amd64/pmap.c
===================================================================
--- amd64/amd64/pmap.c	(revision 273550)
+++ amd64/amd64/pmap.c	(working copy)
@@ -130,6 +130,7 @@ __FBSDID("$FreeBSD$");
 #include <vm/vm_extern.h>
 #include <vm/vm_pageout.h>
 #include <vm/vm_pager.h>
+#include <vm/vm_phys.h>
 #include <vm/vm_radix.h>
 #include <vm/vm_reserv.h>
 #include <vm/uma.h>
@@ -1060,8 +1061,7 @@ pmap_init(void)
 	/*
 	 * Calculate the size of the pv head table for superpages.
 	 */
-	for (i = 0; phys_avail[i + 1]; i += 2);
-	pv_npg = round_2mpage(phys_avail[(i - 2) + 1]) / NBPDR;
+	pv_npg = round_2mpage(vm_phys_get_end()) / NBPDR;
 
 	/*
 	 * Allocate memory for the pv head table for superpages.
Index: arm/arm/pmap-v6.c
===================================================================
--- arm/arm/pmap-v6.c	(revision 273550)
+++ arm/arm/pmap-v6.c	(working copy)
@@ -172,6 +172,7 @@ __FBSDID("$FreeBSD$");
 #include <vm/vm_map.h>
 #include <vm/vm_page.h>
 #include <vm/vm_pageout.h>
+#include <vm/vm_phys.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_reserv.h>
 
@@ -1343,8 +1344,7 @@ pmap_init(void)
 	/*
 	 * Calculate the size of the pv head table for superpages.
 	 */
-	for (i = 0; phys_avail[i + 1]; i += 2);
-	pv_npg = round_1mpage(phys_avail[(i - 2) + 1]) / NBPDR;
+	pv_npg = round_1mpage(vm_phys_get_end()) / NBPDR;
 
 	/*
 	 * Allocate memory for the pv head table for superpages.
Index: i386/i386/pmap.c
===================================================================
--- i386/i386/pmap.c	(revision 273550)
+++ i386/i386/pmap.c	(working copy)
@@ -133,6 +133,7 @@ __FBSDID("$FreeBSD$");
 #include <vm/vm_extern.h>
 #include <vm/vm_pageout.h>
 #include <vm/vm_pager.h>
+#include <vm/vm_phys.h>
 #include <vm/vm_radix.h>
 #include <vm/vm_reserv.h>
 #include <vm/uma.h>
@@ -779,8 +780,7 @@ pmap_init(void)
 	/*
 	 * Calculate the size of the pv head table for superpages.
 	 */
-	for (i = 0; phys_avail[i + 1]; i += 2);
-	pv_npg = round_4mpage(phys_avail[(i - 2) + 1]) / NBPDR;
+	pv_npg = round_4mpage(vm_phys_get_end()) / NBPDR;
 
 	/*
 	 * Allocate memory for the pv head table for superpages.
Index: vm/vm_phys.c
===================================================================
--- vm/vm_phys.c	(revision 273550)
+++ vm/vm_phys.c	(working copy)
@@ -856,6 +856,17 @@ vm_phys_free_contig(vm_page_t m, u_long npages)
 }
 
 /*
+ * Return the physical address of the end of memory, that is, the address of
+ * the last useable byte of RAM plus one.
+ */
+vm_paddr_t
+vm_phys_get_end(void)
+{
+
+	return (vm_phys_segs[vm_phys_nsegs - 1].end);
+}
+
+/*
  * Set the pool for a contiguous, power of two-sized set of physical pages. 
  */
 void
Index: vm/vm_phys.h
===================================================================
--- vm/vm_phys.h	(revision 273550)
+++ vm/vm_phys.h	(working copy)
@@ -80,6 +80,7 @@ void vm_phys_fictitious_unreg_range(vm_paddr_t sta
 vm_page_t vm_phys_fictitious_to_vm_page(vm_paddr_t pa);
 void vm_phys_free_contig(vm_page_t m, u_long npages);
 void vm_phys_free_pages(vm_page_t m, int order);
+vm_paddr_t vm_phys_get_end(void);
 void vm_phys_init(void);
 vm_page_t vm_phys_paddr_to_vm_page(vm_paddr_t pa);
 void vm_phys_set_pool(int pool, vm_page_t m, int order);


More information about the freebsd-arch mailing list