The arm64 fork-then-swap-out-then-swap-in failures: a program source for exploring them

Mark Millard markmi at dsl-only.net
Sun Apr 9 18:25:03 UTC 2017


On 2017-Apr-9, at 10:24 AM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2017-Apr-9, at 5:27 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
> 
>> On Sat, Apr 08, 2017 at 06:02:00PM -0700, Mark Millard wrote:
>>> [I've identified the code path involved is the arm64 small allocations
>>> turning into zeros for later fork-then-swapout-then-back-in,
>>> specifically the ongoing RES(ident memory) size decrease that
>>> "top -PCwaopid" shows before the fork/swap sequence. Hopefully
>>> I've also exposed enough related information for someone that
>>> knows what they are doing to get started with a specific
>>> investigation, looking for a fix. I'd like for a pine64+
>>> 2GB to have buildworld complete despite the forking and
>>> swapping involved (yep: for a time zero RES(ident memory) for
>>> some processes involved in the build).]
>> 
>> I was not able to follow the walls of text, but do not think that
>> pmap_ts_reference() is the real culprit there.
>> 
>> Is my impression right that the issue occurs on fork, and looks like
>> memory corruption, where some page suddenly becomes zero-filled?
>> And swapping seems to be involved?  It would be interesting to see
>> whether the problem is reproducible on non-arm64 machines, e.g. armv7 or amd64.
> 
> Yes and yes. The non-arm64 machines that I've tried work.
> 
> But I think that the following extra detail may be of use: what top
> shows for RES over time is also odd on arm64 (only), and the number
> of pages that are zeroed is proportional to the decrease in RES.
> 
> In the test sequence:
> 
> A) Allocate lots of 14 KiByte allocations and initialize the content of each
> to non-zero. The example ends up with RES of about 265M.

I did forget to list one important property: why I picked 14 KiBytes.

A) Every allocation size <= 14 KiBytes that I've tried
   gets the zeros problem in my arm64 contexts (bpim3 and rip3).

B) Every allocation size >= 14 KiBytes + 1 Byte that I've
   tried works in those contexts.

For the arm64 contexts that I use, this boundary happens to match
the jemalloc SMALL_MAXCLASS size boundary. When I looked, 14 KiBytes
appeared to be the smallest SMALL_MAXCLASS value in jemalloc, so a
14 KiByte allocation would always fall in the small category.

> B) sleep some amount of time, I've been using well over 30 seconds here.
> 
> C) fork
> 
> D) sleep again (parent and child), also forcing swapping during the sleep
>   (I used stress, manually run.)
> 
> E) Test the memory pattern in the parent and child process, passing over
>   all the bytes, failed and good.
> 
> Both the parent and the child in (E) see the first pages allocated as zero,
> with the number of pages being zero increasing as the sleep time in (B)
> increases (as long as the sleep is over 30 sec or so). The parent and child
> match for which pages are zero vs. not.
> 
> It fails with (B) being a no-op as well. But the proportionality with
> the time for the sleep is interesting.
> 
> During (B) "top -PCwaopid" shows RES decreasing, starting after 30 sec
> or so. The fork in (C) produces a child that does not have the same RES
> as the parent but instead a tiny RES (80K as I remember). During (E)
> the child's RES increases to full size.
> 
> My powerpc64, armv7, and amd64 tests of such do not fail, nor does RES
> decrease during (B). The child process gets the same RES as the parent
> as well, unlike for arm64.
> 
> In the failing context (arm64) RES in the parent decreases during (D)
> before the swap-out as well.
> 
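
For reference, the (A) through (E) sequence above can be sketched
roughly as below. This is only an outline under assumptions, not the
exact program I have been running: the helper names, the region count
and the sleep times in main() are made up for illustration. Swapping
during (D) still has to be forced from outside (for example with
stress, run manually). Build with -DFORK_SWAP_DEMO_MAIN to get a
stand-alone program.

```c
/* Sketch of test sequence (A)-(E); all helper names are hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define ALLOC_SIZE ((size_t)14 * 1024)  /* 14 KiBytes: largest size seen to fail */

/* Non-zero, position-dependent byte, so a zeroed page is always detectable. */
static unsigned char pattern_byte(size_t region, size_t offset)
{
    return (unsigned char)(1 + (region + offset) % 255);
}

/* (A) Allocate `count` regions of `size` bytes and fill each with the pattern. */
static unsigned char **allocate_regions(size_t count, size_t size)
{
    assert(count > 0 && size > 0);
    unsigned char **regions = malloc(count * sizeof *regions);
    if (regions == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        regions[i] = malloc(size);
        if (regions[i] == NULL)
            return NULL;  /* leaking on failure is tolerable in a test sketch */
        for (size_t j = 0; j < size; j++)
            regions[i][j] = pattern_byte(i, j);
    }
    return regions;
}

/* (E) Walk every byte, good or bad, and count the mismatches. */
static size_t count_bad_bytes(unsigned char **regions, size_t count, size_t size)
{
    size_t bad = 0;
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < size; j++)
            if (regions[i][j] != pattern_byte(i, j))
                bad++;
    return bad;
}

/* Returns how many processes (0, 1 or 2) saw bad bytes, or -1 on error. */
static int run_sequence(size_t count, size_t size,
                        unsigned pre_fork_sleep, unsigned post_fork_sleep)
{
    unsigned char **regions = allocate_regions(count, size);    /* (A) */
    if (regions == NULL)
        return -1;
    sleep(pre_fork_sleep);                                       /* (B) */
    pid_t pid = fork();                                          /* (C) */
    if (pid < 0)
        return -1;
    sleep(post_fork_sleep);   /* (D) swapping must be forced externally here */
    size_t bad = count_bad_bytes(regions, count, size);          /* (E) */
    fprintf(stderr, "%s: %zu bad bytes\n", pid == 0 ? "child" : "parent", bad);
    if (pid == 0)
        _exit(bad != 0);      /* child reports its result via the exit status */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    int child_failed = !(WIFEXITED(status) && WEXITSTATUS(status) == 0);
    return (bad != 0) + child_failed;
}

#ifdef FORK_SWAP_DEMO_MAIN
int main(void)
{
    /* Assumed values: roughly 260 MiBytes of regions, sleeps well over 30 sec. */
    return run_sequence(19000, ALLOC_SIZE, 60, 120) != 0;
}
#endif
```

Passing ALLOC_SIZE + 1 as the size instead gives the variant that has
worked for me on arm64. Watching "top -PCwaopid" during the two sleeps
shows the RES behavior described above.
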
>> If answers to my two questions are yes, there is probably some bug with
>> arm64 pmap handling of the dirty bit emulation.  ARMv8.0 does not provide
>> hardware dirty bit, and pmap interprets an accessed writeable page as
>> unconditionally dirty.  Moreover, the accessed bit is also not maintained by
>> hardware, instead it should be set by pmap.  And arm64 pmap sets the
>> AF bit unconditionally when creating valid pte.
> 
> fork-then-swap-out/in is required to see the problem. Neither fork
> by itself nor swapping (zero RES as shown in top) by itself has
> shown the problem so far.
> 
>> Hmm, could you try the following patch, I did not even compile it.
> 
> I'll try it later today.
> 
>> diff --git a/sys/arm64/arm64/pmap.c b/sys/arm64/arm64/pmap.c
>> index 3d5756ba891..55aa402eb1c 100644
>> --- a/sys/arm64/arm64/pmap.c
>> +++ b/sys/arm64/arm64/pmap.c
>> @@ -2481,6 +2481,11 @@ pmap_protect(pmap_t pmap, vm_offset_t sva, vm_offset_t eva, vm_prot_t prot)
>> 		    sva += L3_SIZE) {
>> 			l3 = pmap_load(l3p);
>> 			if (pmap_l3_valid(l3)) {
>> +				if ((l3 & ATTR_SW_MANAGED) &&
>> +				    pmap_page_dirty(l3)) {
>> +					vm_page_dirty(PHYS_TO_VM_PAGE(l3 &
>> +					    ~ATTR_MASK));
>> +				}
>> 				pmap_set(l3p, ATTR_AP(ATTR_AP_RO));
>> 				PTE_SYNC(l3p);
>> 				/* XXX: Use pmap_invalidate_range */
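
As I read it, the idea behind the patch can be shown with a toy
user-space model. This is not kernel code: the bit layout, the
struct and the function names below are all hypothetical and only
mirror the description above, where an accessed, writable mapping
is treated as dirty. Once the mapping is made read-only, that
inferred dirtiness can no longer be derived from the pte, so the
patched variant records it in the page first.

```c
/* Toy model of dirty-bit emulation; NOT the real arm64 pte encoding. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VALID   (1u << 0)
#define TOY_AF      (1u << 1)   /* accessed flag */
#define TOY_RO      (1u << 2)   /* read-only access permission */
#define TOY_MANAGED (1u << 3)   /* page is tracked by the VM system */

/* Stands in for the machine-independent per-page state. */
struct toy_page {
    bool dirty;
};

/* Without a hardware dirty bit: accessed + writable is taken to mean dirty. */
static bool toy_pte_dirty(uint32_t pte)
{
    return (pte & TOY_VALID) && (pte & TOY_AF) && !(pte & TOY_RO);
}

/* Write-protect without recording dirtiness first: the information is lost. */
static void toy_protect_lossy(uint32_t *pte, struct toy_page *pg)
{
    (void)pg;
    if (*pte & TOY_VALID)
        *pte |= TOY_RO;
}

/* Same shape as the proposed patch: record dirtiness, then write-protect. */
static void toy_protect_patched(uint32_t *pte, struct toy_page *pg)
{
    if (*pte & TOY_VALID) {
        if ((*pte & TOY_MANAGED) && toy_pte_dirty(*pte))
            pg->dirty = true;
        *pte |= TOY_RO;
    }
}

#ifdef TOY_PMAP_DEMO_MAIN
int main(void)
{
    uint32_t pte = TOY_VALID | TOY_AF | TOY_MANAGED;
    struct toy_page pg = { false };

    toy_protect_patched(&pte, &pg);
    assert(pg.dirty);              /* dirtiness survived the downgrade */
    assert(!toy_pte_dirty(pte));   /* the pte alone no longer shows it */
    return 0;
}
#endif
```

Whether this is what actually goes wrong in the fork-then-swap case is
still to be tested; the model only shows what the added lines change.
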

===
Mark Millard
markmi at dsl-only.net



More information about the freebsd-arm mailing list