fork-then-swap-out [zero RES(ident memory)] questions tied to arm64 failures (zeroed memory pages)

Wed Mar 22 22:09:29 UTC 2017

The later questions are associated with:

Bugzilla 217239 and 217138 (which I now expect have a common cause)
https://lists.freebsd.org/pipermail/freebsd-arm/2017-March/015867.html
(and its thread)

These are tied to some process memory pages being trashed
(zeroed) in particular types of arm64 contexts. This is
reproducible in multiple arm64 contexts. The context is head,
but I believe there are also reports on the lists tied to 11.

[Unfortunately, all of the above very much shows a learn-as-I-go
character. Also, the list has a sub-exchange about my testing
other devices to check for device failures; that part is not
directly relevant here.]

These are tied to problems with fork-then-swap-out-then-swap-in
contexts on arm64. (I've occasionally typed amd64 by accident in
places in those materials.) The memory involved comes from
allocations made before the fork, ones the child side of the
fork had not yet accessed.

fork sets up copy-on-write so that the child process temporarily
shares pages (those it does not write), or should.

But what if the parent process, or both parent and child, are
swapped out shortly after the fork (so that, say, top -PCwaopid
shows zero for RES(ident memory))? How is, say, the child
swapping back in while the parent is still swapped out handled?

I notice that the child can have a much smaller SWAP figure
than the parent, so it would appear that the parent's swap-out
includes pages that the child's does not.

So what if the child needs some of those pages? What should
happen? (Vs. what does happen on arm64 in specific types
of contexts? More below.)

I ask mostly to see if I can do more to give evidence of
what is going on and what to test for any proposed fix.
I'm not likely to find the underlying problem(s) for arm64
directly, unlike my investigation that led to
fork-trampoline being fixed in head's -r313772
(2017-Feb-15).

[ https://lists.freebsd.org/pipermail/freebsd-arm/2017-February/015656.html
  and its thread, including when its title changed in:
  https://lists.freebsd.org/pipermail/freebsd-arm/2017-February/015678.html
.]

Part of the reason I am unlikely to solve it is that the
context seems to be bound to many special conditions and
interesting behaviors simultaneously:

A) Both my original reproductions of problem reports on the
   lists and the only (simple) programs for reproducing the
   problems involve fork-then-swap-out [zero RES(ident
   memory)]. Neither fork by itself nor swap-out/in by
   itself has been sufficient.

B) jemalloc's tcache being in use (__je_tcache_maxclass == 32*1024)
   is part of every reproduction of the problem.

C) Allocations <= SMALL_MAXCLASS (SMALL_MAXCLASS==14*1024) get
   the failure, but bigger ones work, whether or not they fit
   inside __je_tcache_maxclass. Again: every reproduction of
   the problem has this status.

D) FreeBSD sometimes gets into a state where /etc/malloc.conf
   specifying tcache:false does not seem to disable tcache.
   (After that has been observed, rebooting makes tcache:false
   work again.) [Related or independent? I've no clue.] Usually
   tcache:false seems to work and so avoids the failures.

E) Using POSIX_MADV_WILLNEED on the problematical allocation(s)
   in the child process after the fork but before the swap-outs
   of the child and parent prevents the failures, even though
   the child makes no read or write access to the memory until
   after the swap-in. Doing so just in the parent process does
   not prevent the failures.

F) Similar to (E), but read-accessing a byte or more of one or
   more pages of the problematical allocations from the child
   process after the fork but before the swap-out keeps those
   specific pages from failing. (The others, if any, still
   fail.) Doing so from the parent process instead does not
   avoid the failures.

G) In a sequence where su creates a sh, which then runs one of
   my test programs, which then forks off a child, all 4
   processes can show the zeroed memory area like the child
   process does. su and sh need to have been swapped out and
   back in for them to get failures. su and sh die once they
   hit an assert that fails due to the zeroed memory page(s).
   The asserts involve addresses that are also messed up in
   the test program processes (parent and child).

In my reading I've not been able to determine what to expect,
for fork-then-swap-out-and-back-in, for pages that the child
process had not yet accessed but that might not be resident
for later activity because the parent process was itself
swapped out at the time.

Note: While I usually run a non-debug kernel, I have tried
a debug kernel; it reported no problems, and I got no
additional information from the attempt.

[My usual KERNCONF file includes GENERIC and then disables
various debug items.]

The bugzilla reports have example code showing the problems
and various behaviors. The two in 217239 are probably of more
interest than the first one in 217138.

===
Mark Millard
markmi at dsl-only.net