fork-then-swap-out [zero RES(ident memory)] questions tied to arm64 failures (zeroed memory pages)
Mark Millard
markmi at dsl-only.net
Wed Mar 22 22:09:29 UTC 2017
The questions later in this message are associated with:
Bugzilla 217239 and 217138 (which I now expect have a common cause)
https://lists.freebsd.org/pipermail/freebsd-arm/2017-March/015867.html
(and its thread)
These are tied to some process memory pages being trashed
(zeroed) in particular types of arm64 contexts. This is
reproducible in multiple arm64 contexts. My context is head,
but I believe there are reports in the lists tied to 11 as
well.
[Unfortunately, all of the above very much shows a learn-as-I-go
property. Also, the list has a sub-exchange about my testing other
devices to check for device failures; that part is not directly
relevant here.]
These are tied to problems with fork-then-swap-out-then-swap-in
contexts on arm64 (even though I have occasionally typed amd64
by accident in places in those materials). Memory allocations
from before the fork are involved, specifically ones not yet
accessed by the child side of the fork at the time of the fork.
fork sets up copy-on-write so that the child process temporarily
shares pages with the parent (those it does not write), or at
least it should.
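As a baseline for what the child should see, here is a minimal sketch of ordinary fork copy-on-write behavior, assuming a POSIX system and no swapping at all: the parent fills a 14*1024-byte allocation (a size within SMALL_MAXCLASS, per (C) below) before the fork, then overwrites its own copy, and the child must still see the original bytes. cow_demo is a hypothetical name; this is not one of the bugzilla test programs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo (not from the bugzilla test programs).
 * Returns 0 if the child still sees the pre-fork pattern after the
 * parent has overwritten its own copy, 1 if it does not, -1 on setup
 * failure. No swapping is involved here.
 */
int cow_demo(void)
{
    enum { LEN = 14 * 1024 };      /* <= SMALL_MAXCLASS per (C) */
    unsigned char *buf = malloc(LEN);
    int fds[2];

    if (buf == NULL)
        return -1;
    if (pipe(fds) != 0) {
        free(buf);
        return -1;
    }
    memset(buf, 0xA5, LEN);        /* non-zero pattern set before fork */

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: block until the parent has written its own copy. */
        char c;
        close(fds[1]);
        if (read(fds[0], &c, 1) != 1)
            _exit(2);
        for (size_t i = 0; i < LEN; i++)
            if (buf[i] != 0xA5)
                _exit(1);          /* saw the parent's write, or zeros */
        _exit(0);
    }

    /* Parent: a write makes the kernel give the writer a private copy. */
    close(fds[0]);
    memset(buf, 0x00, LEN);
    ssize_t w = write(fds[1], "x", 1);
    close(fds[1]);

    int status;
    pid_t r = waitpid(pid, &status, 0);
    free(buf);
    if (r != pid || w != 1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Calling cow_demo() should return 0 on any working system; a 1 would mean the child saw either the parent's later write or zeroed memory.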
But what if the parent process, or both parent and child, are
swapped out shortly after the fork (so that, say, top -PCwaopid
shows zero for RES(ident memory))? What is the handling of, say,
the child swapping back in while the parent is still swapped
out?
I notice that the child can have a much smaller SWAP figure
than the parent, so it would appear that the parent's swap-out
has pages that the child's does not.
So what if the child needs some of those pages? What should
happen? (Vs. what does happen on arm64 in specific types
of contexts? More below.)
I ask mostly to see if I can do more to give evidence of
what is going on and what to test for any proposed fix.
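One way to gather more specific evidence is to report how much of an allocation came back all-zero after the swap-in, instead of only failing an assert. A minimal sketch of such a helper follows; count_zero_pages is a hypothetical name, and it steps in page-sized chunks from the start of the allocation, so its chunks need not line up with the real page boundaries.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical diagnostic helper: count how many page-sized chunks of
 * [p, p+len) are entirely zero. Chunks are measured from p, not from
 * real page boundaries; the last chunk may be partial.
 */
size_t count_zero_pages(const unsigned char *p, size_t len, size_t pagesz)
{
    assert(pagesz > 0);
    size_t zero_chunks = 0;
    for (size_t off = 0; off < len; off += pagesz) {
        size_t n = (len - off < pagesz) ? len - off : pagesz;
        size_t i = 0;
        while (i < n && p[off + i] == 0)
            i++;
        if (i == n)                /* every byte in this chunk is zero */
            zero_chunks++;
    }
    return zero_chunks;
}
```

For example, a test program could check count_zero_pages(buf, len, (size_t)sysconf(_SC_PAGESIZE)) in both the parent and the child after the swap-in, for memory that was filled with a non-zero pattern before the fork.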
I'm not likely to find the underlying problem(s) for arm64
directly, unlike my investigation that led to
fork-trampoline being fixed in head's -r313772
(2017-Feb-15).
[https://lists.freebsd.org/pipermail/freebsd-arm/2017-February/015656.html
and its thread, including where its title changed:
https://lists.freebsd.org/pipermail/freebsd-arm/2017-February/015678.html ]
Part of the reason I am unlikely to solve it is that the
context seems to be bound to many special conditions and
interesting behaviors at once:
A) Both my original reproductions of problem reports on the
lists and the only (simple) programs for reproducing the
problems involve fork-then-swap-out [zero RES(ident
memory)]. Neither fork by itself nor swap-out/in by
itself has been sufficient.
B) jemalloc's tcache being in use (__je_tcache_maxclass == 32*1024)
is part of every example of reproduction of the problem.
C) allocations <= SMALL_MAXCLASS (SMALL_MAXCLASS==14*1024) get
the failure (bigger ones work, whether or not they fit inside
__je_tcache_maxclass). Again, every reproduction of the
problem has this status.
D) FreeBSD sometimes gets into a state where /etc/malloc.conf
doing tcache:false does not seem to disable tcache. (After this
has been observed, rebooting gets tcache:false working again.)
[Related or independent? I've no clue.] Usually tcache:false
seems to work and so avoids the failures.
E) Use of POSIX_MADV_WILLNEED on the problematical allocation(s)
in the child process after the fork but before the swap-outs
of the child and parent prevents the failures (no read or
write access to the memory from the child until after the
swap-in). Doing so just in the parent process does not prevent
the failures.
F) Similar to (E), but read-accessing a byte or more of one or
more pages from the problematical allocations from the child
process after the fork but before the swap-out makes those
specific pages not fail. (Any other pages still fail.)
Doing the same from the parent process instead does not
avoid the failures.
G) In a sequence where su creates a sh, which then runs one
of my test programs, which then forks off a child, it can be
that all 4 processes show the zeroed memory area, as the
child process does. su and sh need to have swapped out and
back in to get failures. su and sh die once they hit an
assert that fails due to the zeroed memory page(s). The
asserts involve addresses that are also messed up in the
test program processes (parent and child).
In my reading I've not been able to determine what to expect
for fork-then-swap-out-and-back-in for pages that the child
process had not accessed yet but which might not be around
for later activity because of the parent process's own
swapped-out status at the time.
Note: While I usually run a non-debug kernel, I have tried
a debug kernel, and it provided no notices of problems. I
got no additional information from the attempt.
[My usual KERNCONF file includes GENERIC and then disables
various debug items.]
The bugzilla reports have example code for showing the
problems and various behaviors. The two in 217239 are
probably of more interest than the first one on 217138.
===
Mark Millard
markmi at dsl-only.net