arm64 vs. jemalloc and swapping in and out, sh/su examples: being swapped out leads to later Failed assertion: "tsd_booted" after being swapped in
Mark Millard
markmi at dsl-only.net
Tue Feb 28 03:21:49 UTC 2017
[I've added a variant of this material to bugzilla 217138.]
I've reduced the testing context to the following
type of example (no longer involving buildworld
buildkernel):
# sh
# sh
# sh
. . .
# sh
(So a bunch of nested sh's in an ssh session that
will have most swapped out. I happened to have
done this under 2 users for the example material
below: in one I did an su first. Plus I had another
session without such a nesting but with an su: this
is the one were I ran stress.)
After forcing these to mostly swap out (see below)
I used ^D or exit exit a currently interactive one.
Then the rest of the swapped out ones from the
nesting get the tsd_booted failure.
So, for example:
# stress -m 1 --vm-bytes 1536M
The context is a PINE64+ with 2GB of RAM. The above
was enough in my context to cause the needed swapouts:
# ps -aOuser,flags
PID USER F TT STAT TIME COMMAND
688 root 4102 u0 IWs 0:00.00 login [pam] (login)
689 root 10004002 u0 I+ 0:00.06 -sh (sh)
72611 markmi 4002 0 IWs 0:00.00 -sh (sh)
72613 markmi 4002 0 IW 0:00.00 sh
72614 markmi 4002 0 IW 0:00.00 sh
72615 markmi 4002 0 IW 0:00.00 sh
72616 markmi 4002 0 IW 0:00.00 sh
72617 markmi 4002 0 IW 0:00.00 sh
72618 markmi 4002 0 IW 0:00.00 sh
72619 markmi 4002 0 IW 0:00.00 sh
72620 markmi 4002 0 IW 0:00.00 sh
72621 markmi 4002 0 IW 0:00.00 sh
72622 markmi 4002 0 IW 0:00.00 sh
72623 markmi 4002 0 IW 0:00.00 sh
72624 markmi 4002 0 IW 0:00.00 sh
72625 markmi 4002 0 IW 0:00.00 sh
72626 markmi 10004002 0 I+ 0:00.01 sh
167 markmi 4002 1 IWs 0:00.00 -sh (sh)
169 root 4102 1 IW 0:00.00 su
170 root 4002 1 IW 0:00.00 su (sh)
171 root 4002 1 IW 0:00.00 sh
172 root 4002 1 IW 0:00.00 sh
173 root 4002 1 IW 0:00.00 sh
174 root 4002 1 IW 0:00.00 sh
175 root 4002 1 IW 0:00.00 sh
176 root 4002 1 IW 0:00.00 sh
177 root 4002 1 IW 0:00.00 sh
178 root 4002 1 IW 0:00.00 sh
179 root 10004002 1 I+ 0:00.01 sh
60961 root 4002 2 IW 0:00.00 stress -m 1 --vm-bytes 1536M
60962 root 10000002 2 R 0:29.41 stress -m 1 --vm-bytes 1536M
60964 root 10004002 2 R+ 0:00.01 ps -aOuser,flags
82389 markmi 4002 2 IWs 0:00.00 -sh (sh)
82391 root 4102 2 IW 0:00.00 su
82392 root 10004002 2 S 0:00.22 su (sh)
So with that swapped out context established:
(The markmi user case of nested sh's:)
$ ^D<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
(The su'd user case of nested sh's:)
# ^D<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
The one without the deeply nested sh's (but an su):
# exit
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/tsd.h:687: Failed assertion: "tsd_booted"
So far I've not gotten such without having had the failing
processes having been swapped-out at some point before failure.
# ls -ltTU /var/crash/
total 589220
-rw------- 1 markmi wheel 4718592 Feb 27 18:31:00 2017 sh.82389.core
-rw------- 1 root wheel 4812800 Feb 27 18:30:59 2017 su.82391.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:35 2017 sh.72611.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:34 2017 sh.72613.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:34 2017 sh.72614.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:33 2017 sh.72615.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:32 2017 sh.72616.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:31 2017 sh.72617.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:30 2017 sh.72618.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:29 2017 sh.72619.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:28 2017 sh.72620.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:28 2017 sh.72621.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:27 2017 sh.72622.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:26 2017 sh.72623.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:25 2017 sh.72624.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:24 2017 sh.72625.core
-rw------- 1 markmi wheel 4718592 Feb 27 18:20:20 2017 sh.167.core
-rw------- 1 root wheel 4718592 Feb 27 18:20:19 2017 sh.170.core
-rw------- 1 root wheel 4718592 Feb 27 18:20:18 2017 sh.171.core
-rw------- 1 root wheel 4718592 Feb 27 18:20:17 2017 sh.172.core
-rw------- 1 root wheel 4718592 Feb 27 18:20:16 2017 sh.173.core
-rw------- 1 root wheel 4718592 Feb 27 18:20:16 2017 sh.174.core
-rw------- 1 root wheel 4718592 Feb 27 18:20:15 2017 sh.175.core
-rw------- 1 root wheel 4718592 Feb 27 18:20:14 2017 sh.176.core
-rw------- 1 root wheel 4718592 Feb 27 18:20:13 2017 sh.177.core
-rw------- 1 root wheel 4718592 Feb 27 18:20:12 2017 sh.178.core
. . . (from prior investigative activity) . . .
Note it was about 10 minutes after the two nested examples before I
existed the one that only had the su --at which point it detected
the Failed assertion: "tsd_booted" as well (su and sh, su first).
In other experiments I found that it was when buildworld activity
caused swapping out of the failing processes that there were the
later failures.
I can not claim swapping out and back in is the only context
with issues but it has been true of every such failure that
I've looked at since I made this discovery.
My context has a USB SSD for the root filesystem and the swap
parition.
===
Mark Millard
markmi at dsl-only.net
More information about the freebsd-stable
mailing list