Re: FYI: aarch64 main [so: 14] system hung up with a large amount of memory in use (given the RAM+SWAP configuration) but lots of swap left

From: Mark Millard via freebsd-current <freebsd-current_at_freebsd.org>
Date: Sat, 13 Nov 2021 11:40:06 UTC

On 2021-Nov-13, at 03:20, Mark Millard <marklmi at yahoo.com> wrote:


> While attempting to see if I could repeat a bugzilla report in a
> somewhat different context, I has the system hang up to the
> point that ^C and ^Z did not work and ^T did not echo out what
> would be expected for poudriere (or even the kernel backtrace).
> I was able to escape to ddb.
> 
> The context was Cortex-A72 based aarch64 system using:
> 
> # poudriere jail -jmain-CA7 -i
> Jail name:         main-CA7
> Jail version:      14.0-CURRENT
> Jail arch:         arm.armv7
> Jail method:       null
> Jail mount:        /usr/obj/DESTDIRs/main-CA7-poud
> Jail fs:           
> Jail updated:      2021-06-27 17:58:33
> Jail pkgbase:      disabled
> 
> # uname -apKU
> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #18 main-n250455-890cae197737-dirty: Thu Nov  4 13:43:17 PDT 2021     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400040 1400040
> 
> It is a non-debug build (but with symbols).
> 
> 16 cortex-A72 cores, 64 GiBytes RAM, root on ZFS, 251904Mi swap,
> USE_TMPFS=all in use. ALLOW_PARALLEL_JOBS= in use too.
> (Mentioned only for context: I've no specific evidence if other
> contexts would also have failed, say, USE+TMPFS="data" or UFS.)
> 
> When I looked around at the db> prompts I noticed one
> oddity (I'm no expert at such inspections):
> 
> db> show allchains
> . . .
> chain 92:
> thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
> thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
> thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
> thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
> thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
> thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
> . . . (thousands of more instances of that line content,
>       I never found the last) . . .
> 
> My patched top (that reports some "maximum observed" (MaxObs???)
> figures) was showing (having hung up with the system):
> 
> last pid: 18816;  load averages: 10.11, 16.76, 18.73 MaxObs: 115.65, 103.13, 96.36                                                                                              up 8+06:52:04  20:30:57
> 324 threads:   17 running, 305 sleeping, 2 waiting, 147 MaxObsRunning
> CPU:  2.8% user,  0.0% nice, 97.1% system,  0.0% interrupt,  0.0% idle
> Mem: 19044Ki Active, 331776B Inact, 73728B Laundry, 6950Mi Wired, 69632B Buf, 558860Ki Free, 47709Mi MaxObsActive, 12556Mi MaxObsWired, 59622Mi MaxObs(Act+Wir+Lndry)
> ARC: 2005Mi Total, 623319Ki MFU, 654020Ki MRU, 2048Ki Anon, 27462Ki Header, 745685Ki Other
>     783741Ki Compressed, 3981Mi Uncompressed, 5.20:1 Ratio
> Swap: 251904Mi Total, 101719Mi Used, 150185Mi Free, 40% Inuse, 3432Ki In, 3064Ki Out, 101719Mi MaxObsUsed, 101737Mi MaxObs(Act+Lndry+SwapUsed), 109816Mi MaxObs(Act+Wir+Lndry+SwapUsed)
> 
> (Based on the 20:30:57 time shown, it had been hung up for over
> 2 hours when I got to it.)
> 
> There were no console messages. /var/log/messages had its
> last message at 18:57:52. No out-of-swap or such
> messages.
> 
> 
> I did get a dump via the db> prompt.
> 

In retrying the poudriere-devel run expiriment I'm
getting various builds that are generating
multi-GiByte log files (and growing) that have
lines like:

thread 'rustc' panicked at 'capacity overflow', library/alloc/src/raw_vec.rs:559:5
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: 
https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md


note: rustc 1.55.0 running on armv7-unknown-freebsd

note: compiler flags: -C embed-bitcode=no -C debuginfo=2 -C linker=cc --crate-type lib

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
#0 [trimmed_def_paths] calculating trimmed def paths
#1 [lint_mod] linting module `transitions`
#2 [analysis] running analysis passes on this crate
end of query stack
thread 'rustc' panicked at 'cannot panic during the backtrace function', library/std/src/../../backtrace/src/lib.rs:147:13
stack backtrace:
   0: 0x4710076c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h4428caffcb182c5b
   1: 0x471c9d00 - core::fmt::write::h91f4a7678561fd61
   2: 0x470e2180 - <unknown>
   3: 0x470ebd40 - <unknown>
   4: 0x470eb824 - <unknown>
   5: 0x41ed4848 - <unknown>
   6: 0x470ec690 - std::panicking::rust_panic_with_hook::h6bc4b7e83060df25
   7: 0x47100f0c - <unknown>
   8: 0x47100900 - <unknown>
   9: 0x470ec374 - <unknown>
. . .
  65: 0x470ee71c - <unknown>
  66: 0x401361bc - <unknown>
  67: 0x40135cd8 - pthread_create
  68: 0x40138b9c - pthread_peekjoin_np
  69: 0x40138b9c - pthread_peekjoin_np
  70: 0x40138b9c - pthread_peekjoin_np
  71: 0x40138b9c - pthread_peekjoin_np
  72: 0x40138b9c - pthread_peekjoin_np
  73: 0x40138b9c - pthread_peekjoin_np
. . . massive repitition of pthread_peekjoin_np lines . . .

(I used USE_TMPFS="data" to avoid tmpfs memory usage this
time.)


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)