FYI: aarch64 main [so: 14] system hung up with a large amount of memory in use (given the RAM+SWAP configuration) but lots of swap left
Date: Sat, 13 Nov 2021 11:20:55 UTC
While attempting to see if I could repeat a bugzilla report in a
somewhat different context, I had the system hang up to the point
that ^C and ^Z did not work and ^T did not echo out what would be
expected for poudriere (or even the kernel backtrace). I was able
to escape to ddb.

The context was a Cortex-A72 based aarch64 system using:

# poudriere jail -jmain-CA7 -i
Jail name: main-CA7
Jail version: 14.0-CURRENT
Jail arch: arm.armv7
Jail method: null
Jail mount: /usr/obj/DESTDIRs/main-CA7-poud
Jail fs:
Jail updated: 2021-06-27 17:58:33
Jail pkgbase: disabled

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #18 main-n250455-890cae197737-dirty: Thu Nov 4 13:43:17 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400040 1400040

It is a non-debug build (but with symbols).

16 Cortex-A72 cores, 64 GiBytes RAM, root on ZFS, 251904Mi swap,
USE_TMPFS=all in use. ALLOW_PARALLEL_JOBS= in use too.
(Mentioned only for context: I've no specific evidence whether
other contexts would also have failed, say, USE_TMPFS="data"
or UFS.)

When I looked around at the db> prompts I noticed one
oddity (I'm no expert at such inspections):

db> show allchains
. . .
chain 92:
thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
thread 100671 (pid 15928, make) is blocked on lockmgr 0%A%0EXCL
. . . (thousands more instances of that line content,
I never found the last) . . .

My patched top (that reports some "maximum observed"
(MaxObs???) figures) had hung up along with the system
and was showing:

last pid: 18816; load averages: 10.11, 16.76, 18.73 MaxObs: 115.65, 103.13, 96.36 up 8+06:52:04 20:30:57
324 threads: 17 running, 305 sleeping, 2 waiting, 147 MaxObsRunning
CPU: 2.8% user, 0.0% nice, 97.1% system, 0.0% interrupt, 0.0% idle
Mem: 19044Ki Active, 331776B Inact, 73728B Laundry, 6950Mi Wired, 69632B Buf, 558860Ki Free, 47709Mi MaxObsActive, 12556Mi MaxObsWired, 59622Mi MaxObs(Act+Wir+Lndry)
ARC: 2005Mi Total, 623319Ki MFU, 654020Ki MRU, 2048Ki Anon, 27462Ki Header, 745685Ki Other
783741Ki Compressed, 3981Mi Uncompressed, 5.20:1 Ratio
Swap: 251904Mi Total, 101719Mi Used, 150185Mi Free, 40% Inuse, 3432Ki In, 3064Ki Out, 101719Mi MaxObsUsed, 101737Mi MaxObs(Act+Lndry+SwapUsed), 109816Mi MaxObs(Act+Wir+Lndry+SwapUsed)

(Based on the 20:30:57 time shown, it had been hung up for
over 2 hours when I got to it.)

There were no console messages. /var/log/messages had its
last message at 18:57:52. There were no out-of-swap or
similar messages.

I did get a dump via the db> prompt.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)