Curious buildworld stoppage on RPi3B running 12.1-stable.
bob prohaska
fbsd at www.zefox.net
Fri May 8 18:03:01 UTC 2020
Here's an example of a curious hang on a Pi3 during -j4 buildworld.
The machine uses a 37GB mechanical USB disk running 12.1 r360689.
/boot/loader.conf contains
kern.cam.boot_delay="20000"
vm.pageout_oom_seq="4096"
vm.pfault_oom_attempts="-1"
Top reports:
last pid: 84778; load averages: 0.00, 0.47, 1.35 up 0+19:11:52 09:43:58
55 processes: 1 running, 54 sleeping
CPU: 0.0% user, 0.0% nice, 0.7% system, 0.0% interrupt, 99.3% idle
Mem: 243M Active, 26M Inact, 13M Laundry, 171M Wired, 98M Buf, 453M Free
Swap: 2263M Total, 698M Used, 1565M Free, 30% Inuse, 696K In
  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
82615 root       1  20    0   726M   263M swread  3   8:43  0.46% c++
 1025 bob        1  20    0    13M  1052K CPU0    0   2:47  0.16% top
 1020 bob        1  20    0    19M   800K select  1   0:18  0.02% sshd
66244 root       1  20    0    11M   228K select  0   0:03  0.00% make
  656 root       1  20    0    10M   244K select  0   0:00  0.00% devd
82405 root       1  20    0    17M  2460K select  0   0:25  0.00% make
27467 root       1  20    0    10M   228K select  0   0:11  0.00% make
27499 root       1  20    0    11M   232K select  0   0:07  0.00% make
40948 root       1  20    0    11M   228K select  0   0:06  0.00% make
  928 root       1  20    0    16M  1004K select  0   0:05  0.00% sendmail
66255 root       1  20    0    14M   236K select  1   0:04  0.00% make
73291 root       1  20    0    10M   224K select  1   0:02  0.00% make
  932 root       1  20    0    11M   340K nanslp  3   0:01  0.00% cron
  727 root       1  20    0    11M   664K select  3   0:01  0.00% syslogd
The buildworld output log file has stopped updating:
-rw-r--r-- 1 root wheel 18178806 May 8 06:34 buildworld.log
bob@nemesis:/usr/src % ls -l *.log
-rw-r--r-- 1 root wheel 18345625 May 8 08:17 buildworld.log
bob@nemesis:/usr/src % ls -l *.log
-rw-r--r-- 1 root wheel 18532738 May 8 09:34 buildworld.log
bob@nemesis:/usr/src % ls -l *.log
-rw-r--r-- 1 root wheel 18532738 May 8 09:34 buildworld.log
bob@nemesis:/usr/src % ls -l *.log
-rw-r--r-- 1 root wheel 18532738 May 8 09:34 buildworld.log
bob@nemesis:/usr/src % ls -l *.log
-rw-r--r-- 1 root wheel 18532738 May 8 09:34 buildworld.log
bob@nemesis:/usr/src %
At this point clock time is 09:46, so the log has been stuck about 12 minutes.
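The repeated ls -l check can be automated. Here is a minimal sketch in Python that reports how long a log file has gone without being written. The helper names and the 300-second threshold are hypothetical, chosen only for illustration:

```python
import os
import time


def seconds_since_modified(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime


def is_stalled(path, threshold_seconds=300, now=None):
    """True if the file has not been modified for at least `threshold_seconds`.

    The 300 s default is an arbitrary assumption, not a value from the post.
    """
    return seconds_since_modified(path, now) >= threshold_seconds
```

Calling is_stalled("/usr/src/buildworld.log") from a loop or a cron job would flag a stall without anyone watching the terminal. This assumes the log sits in /usr/src, as the shell prompt suggests.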
Gstat reports
dT: 1.010s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy  Name
    0      0      0      0    0.0      0      0    0.0     0.0| mmcsd0
    1     80     80    321   12.4      0      0    0.0    99.4| da0
    0      0      0      0    0.0      0      0    0.0     0.0| mmcsd0s1
    0      0      0      0    0.0      0      0    0.0     0.0| mmcsd0s2
    0      0      0      0    0.0      0      0    0.0     0.0| diskid/DISK-9B9A5304
    0      0      0      0    0.0      0      0    0.0     0.0| msdosfs/MSDOSBOOT
    0      0      0      0    0.0      0      0    0.0     0.0| mmcsd0s2a
    0      0      0      0    0.0      0      0    0.0     0.0| mmcsd0s2b
    0      0      0      0    0.0      0      0    0.0     0.0| diskid/DISK-9B9A5304s1
    0      0      0      0    0.0      0      0    0.0     0.0| diskid/DISK-9B9A5304s2
    0      0      0      0    0.0      0      0    0.0     0.0| ufsid/5bbd5a23da8b6195
    0      0      0      0    0.0      0      0    0.0     0.0| ufs/rootfs
    0      0      0      0    0.0      0      0    0.0     0.0| diskid/DISK-9B9A5304s2a
    0      0      0      0    0.0      0      0    0.0     0.0| diskid/DISK-9B9A5304s2b
    0      0      0      0    0.0      0      0    0.0     0.0| da0s1
    1     80     80    321   12.4      0      0    0.0    99.5| da0s2
    0      0      0      0    0.0      0      0    0.0     0.0| da0s2a
    1     80     80    321   12.4      0      0    0.0    99.5| da0s2b
bob@nemesis:~ %
It looks as if the swap device is saturated, but swap is not exhausted.
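As a rough sanity check on the gstat figures, the da0s2b row can be turned into an average read size and an estimated busy fraction. This is only a sketch: the field names are made up for illustration, and the 4 kB page size mentioned in the comments is an assumption about this platform.

```python
def parse_gstat_row(line):
    """Split one gstat row into named fields.

    Column order follows the header shown in the post:
    L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
    """
    fields = line.replace("|", " ").split()
    keys = ("queue", "ops_s", "reads_s", "read_kbps", "ms_per_read",
            "writes_s", "write_kbps", "ms_per_write", "busy_pct")
    row = {k: float(v) for k, v in zip(keys, fields[:9])}
    row["name"] = fields[9]
    return row


def read_profile(row):
    """Return (average kB per read, estimated %busy from reads alone)."""
    reads = row["reads_s"]
    kb_per_read = row["read_kbps"] / reads if reads else 0.0
    # reads/s * ms/read = ms of service per 1000 ms of wall time; /10 gives percent
    est_busy_pct = reads * row["ms_per_read"] / 10.0
    return kb_per_read, est_busy_pct


row = parse_gstat_row("1 80 80 321 12.4 0 0 0.0 99.5| da0s2b")
kb_per_read, est_busy = read_profile(row)
# About 4 kB per read would match single-page swap-ins if pages are 4 kB (assumed).
print(f"{row['name']}: {kb_per_read:.1f} kB/read, ~{est_busy:.0f}% busy from reads alone")
# prints "da0s2b: 4.0 kB/read, ~99% busy from reads alone"
```

If that reading is right, the disk is limited by seek time on small random reads rather than by bandwidth, which would fit the c++ process sitting in swread.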
The machine isn't hung; ssh sessions are responsive, the serial console
refreshes the login prompt when the Enter key is hit.
Also, there are no _current_ errors on the serial console. However, there
are quite a number of swap_pager_getswapspace(32): failed messages, which
stopped some considerable time earlier. Buildworld didn't seem affected by them.
Perhaps strangest of all, starting up a gstat script to log system
activity seems to slowly revive buildworld. %idle gradually drops to
a few percent. If the gstat script is started before buildworld
the stall doesn't seem to happen and buildworld finishes in about 27
hours, which is as good as it gets.
Earlier tests letting the machine run undisturbed without the gstat
script running suggest it'll eventually finish buildworld in around
36 hours.
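The gstat logging script itself is not shown here. A hypothetical stand-in might look like the Python sketch below, which assumes gstat's batch mode (-b) and an arbitrary 60-second interval. The function names, interval, and log format are all illustrative guesses rather than the script actually used:

```python
import subprocess
import time


def log_once(logfile, cmd=("gstat", "-b")):
    """Append one timestamped snapshot of `cmd` output to `logfile`."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(logfile, "a") as fh:
        fh.write(f"=== {stamp} ===\n")
        fh.write(result.stdout)
    return result.returncode


def log_forever(logfile, interval_seconds=60, cmd=("gstat", "-b")):
    """Take a snapshot every `interval_seconds` until interrupted.

    The 60 s interval is an assumption; the original script's interval is unknown.
    """
    while True:
        log_once(logfile, cmd)
        time.sleep(interval_seconds)
```

Running log_forever("gstat.log") in the background before starting buildworld would roughly reproduce the "script started first" case. Whether a substitute like this has the same reviving effect is untested.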
This vaguely echoes an incident long ago where Xorg was unusably slow
until a small job was started in the background, at which point it
became responsive. Can't readily imagine a connection, though.
It's rather counterintuitive that giving the machine more work to do
should make it work faster. If anybody can suggest what's going
on I'd be most curious. I could try poking around in the debugger if
someone can offer guidance. On my own I'm clueless.
Thanks for reading,
bob prohaska