Re: www/chromium will not build on a host w/ 8 CPU and 16G mem [RPi4B 8 GiByte example]

From: bob prohaska <fbsd_at_www.zefox.net>
Date: Fri, 25 Aug 2023 04:57:42 UTC
On Thu, Aug 24, 2023 at 03:20:50PM -0700, Mark Millard wrote:
> bob prohaska <fbsd_at_www.zefox.net> wrote on
> Date: Thu, 24 Aug 2023 19:44:17 UTC :
> 
> > On Fri, Aug 18, 2023 at 08:05:41AM +0200, Matthias Apitz wrote:
> > > 
> > > sysctl vfs.read_max=128
> > > sysctl vfs.aio.max_buf_aio=8192
> > > sysctl vfs.aio.max_aio_queue_per_proc=65536
> > > sysctl vfs.aio.max_aio_per_proc=8192
> > > sysctl vfs.aio.max_aio_queue=65536
> > > sysctl vm.pageout_oom_seq=120
> > > sysctl vm.pfault_oom_attempts=-1 
> > > 
> > 
> > Just tried these settings on a Pi4, 8GB. Seemingly no help,
> > build of www/chromium failed again, saying only:
> > 
> > ===> Compilation failed unexpectedly.
> > Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to
> > the maintainer.
> > *** Error code 1
> > 
> > No messages on the console at all, no indication of any swap use at all.
> > If somebody can tell me how to invoke MAKE_JOBS_UNSAFE=yes, either
> > locally or globally, I'll give it a try. But, if it's a system problem
> > I'd expect at least a peep on the console....
> 
> Are you going to post the log file someplace? 


http://nemesis.zefox.com/~bob/data/logs/bulk/main-default/2023-08-20_16h11m59s/logs/errors/chromium-115.0.5790.170_1.log

> You may have missed an earlier message.

Yes, I did. A few (very long) lines up in the log there is:

[ 96% 53691/55361] "python3" "../../build/toolchain/gcc_link_wrapper.py" --output="./v8_context_snapshot_generator" -- c++ -fuse-ld=lld -Wl,--build-id=sha1 -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,--icf=all -Wl,--color-diagnostics -Wl,--undefined-version -Wl,-mllvm,-enable-machine-outliner=never -no-canonical-prefixes -Wl,-O2 -Wl,--gc-sections -rdynamic -pie -Wl,--disable-new-dtags -Wl,--icf=none -L/usr/local/lib  -fstack-protector-strong -L/usr/local/lib  -o "./v8_context_snapshot_generator" -Wl,--start-group @"./v8_context_snapshot_generator.rsp"  -Wl,--end-group  -lpthread -lgmodule-2.0 -lglib-2.0 -lgobject-2.0 -lgthread-2.0 -lintl -licui18n -licuuc -licudata -lnss3 -lsmime3 -lnssutil3 -lplds4 -lplc4 -lnspr4 -ldl -lkvm -lexecinfo -lutil -levent -lgio-2.0 -ljpeg -lpng16 -lxml2 -lxslt -lexpat -lwebp -lwebpdemux -lwebpmux -lharfbuzz-subset -lharfbuzz -lfontconfig -lopus -lopenh264 -lm -lz -ldav1d -lX11 -lXcomposite -lXdamage -lXext -lXfixes -lXrender -lXrandr -lXtst -lepoll-shim -ldrm -lxcb -lxkbcommon -lgbm -lXi -lGL -lpci -lffi -ldbus-1 -lpangocairo-1.0 -lpango-1.0 -lcairo -latk-1.0 -latk-bridge-2.0 -lsndio -lFLAC -lsnappy -latspi 
FAILED: v8_context_snapshot_generator 

Then, a bit further down in the file, there is a series of
ld.lld: error: relocation R_AARCH64_ABS64 cannot be used against local symbol; recompile with -fPIC
complaints.

It's unclear whether the two kinds of complaints are related, or
whether they're the first errors in the log.
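
In case it helps narrow that down, I can grep the log for failed
steps, with something like

grep -n 'FAILED:' chromium-115.0.5790.170_1.log | head

to see whether anything failed before the v8_context_snapshot_generator
link step.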

> How long had it run before  stopping? 

95 hours, give or take. Nothing about a timeout was reported.

> How does that match up with the MAX_EXECUTION_TIME
> and NOHANG_TIME and the like that you have poudriere set
> up to use ( /usr/local/etc/poudriere.conf ). 

NOHANG_TIME=44400
MAX_EXECUTION_TIME=1728000
MAX_EXECUTION_TIME_EXTRACT=144000
MAX_EXECUTION_TIME_INSTALL=144000
MAX_EXECUTION_TIME_PACKAGE=11728000
Admittedly some are plain silly; I just started
tacking on zeros after getting timeouts and being
unable to match the error message to the variable name.

I checked for duplicates this time, however.
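
For what it's worth, 95 hours is roughly 95 * 3600 = 342,000 seconds,
well under MAX_EXECUTION_TIME=1728000 (20 days), and if I understand
NOHANG_TIME correctly it only fires when the build stops producing
output for 44400 seconds (a bit over 12 hours), so a poudriere timeout
doesn't look like the cause here.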

> Something  relevant for the question is what you have for:
> 
> # Grep build logs to determine a possible build failure reason.  This is
> # only shown on the web interface.
> # Default: yes
> DETERMINE_BUILD_FAILURE_REASON=no
> 
> Using DETERMINE_BUILD_FAILURE_REASON leads to large builds
> running for a long time after poudriere starts the process of
> stopping on a timeout: the grep activity takes a long time,
> and the build activity is not stopped during the grep.
> 
> 
> vm.pageout_oom_seq=120 and vm.pfault_oom_attempts=-1 make
> sense to me for certain kinds of issues involved in large
> builds, presuming sufficient RAM+SWAP for how it is set
> up to operate. vm.pageout_oom_seq is associated with
> console/log messages. If one runs out of RAM+SWAP,
> vm.pfault_oom_attempts=-1 tends to lead to deadlock. But
> it allows slow I/O to have the time to complete and so
> can be useful.
> 
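
If it helps to keep those two settings across the world/kernel reboot,
I suppose something like

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

in /etc/sysctl.conf would make them persistent, assuming I've
understood that file correctly.
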
> I'm not sure that any vfs.aio.* is actually involved: special
> system calls are involved, splitting requests vs. retrieving
> the status of completed requests later. Use of aio has to be
> explicit in the running software from what I can tell. I've
> no information about which software builds might be using aio
> during the build activity.
> 
> # sysctl -d vfs.aio
> vfs.aio: Async IO management
> vfs.aio.max_buf_aio: Maximum buf aio requests per process
> vfs.aio.max_aio_queue_per_proc: Maximum queued aio requests per process
> vfs.aio.max_aio_per_proc: Maximum active aio requests per process
> vfs.aio.aiod_lifetime: Maximum lifetime for idle aiod
> vfs.aio.num_unmapped_aio: Number of aio requests presently handled by unmapped I/O buffers
> vfs.aio.num_buf_aio: Number of aio requests presently handled by the buf subsystem
> vfs.aio.num_queue_count: Number of queued aio requests
> vfs.aio.max_aio_queue: Maximum number of aio requests to queue, globally
> vfs.aio.target_aio_procs: Preferred number of ready kernel processes for async IO
> vfs.aio.num_aio_procs: Number of presently active kernel processes for async IO
> vfs.aio.max_aio_procs: Maximum number of kernel processes to use for handling async IO 
> vfs.aio.unsafe_warningcnt: Warnings that will be triggered upon failed IO requests on unsafe files
> vfs.aio.enable_unsafe: Permit asynchronous IO on all file types, not just known-safe types
> 
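
If it would help to settle whether aio is involved at all, I can
watch a few of those counters while the next build runs, something
like

sysctl vfs.aio.num_queue_count vfs.aio.num_buf_aio vfs.aio.num_aio_procs

and see whether they ever move off zero (just a guess at which
counters would show activity).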
> 
> vfs.read_max may well change the disk access sequences:
> 
> # sysctl -d vfs.read_max
> vfs.read_max: Cluster read-ahead max block count
> 
> That might well help some spinning rust or other types of
> I/O.

There don't seem to be any indications of disk speed being
a problem, despite using "spinning rust" 8-)
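
If disk speed ever does look suspect I can watch it during a build
with something like

gstat -a

or iostat -x 5, to see whether the drive is saturated; so far nothing
suggests that it is.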

> 
> 
> MAKE_JOBS_UNSAFE=yes is, for example, put in makefiles of
> ports that have problems with parallel build activity. It
> basically disables having parallel activity in the build
> context involved. I've no clue if you use the likes of,
> say,
>
> /usr/local/etc/poudriere.d/make.conf
> 
> with conditional logic inside such as use of notation
> like:
> 
> .if ${.CURDIR:M*/www/chromium}
> STUFF HERE
> .endif
> 
> but you could.

That wasn't needed when the Pi4 last compiled www/chromium.
A Pi3 did benefit from tuning of that sort. 

It sounds like the sysctl settings were unlikely to be
a source of the trouble seen, and may even have helped.

For the moment the machine is updating world and kernel.
That should finish by tomorrow, at which point I'll try
to add something like  

.if ${.CURDIR:M*/www/chromium}
MAKE_JOBS_UNSAFE=yes
.endif

to /usr/local/etc/poudriere.d/make.conf
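
As a quick sanity check on the conditional I may first try something
like

make -C /usr/ports/www/chromium -V MAKE_JOBS_UNSAFE __MAKE_CONF=/usr/local/etc/poudriere.d/make.conf

on the host (path adjusted to wherever the ports tree actually lives),
which should print "yes" if the .if block matches; I understand
poudriere assembles that file into the jail's own make.conf, so this
is only a syntax check.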


Thanks very much for writing.

bob prohaska