Re: www/chromium will not build on a host w/ 8 CPU and 16G mem [RPi4B 8 GiByte example]

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 25 Aug 2023 09:21:33 UTC
On Aug 24, 2023, at 21:57, bob prohaska <fbsd@www.zefox.net> wrote:

> On Thu, Aug 24, 2023 at 03:20:50PM -0700, Mark Millard wrote:
>> bob prohaska <fbsd_at_www.zefox.net> wrote on
>> Date: Thu, 24 Aug 2023 19:44:17 UTC :
>> 
>>> On Fri, Aug 18, 2023 at 08:05:41AM +0200, Matthias Apitz wrote:
>>>> 
>>>> sysctl vfs.read_max=128
>>>> sysctl vfs.aio.max_buf_aio=8192
>>>> sysctl vfs.aio.max_aio_queue_per_proc=65536
>>>> sysctl vfs.aio.max_aio_per_proc=8192
>>>> sysctl vfs.aio.max_aio_queue=65536
>>>> sysctl vm.pageout_oom_seq=120
>>>> sysctl vm.pfault_oom_attempts=-1 
>>>> 
>>> 
>>> Just tried these settings on a Pi4, 8GB. Seemingly no help; the
>>> build of www/chromium failed again, saying only:
>>> 
>>> ===> Compilation failed unexpectedly.
>>> Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to
>>> the maintainer.
>>> *** Error code 1
>>> 
>>> No messages on the console at all, no indication of any swap use at all.
>>> If somebody can tell me how to invoke MAKE_JOBS_UNSAFE=yes, either
>>> locally or globally, I'll give it a try. But, if it's a system problem
>>> I'd expect at least a peep on the console....
>> 
>> Are you going to post the log file someplace? 
> 
> 
> http://nemesis.zefox.com/~bob/data/logs/bulk/main-default/2023-08-20_16h11m59s/logs/errors/chromium-115.0.5790.170_1.log
> 
>> You may have missed an earlier message. 
> 
> Yes, I did. Some (very long) lines up in the log there is:
> 
> [ 96% 53691/55361] "python3" "../../build/toolchain/gcc_link_wrapper.py" --output="./v8_context_snapshot_generator" -- c++ -fuse-ld=lld -Wl,--build-id=sha1 -fPIC -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,--icf=all -Wl,--color-diagnostics -Wl,--undefined-version -Wl,-mllvm,-enable-machine-outliner=never -no-canonical-prefixes -Wl,-O2 -Wl,--gc-sections -rdynamic -pie -Wl,--disable-new-dtags -Wl,--icf=none -L/usr/local/lib  -fstack-protector-strong -L/usr/local/lib  -o "./v8_context_snapshot_generator" -Wl,--start-group @"./v8_context_snapshot_generator.rsp"  -Wl,--end-group  -lpthread -lgmodule-2.0 -lglib-2.0 -lgobject-2.0 -lgthread-2.0 -lintl -licui18n -licuuc -licudata -lnss3 -lsmime3 -lnssutil3 -lplds4 -lplc4 -lnspr4 -ldl -lkvm -lexecinfo -lutil -levent -lgio-2.0 -ljpeg -lpng16 -lxml2 -lxslt -lexpat -lwebp -lwebpdemux -lwebpmux -lharfbuzz-subset -lharfbuzz -lfontconfig -lopus -lopenh264 -lm -lz -ldav1d -lX11 -lXcomposite -lXdamage -lXext -lXfixes -lXrender -lXrandr -lXtst -lepoll-shim -ldrm -lxcb -lxkbcommon -lgbm -lXi -lGL -lpci -lffi -ldbus-1 -lpangocairo-1.0 -lpango-1.0 -lcairo -latk-1.0 -latk-bridge-2.0 -lsndio -lFLAC -lsnappy -latspi 
> FAILED: v8_context_snapshot_generator 

That FAILED line is line 64637 of the log.

> Then, a bit further down in the file a series of 
> ld.lld: error: relocation R_AARCH64_ABS64 cannot be used against local symbol; recompile with -fPIC
> complaints.

The first R_AARCH64_ABS64 line is line 64339 of the log. It is followed by two lines, with:

defined in obj/third_party/ffmpeg/libffmpeg_internal.a(ffmpeg_internal/autorename_libavcodec_aarch64_fft_neon.o)

and:

referenced by ffmpeg_internal/autorename_libavcodec_aarch64_fft_neon.o:(fft_tab_neon) in archive obj/third_party/ffmpeg/libffmpeg_internal.a
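
If the work directory from the failed build is still around, something
like the following should show which archive members carry the
absolute relocations (a sketch only: the cd path is hypothetical and
depends on where poudriere left the wrkdir, and any readelf variant
that can walk an .a archive should do):

# cd .../www/chromium/work/chromium-*/out/Release
# readelf -r obj/third_party/ffmpeg/libffmpeg_internal.a | \
    grep -E 'File:|R_AARCH64_ABS64'

That would at least confirm which members carry such relocations; the
log above already points at the fft_neon assembly from ffmpeg.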

> Unclear if the two kinds of complaints are related, or whether they're the first...
> 
>> How long had it run before stopping? 
> 
> 95 hours, give or take. Nothing about a timeout was reported.
> 
>> How does that match up with the MAX_EXECUTION_TIME
>> and NOHANG_TIME and the like that you have poudriere set
>> up to use (/usr/local/etc/poudriere.conf)?
> 
> NOHANG_TIME=44400
> MAX_EXECUTION_TIME=1728000
> MAX_EXECUTION_TIME_EXTRACT=144000
> MAX_EXECUTION_TIME_INSTALL=144000
> MAX_EXECUTION_TIME_PACKAGE=11728000
> Admittedly some are plain silly; I just started
> tacking on zeros after getting timeouts and being
> unable to match the error message to the variable name.
> 
> I checked for duplicates this time, however.

Not stopped for time.

>> Something relevant for the question is what you have for:
>> 
>> # Grep build logs to determine a possible build failure reason.  This is
>> # only shown on the web interface.
>> # Default: yes
>> DETERMINE_BUILD_FAILURE_REASON=no
>> 
>> Using DETERMINE_BUILD_FAILURE_REASON leads to large builds
>> running for a long time after poudriere starts the process
>> of stopping them on a timeout: the grep activity takes a
>> long time, and the build activity is not stopped during the
>> grep.
>> 
>> 
>> vm.pageout_oom_seq=120 and vm.pfault_oom_attempts=-1 make
>> sense to me for certain kinds of issues involved in large
>> builds, presuming sufficient RAM+SWAP for how it is set
>> up to operate. vm.pageout_oom_seq is associated with
>> console/log messages. If one runs out of RAM+SWAP,
>> vm.pfault_oom_attempts=-1 tends to lead to deadlock. But
>> it allows slow I/O to have the time to complete and so
>> can be useful.
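
If those two do end up mattering here, they can be made persistent
across reboots; a minimal sketch, assuming both remain runtime-settable
sysctls, is to add to /etc/sysctl.conf:

vm.pageout_oom_seq=120
vm.pfault_oom_attempts=-1

(The sysctl commands quoted earlier set the same values, but only
until the next reboot.)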
>> 
>> I'm not sure that any vfs.aio.* setting is actually involved:
>> aio uses special system calls that split issuing requests
>> from retrieving the status of completed requests later. Use
>> of aio has to be explicit in the running software, from what
>> I can tell. I've no information about which software builds
>> might be using aio during the build activity. (A quick
>> runtime check is sketched below, after the sysctl list.)
>> 
>> # sysctl -d vfs.aio
>> vfs.aio: Async IO management
>> vfs.aio.max_buf_aio: Maximum buf aio requests per process
>> vfs.aio.max_aio_queue_per_proc: Maximum queued aio requests per process
>> vfs.aio.max_aio_per_proc: Maximum active aio requests per process
>> vfs.aio.aiod_lifetime: Maximum lifetime for idle aiod
>> vfs.aio.num_unmapped_aio: Number of aio requests presently handled by unmapped I/O buffers
>> vfs.aio.num_buf_aio: Number of aio requests presently handled by the buf subsystem
>> vfs.aio.num_queue_count: Number of queued aio requests
>> vfs.aio.max_aio_queue: Maximum number of aio requests to queue, globally
>> vfs.aio.target_aio_procs: Preferred number of ready kernel processes for async IO
>> vfs.aio.num_aio_procs: Number of presently active kernel processes for async IO
>> vfs.aio.max_aio_procs: Maximum number of kernel processes to use for handling async IO 
>> vfs.aio.unsafe_warningcnt: Warnings that will be triggered upon failed IO requests on unsafe files
>> vfs.aio.enable_unsafe: Permit asynchronous IO on all file types, not just known-safe types
>> 
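
A quick way to check whether anything in a build is actually queueing
aio would be to watch the live counters from the list above while the
build runs; a small sketch (the names are straight from the sysctl -d
output, and staying at zero would mean the vfs.aio.* limits never
come into play):

# while sleep 10; do sysctl vfs.aio.num_queue_count vfs.aio.num_buf_aio; done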
>> 
>> vfs.read_max may well change the disk access sequences:
>> 
>> # sysctl -d vfs.read_max
>> vfs.read_max: Cluster read-ahead max block count
>> 
>> That might well help some spinning rust or other types of
>> I/O.
> There don't seem to be any indications of disk speed being
> a problem, despite using "spinning rust" 8-)

Nope: R_AARCH64_ABS64 misuse is not a disk speed issue.

>> 
>> 
>> MAKE_JOBS_UNSAFE=yes is, for example, put in makefiles of
>> ports that have problems with parallel build activity. It
>> basically disables having parallel activity in the build
>> context involved. I've no clue if you use the likes of,
>> say,
>> 
>> /usr/local/etc/poudriere.d/make.conf
>> 
>> with conditional logic inside such as use of notation
>> like:
>> 
>> .if ${.CURDIR:M*/www/chromium}
>> STUFF HERE
>> .endif
>> 
>> but you could.

The actual R_AARCH64_ABS64 use is in:

obj/third_party/ffmpeg/libffmpeg_internal.a(ffmpeg_internal/autorename_libavcodec_aarch64_fft_neon.o)

not directly in chromium. The solution is not clear to me.

> That wasn't needed when the Pi4 last compiled www/chromium.
> A Pi3 did benefit from tuning of that sort. 
> 
> It sounds like the sysctl settings were unlikely to be 
> a source of the trouble seen, if not actively helpful.

Yep, the sysctls were not relevant.

> For the moment the machine is updating world and kernel.
> That should finish by tomorrow, at which point I'll try
> to add something like  
> 
> .if ${.CURDIR:M*/www/chromium}
> MAKE_JOBS_UNSAFE=yes
> .endif
> 
> to /usr/local/etc/poudriere.d/make.conf
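
As a side note, one way to confirm the conditional is being picked up
is to ask make for the resulting values, either from inside a
poudriere jail or with the same lines temporarily in /etc/make.conf
on a host that has a ports tree (a sketch; poudriere only applies
/usr/local/etc/poudriere.d/make.conf inside its build jails):

# cd /usr/ports/www/chromium
# make -V MAKE_JOBS_UNSAFE -V MAKE_JOBS_NUMBER

If the .if block matches, the first value should print as "yes" and,
assuming the ports framework still forces single-job builds for
MAKE_JOBS_UNSAFE, the second should print as 1.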

That will not help avoid the R_AARCH64_ABS64 abuse,
unfortunately.
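
For experimenting with possible fixes, though, it may not be necessary
to repeat the whole multi-day build: if the failed build's work
directory is still available (poudriere normally cleans it up unless
told otherwise), the failing link step alone can likely be re-run with
ninja from the GN output directory. A sketch, with an assumed path:

# cd .../www/chromium/work/chromium-*/out/Release
# ninja v8_context_snapshot_generator

The target name is taken from the FAILED line in the log.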


===
Mark Millard
marklmi at yahoo.com