From nobody Tue Jan 25 22:17:53 2022
Date: Tue, 25 Jan 2022 14:17:53 -0800
From: bob prohaska
To: Mark Millard
Cc: Free BSD
Subject: Re: Troubles building world on stable/13
Message-ID: <20220125221753.GA44654@www.zefox.net>
References: <8595CFBD-DC65-4472-A0A1-8A7BE1C031D6@yahoo.com>
 <20220124165449.GA39982@www.zefox.net>
 <5FAC2B2C-7740-435E-A183-FB3EF1FCE7F9@yahoo.com>
 <1CB4EDCD-0998-4363-8CEA-14854EB76FA3@yahoo.com>
 <20220125162245.GA43635@www.zefox.net>
 <61A3CF79-552C-4884-A8EA-85003B249856@yahoo.com>
 <20220125180823.GB43635@www.zefox.net>
 <35046946-7FE4-4E44-950F-BF9CCA72D8F0@yahoo.com>
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm
Sender: owner-freebsd-arm@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <35046946-7FE4-4E44-950F-BF9CCA72D8F0@yahoo.com>

On Tue, Jan 25, 2022 at 12:49:02PM -0800, Mark Millard wrote:
> On 2022-Jan-25, at 10:08, bob prohaska wrote:
>
> > On Tue, Jan 25, 2022 at 09:13:08AM -0800, Mark Millard wrote:
> >>
> >> -DBATCH ? I'm not aware of there being any use of that symbol.
> >> Do you have a documentation reference for it so that I could
> >> read about it?
> >>
> > It's a switch to turn off dialog4ports. I can't find the reference
> > now. Perhaps it's been deprecated? A name like -DUSE_DEFAULTS would
> > be easier to understand anyway.
>
> I've never had buildworld buildkernel or the like try to use
> dialog4ports. I've only had port building use it. buildworld
> and buildkernel can be done with no ports installed at all.
> dialog4ports is a port.
>
The attempt to build devel/llvm13 under stable/13 was done under ports.
Thus the -DBATCH, to avoid manual intervention.

> I think -DBATCH was ignored for the activity at hand.
>
> > On a whim, I tried building devel/llvm13 on a Pi4 running -current with
> > 8 GB of RAM and 8 GB of swap. To my surprise, that stopped with:
> > nemesis.zefox.com kernel log messages:
> > +FreeBSD 14.0-CURRENT #26 main-5025e85013: Sun Jan 23 17:25:31 PST 2022
> > +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1873450, size: 4096
> > +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 521393, size: 4096
> > +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 209826, size: 12288
> > +swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1717218, size: 24576
> > +pid 56508 (c++), jid 0, uid 0, was killed: failed to reclaim memory
> >
> > On an 8GB machine, that seems strange.
>
> -j build? -j4 ?
>
Since this too was a port build, I let ports decide. It settled on 4.

> Were you watching the swap usage in top (or some such)?
>
Top was running but the failure happened overnight. Not expecting it
to fail, I didn't keep a log of swapping activity. The message above
was in the next morning's log email.

> Note: The "was killed" related notices have been improved
> in main, but there is a misnomer case about "out of swap"
> (last I checked).
>
> An environment that gets "swap_pager: indefinite wait buffer"
> notices is problematical and the I/O delays for the virtual
> memory subsystem can lead to kills, if I understand right.
>
> But, if I remember right, the actual message for a directly
> I/O related kill is now different.
>
In this case the message was "failed to reclaim memory", a message
I've not seen before.

> I think that being able to reproduce this case could be
> important. I probably can not because I'd not get the
> "swap_pager: indefinite wait buffer" in my hardware
> context.
>
If it's relevant, the case of /usr/ports/devel/llvm13 seems like the
most expedient test, since it did fail with realistic amounts of
memory and swap. I gather that there's a certain amount of
self-recompilation in buildworld; is that true of the port version?
Does it matter?

> > Per the failure message I restarted the build of devel/llvm13 with
> > make -DBATCH MAKE_JOBS_UNSAFE=YES > make.log &
>
> Just like -DBATCH is for ports, not buildworld buildkernel,
> MAKE_JOBS_UNSAFE= is for ports, not buildworld buildkernel,
> at least if I understand right.
>
This was a ports build on the Pi4. The restart is running
single-threaded and is quite slow; I'm tempted to stop it unless a
failure would be useful.

>
> >>> However, restarting buildworld using -j1 appears to have worked past
> >>> the former point of failure.
> >> [this on stable/13 pi3]
> >> Hmm. That usually means one (or both) of two things was involved
> >> in the failure:
> >>
> >> A) a build race where something is not (fully) ready when
> >>    it is used
> >>
> >> B) running out of resources, such as RAM+SWAP
> >>
> >
> > The stable/13 machine is short of swap; it has only 2 GB, which
> > used to be enough.
>
> So RAM+SWAP is 1 GiByte + 2 GiByte, so 3 GiByte on that
> RPi3*? (That would have been good to know earlier, such
> as for my attempts at reproduction.)
>
Correct, 3 GB RAM+swap. Didn't realize it would turn out to be
important, sorry!

> -j for the RPi3* when it was failing?
>
-j4, but I think it also failed at -j2.

> Did you have failures with the .cpp and .sh (so no
> make use involved) in the RAM+SWAP context?
>
Using the .cpp and .sh files on a Pi3 with 2 GB swap running stable/13,
there was a consistent failure.
Using the .cpp and .sh files on a Pi3 with 7 GB swap, there was no failure.
Using a build of /usr/ports/devel/llvm13 as a test, the build failed
even with 8 GB of RAM and 8 GB of swap.

> > Maybe that's the problem, but having an error
> > report that says it's a segfault is a confusing diagnostic.
> >
> >> But, as I understand, you were able to use a .cpp and
> >> .sh file pair that had been produced to repeat the
> >> problem on the RPi3B --and that would not have been a
> >> parallel-activity context.
> >>
> >
> > To be clear, the reproduction was on the same stable/13 that
> > reported the original failure. An attempt at reproduction
> > on a different Pi3 running -current ran without any errors.
> > Come to think of it, that machine had more swap, too.
>
> How much swap?
>
Two swap partitions, 3.6 GB and 4 GB, both in use.

>
> At this point, I expect that the failure was tied to the
> RAM+SWAP totaling to 3 GiBytes.
>
That seems likely, or at least a reasonable suspicion.

> Knowing that context we might have a reproducible report
> that can be made based on the .cpp and .sh files, where
> restricting the RAM+SWAP use allowed is part of the
> report.
>
There seem to be some other reports of clang using unreasonable
amounts of memory, for example
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261341

A much older report that looks vaguely similar (out of memory
reported as segfault) is
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=172576
It's not arm-related and dates from 2012, but it is still open.

I'll try to repeat some of the tests using the logging script
used previously. Right now it contains:

#!/bin/sh
while true
sysctl hw.regulator.5v0.min_uvolt ;
do vmstat ; gstat -abd -I 10s ; date ; swapinfo ; tail \
-n 2 /var/log/messages ; netstat -m | grep "mbuf clusters" ; ps -auxd -w -w
done

Changes to the script are welcome; the output is voluminous.

Thanks for reading!

bob prohaska
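
One possible way to cut the volume of the logging script above, offered
only as a sketch: keep the timestamp, swap usage, vmstat and the newest
lines of /var/log/messages, and drop the gstat, netstat and ps output.
The names INTERVAL, run_if_present and log_sample are made up for this
illustration and are not part of the original script. As far as I can
tell, the original loop is paced by "gstat -b -I 10s"; with gstat removed,
this sketch uses sleep instead.

```shell
#!/bin/sh
# Sketch of a quieter logger: one short, timestamped sample per interval.
# INTERVAL is a hypothetical knob (seconds between samples), default 10.
INTERVAL="${INTERVAL:-10}"

# Run a command only if it exists on this system; otherwise say so.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "($1 not available)"
    fi
}

# Emit one sample: timestamp, swap usage, vmstat, newest log lines.
log_sample() {
    echo "=== $(date) ==="
    run_if_present swapinfo -h
    run_if_present vmstat
    if [ -r /var/log/messages ]; then
        tail -n 2 /var/log/messages
    fi
}

# Loop only when asked to, so a single sample can be tried by hand first.
if [ "${1:-}" = "loop" ]; then
    while true; do
        log_sample
        sleep "$INTERVAL"
    done
fi
```

Saved under a hypothetical name such as swaplog.sh, it could be started
with "sh swaplog.sh loop > swaplog.txt &" before an overnight build. Each
sample begins with a "===" line, so the log can be searched by time after
a failure.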