List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm
Subject: Re: Any known way to build devel/llvm* ( such as devel/llvm19 ) with --threads=1 for its linker activity during the build?
From: Mark Millard
Date: Mon, 5 Aug 2024 02:09:24 -0700
To: mmel@freebsd.org
Cc: FreeBSD Toolchain, FreeBSD ARM List

On Aug 5, 2024, at 00:44, meloun.michal@gmail.com wrote:

> On 05.08.2024 9:27, Mark Millard wrote:
>> On Aug 5, 2024, at 00:15, Mark Millard wrote:
>>> On Aug 4, 2024, at 22:53, Michal Meloun wrote:
>>>
>>>> On 04.08.2024 23:31, Mark Millard wrote:
>>>>> On Aug 3, 2024, at 23:07, Mark Millard wrote:
>>>>>> My recent attempts to build devel/llvm18 and devel/llvm19 in an armv7 context (native or aarch64-as-armv7) have had /usr/bin/ld failures that stop the build and report as:
>>>>>>
>>>>>> LLVM ERROR: out of memory
>>>>>> Allocation failed
>>>>>>
>>>>>> (no system OOM activity or notices, so just a process size/fragmentation issue, or so I would expect).
>>>>>>
>>>>>> On native armv7 I also had rust 1.79.0 fail that way, but aarch64-as-armv7 built it okay.
>>>>>>
>>>>>> I'm curious if --threads=1 use for the linker might allow the devel/llvm* builds to complete at this point. Similarly for rust. (top showed that the ld activity was multi-threaded.)
>>>>>>
>>>>>> Note: The structure of the poudriere-devel based native build attempts is historical and it used to work. Similarly for the aarch64-as-armv7 based build attempts. For now I'd just be exploring changes that might allow much of my historical overall structure to still work.
>>>>>> But I expect that things are just growing to the point building is starting to be problematical with process address spaces that are bounded by a limit somewhat under 4 GiBytes.
>>>>>>
>>>>>>
>>>>>> Native armv7 was a 2 GiByte OrangePi+ 2ed (4 cores) that had
>>>>>> at boot time:
>>>>>>
>>>>>> AVAIL_RAM+SWAP == 1958Mi+3685Mi == 5643Mi
>>>>>>
>>>>>> and later had "Max(imum)Obs(erved)" figures:
>>>>>>
>>>>>> Mem: . . .,
>>>>>> 1728Mi MaxObsActive, 275192Ki MaxObsWired, 1952Mi MaxObs(Act+Wir+Lndry)
>>>>>>
>>>>>> Swap: 3685Mi Total, . . .,
>>>>>> 1535Mi MaxObsUsed, 3177Mi MaxObs(Act+Lndry+SwapUsed),
>>>>>> 3398Mi MaxObs(A+Wir+L+SU), 3449Mi (A+W+L+SU+InAct)
>>>>>>
>>>>>>
>>>>>> The aarch64-as-armv7 was a Win DevKit 2023 that has 8 cores and:
>>>>>>
>>>>>> AVAIL_RAM+SWAP == 31311Mi+120831Mi == 152142Mi
>>>>>>
>>>>>> So lots of 4 GiByte or smaller processes would fit.
>>>>>>
>>>>> Absent finding a way to get --threads=1 to be what is used, I
>>>>> made the following crude way to test, built it, installed it
>>>>> in the armv7 directory tree used for aarch64-as-armv7, and
>>>>> then started an aarch64-as-armv7 test of building devel/llvm19
>>>>> to see what the consequences are (leading whitespace details
>>>>> might not be preserved):
>>>>> # git -C /usr/main-src/ diff contrib/llvm-project/
>>>>> diff --git a/contrib/llvm-project/lld/ELF/Driver.cpp b/contrib/llvm-project/lld/ELF/Driver.cpp
>>>>> index 8b2c32b15348..299daf7dd6fa 100644
>>>>> --- a/contrib/llvm-project/lld/ELF/Driver.cpp
>>>>> +++ b/contrib/llvm-project/lld/ELF/Driver.cpp
>>>>> @@ -1587,6 +1587,9 @@ static void readConfigs(opt::InputArgList &args) {
>>>>> arg->getValue() + "'");
>>>>> parallel::strategy = hardware_concurrency(threads);
>>>>> config->thinLTOJobs = v;
>>>>> + } else if (sizeof(void*) <= 4) {
>>>>> + log("set maximum concurrency to 1, specify --threads= to change");
>>>>> + parallel::strategy = hardware_concurrency(1);
>>>>> } else if (parallel::strategy.compute_thread_count() > 16) {
>>>>> log("set maximum concurrency to 16, specify --threads= to change");
>>>>> parallel::strategy = hardware_concurrency(16);
>>>>> Basically, if the process address space has to be "small", avoid
>>>>> any default memory use tradeoffs that multi-threading the linker
>>>>> might involve --even if that means taking more time.
>>>>> We will see if:
>>>>> [00:00:33] [07] [00:00:00] Building devel/llvm19@default | llvm19-19.1.0.r1
>>>>> still fails to build as armv7 vs. if the change leads it to
>>>>> manage to build as armv7.
>>>>> ===
>>>>> Mark Millard
>>>>> marklmi at yahoo.com
>>>>
>>>> I can build llvm18 and rust 1.79 on native armv7 without problems - on Tegra TK1, without poudriere and on the ufs filesystem. IMHO poudriere is unusable on 32bit systems.
>>>
>>> On Windows DevKit 2023 in an armv7 chroot I can build rust 1.79.0
>>> as well. I've not tried a recent devel/llvm18 in that context,
>>> just devel/llvm19 . An armv7 process in this context can use
>>> about 1 GiByte more memory space than on the OrangePi+ 2ed. (See
>>> later program example outputs.)
>>>
>>> Previously, devel/llvm18-18.1.7 had built fine some time back.
>>> So I'm trying the modern 18.1.8_1 now on the Windows DevKit 2023.
>>> But this is with forcing of --threads=1 for lld: same context as
>>> the recent devel/llvm19 exploration.
>>>
>>> Note: UFS context, not ZFS.
>>>
>>> How does the Tegra TK1 context compare for the following
>>> program and the example command?
>>>
>>> OrangePi+ 2ed (so: armv7 native with 2 GiBytes of RAM):
>>>
>>> # more process_size.c
>>> // cc -std=c11 process_size.c
>>> // ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>>>
>>> #include <errno.h>
>>> #include <stddef.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>     size_t totalsize = 0u;
>>>     for (int i = 1; i < argc; ++i) {
>>>         errno = 0;
>>>         size_t size = strtoul(argv[i],NULL,0);
>>>         void *p = malloc(size);
>>>         if (p) totalsize += size;
>>>         printf("malloc(%zu) = %p [errno = %d]\n", size, p, errno);
>>>     }
>>>     printf("approx. total, a lower bound: %zu MiBytes\n", totalsize/1024u/1024u);
>>>     return 0;
>>> }
>>> # cc -std=c11 process_size.c
>>> # ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>>> malloc(268435456) = 0x20800180 [errno = 0]
>>> malloc(268435456) = 0x30801980 [errno = 0]
>>> malloc(268435456) = 0x40802640 [errno = 0]
>>> malloc(268435456) = 0x50803600 [errno = 0]
>>> malloc(268435456) = 0x608048c0 [errno = 0]
>>> malloc(268435456) = 0x70805140 [errno = 0]
>>> malloc(268435456) = 0x80806580 [errno = 0]
>>> malloc(268435456) = 0x90807780 [errno = 0]
>>> malloc(268435456) = 0xa0808700 [errno = 0]
>>> malloc(268435456) = 0x0 [errno = 12]
>>> malloc(268435456) = 0x0 [errno = 12]
>>> malloc(268435456) = 0x0 [errno = 12]
>>> malloc(268435456) = 0x0 [errno = 12]
>>> malloc(134217728) = 0xb0809a00 [errno = 0]
>>> malloc(67108864) = 0x0 [errno = 12]
>>> malloc(33554432) = 0xb880a5c0 [errno = 0]
>>> malloc(16777216) = 0xba80b0c0 [errno = 0]
>>> malloc(8388608) = 0x0 [errno = 12]
>>> malloc(4194304) = 0x0 [errno = 12]
>>> malloc(2097152) = 0xbb80c180 [errno = 0]
>>> malloc(1048576) = 0xbba0de80 [errno = 0]
>>> approx. total, a lower bound: 2483 MiBytes
>>>
>>>
>>> Same program with same command on Windows DevKit 2023 in
>>> armv7 chroot (aarch64-as-armv7 with 32 GiBytes of RAM):
>>>
>>> # ./a.out 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 268435456 134217728 67108864 33554432 16777216 8388608 4194304 2097152 1048576
>>> malloc(268435456) = 0x20800b00 [errno = 0]
>>> malloc(268435456) = 0x30801600 [errno = 0]
>>> malloc(268435456) = 0x40802cc0 [errno = 0]
>>> malloc(268435456) = 0x50803c80 [errno = 0]
>>> malloc(268435456) = 0x608042c0 [errno = 0]
>>> malloc(268435456) = 0x70805b00 [errno = 0]
>>> malloc(268435456) = 0x808063c0 [errno = 0]
>>> malloc(268435456) = 0x90807580 [errno = 0]
>>> malloc(268435456) = 0xa0808b40 [errno = 0]
>>> malloc(268435456) = 0xb0809980 [errno = 0]
>>> malloc(268435456) = 0xc080abc0 [errno = 0]
>>> malloc(268435456) = 0xd080ba00 [errno = 0]
>>> malloc(268435456) = 0xe080cc80 [errno = 0]
>>> malloc(134217728) = 0xf080d700 [errno = 0]
>>> malloc(67108864) = 0x0 [errno = 12]
>>> malloc(33554432) = 0xf880eb40 [errno = 0]
>>> malloc(16777216) = 0xfa80fc00 [errno = 0]
>>> malloc(8388608) = 0x0 [errno = 12]
>>> malloc(4194304) = 0xfb810840 [errno = 0]
>>> malloc(2097152) = 0xfbc117c0 [errno = 0]
>>> malloc(1048576) = 0xfbe12940 [errno = 0]
>>> approx. total, a lower bound: 3511 MiBytes
>>>
>>>
>>> Note: If the Tegra TK1 in question has more than
>>> 4 GiBytes of RAM, the command line should explore
>>> more than the example that I used.
>>>
>>>
>>> Note: I've used the program for other patterns of
>>> allocations. That is why it is not just a fixed
>>> exploration algorithm.
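
For contrast with the argument-driven process_size.c quoted above, a fixed exploration could be sketched roughly as below. This is only an illustration, not something used in this thread: the name explore_fixed and its start/min/cap parameters are hypothetical, and the cap exists only so that the loop also terminates on 64-bit hosts.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thread): take as many chunks of
 * `start` bytes as malloc() will give, then halve the chunk size and
 * repeat until the size drops below `min`. Stops adding chunks once
 * another one would push the total past `cap` bytes (cap == 0 means
 * no cap). Blocks are deliberately never freed, as in process_size.c.
 * Returns the number of bytes obtained. */
size_t explore_fixed(size_t start, size_t min, size_t cap)
{
    size_t total = 0u;
    if (min == 0u) min = 1u;    /* keep the halving loop finite */
    for (size_t size = start; size >= min; size /= 2u) {
        for (;;) {
            /* total <= cap always holds here, so cap - total cannot wrap */
            if (cap != 0u && size > cap - total) break;
            void *p = malloc(size);
            if (p == NULL) break;   /* this size no longer fits: halve */
            total += size;
            printf("malloc(%zu) = %p\n", size, p);
        }
    }
    return total;
}
```

Something like explore_fixed(268435456u, 1048576u, 0u) from an armv7 process would walk from 256 MiByte chunks down to 1 MiByte chunks, similar in spirit to the example command line, but it cannot express other allocation patterns the way the argument list can.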
>>>
>>>
>>> As for poudriere-devel, I find it useful, even on
>>> the OrangePi+ 2ed. But mostly that is a rare run
>>> that is checking on how well the handling goes for
>>> the 2 GiByte of RAM context (with notable SWAP for
>>> the size of RAM). In other words, monitoring the
>>> growth in a context that will break sooner than
>>> my other contexts generally would. The tests take
>>> days overall, most of the time being for rust and
>>> an llvm* .
>>>
>>> Historically I've been able to have 2 builders,
>>> each with MAKE_JOBS_NUMBER_LIMIT=2 , so all 4
>>> cores in use building lang/rust and a devel/llvm*
>>> at the same time successfully in poudriere-devel
>>> on the 2 GiByte OrangePi+ 2ed. (This was before
>>> recently imposing --threads=1 experiments,
>>> given the recent build failures.)
>> I should have noted that my normal devel/llvm* builds
>> on aarch64 and armv7 avoid building: BE_AMDGPU and
>> MLIR . They also target BE_NATIVE instead of
>> BE_STANDARD . (aarch64 BE_NATIVE includes armv7 as
>> well.)
>> ===
>> Mark Millard
>> marklmi at yahoo.com
> Tegra has 4 Cortex-A15 cores and 2 GB of RAM.

OrangePi+ 2ed: Cortex-A7 with 4 cores and 2 GiBytes of RAM.
I wonder if the 2483 MiBytes would end up being about the same
on the Tegra variation indicated.

> All ports are built with default options. The only non-standard item is the swap size -> I have 16GB of swap on a swap partition on the SSD.

Wow, 16 GiBytes of swap space for 2 GiBytes of RAM.

I guess when the swap is added that you get a notice-pair of the
structure:

QUOTE
warning: total configured swap (. . . pages) exceeds maximum recommended amount (. . . pages).
warning: increase kern.maxswzone or reduce amount of swap.
END QUOTE

with a rather large difference between the two ". . ." figures.

Do you make other adjustments to deal with the otherwise-reported
potential mistuning? It appears to make tradeoffs in the kernel
internal memory handling, if I understand right.

> But I guess that's not important in this case.

At least for my context, it appears that memory allocations are
failing to find a big enough free area inside the process's
address space --without running out of system RAM+SWAP space
overall.

For the OrangePi+ 2ed ( and devel/llvm18 18.1.7 ) it was during
the earlier linker run for:

FAILED: bin/lli-child-target
. . .
LLVM ERROR: out of memory
Allocation failed

That much finished just fine on the Windows DevKit 2023 used
via an armv7 jail ( devel/llvm18 18.1.8_1 ). The failure point was
in a later link ( matching what I saw via devel/llvm19 ).

> I just started build of llvm19 - but it takes few hours to complete..

Probably fewer hours than on the OrangePi+ 2ed but more than
on the Windows DevKit 2023 (if they were completing, anyway).

===
Mark Millard
marklmi at yahoo.com
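
The distinction drawn above, between running out of RAM+SWAP and failing to find a big enough free area inside the process's address space, can be probed with a sketch like the following. It is an assumption-laden illustration rather than anything used in this thread: the name largest_single_alloc is hypothetical, and the binary search assumes that if malloc() fails for some size it also fails for every larger size, which an allocator does not strictly promise.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thread): binary-search the largest
 * single malloc() request, no larger than `limit`, that succeeds right
 * now. Each probe is freed immediately, so the result approximates the
 * biggest contiguous free area available to the allocator rather than
 * the total that could be obtained in pieces. */
size_t largest_single_alloc(size_t limit)
{
    size_t lo = 0u;       /* largest size accepted so far (0 = nothing yet) */
    size_t hi = limit;    /* no size above hi is still a candidate */
    while (lo < hi) {
        /* upper midpoint, written so it cannot overflow: lo < mid <= hi */
        size_t mid = hi - (hi - lo) / 2u;
        void *p = malloc(mid);
        if (p != NULL) {
            free(p);
            lo = mid;         /* mid fits: search above it */
        } else {
            hi = mid - 1u;    /* mid does not fit: search below it */
        }
    }
    return lo;
}
```

Comparing this figure with the "approx. total, a lower bound" figure from process_size.c would indicate how fragmented the address space is: in the OrangePi+ 2ed run quoted earlier, 256 MiByte requests were already failing while a later 128 MiByte request still succeeded.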