Subject: Re: llvm10 build failure on Rpi3
Date: Thu, 24 Jun 2021 10:41:38 -0700
To: bob prohaska
Cc: FreeBSD ports, freebsd-arm, FreeBSD Toolchain
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm
Message-Id: <3B41633E-AAFC-422A-8D73-3B1B001023F0@yahoo.com>
In-Reply-To: <20210624160109.GB87740@www.zefox.net>
From: Mark Millard via freebsd-ports
Reply-To: marklmi@yahoo.com

On 2021-Jun-24, at 09:01, bob prohaska wrote:

> [What about trying a new kernel? details at end]
> On Wed, Jun 23, 2021 at 11:02:02PM -0700, Mark Millard wrote:
>> On 2021-Jun-23, at 21:30, bob prohaska wrote:
>>
>>> On Wed, Jun 23, 2021 at 04:22:35PM -0700, Mark Millard wrote:
>>>> On 2021-Jun-23, at 15:28, bob prohaska wrote:
>>>> . . .
>>>
>>>>
>>> [snipped for brevity]
>>>>
>>>>>> For example, 0xA5u byte values might be the value that newly
>>>>>> allocated memory is initialized to. Looking . . . man jemalloc
>>>>>> (the memory allocator implementation used by FreeBSD) reports:
>>>>>>
>>>>>> opt.junk (const char *) r- [--enable-fill]
>>>>>> Junk filling. If set to "alloc", each byte of uninitialized
>>>>>> allocated memory will be initialized to 0xa5. If set to "free", all
>>>>>> deallocated memory will be initialized to 0x5a. If set to "true",
>>>>>> both allocated and deallocated memory will be initialized, and if
>>>>>> set to "false", junk filling be disabled entirely. This is intended
>>>>>> for debugging and will impact performance negatively. This option
>>>>>> is "false" by default unless --enable-debug is specified during
>>>>>> configuration, in which case it is "true" by default.
>>>>>>
>>>>>> So, if you have junk filling enabled, I expect that you ran
>>>>>> into a legitimate defect in the llvm-tblgen in use. Having
>>>>>> Junk Filling disabled might be a workaround.
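As a hedged aside on the numbers involved: the jemalloc text quoted above gives 0xa5 as the fill byte for newly allocated memory, so an uninitialized 32-bit unsigned value read from such memory would hold 0xA5A5A5A5. The sketch below, in plain sh, builds that word from the byte and prints it in decimal, which is the form that shows up in generated text. The function name junk_word32 is made up for illustration; nothing here is taken from llvm-tblgen or jemalloc sources.

```shell
#!/bin/sh
# Illustrative only: junk_word32 is a hypothetical helper, not part of
# jemalloc or LLVM. It repeats one byte value four times to form the
# 32-bit word that a junk-filled, never-assigned integer would hold.
junk_word32() {
    byte=$(( $1 & 0xff ))
    echo $(( (byte << 24) | (byte << 16) | (byte << 8) | byte ))
}

# 0xa5 is the "alloc" junk byte documented in man jemalloc.
junk_word32 0xa5    # prints 2779096485
```

The same helper with 0x5a gives the corresponding word for freed memory, which can help when scanning logs for either pattern.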
>>>>>>
>>>>>> There is /etc/malloc.conf as a way of controlling the behavior:
>>>>>>
>>>>>> ln -s 'junk:false' /usr/local/poudriere/poudriere-system/etc/malloc.conf
>>>>>>
>>>>>> I suggest you retry building after getting the above in place.
>>>>>> If it does not get the 0xA5A5A5A5u value, that would be
>>>>>> more evidence of an uninitialized-memory defect in the llvm-tblgen
>>>>>> involved.
>>>>>>
>>>>> Done and running now. In the interim I tried building llvm10 using
>>>>> make in /usr/ports, but it failed with another python conflict.
>>>>
>>> The poudriere session just ended, with a somewhat different error:
>>>
>>> In file included from /wrkdirs/usr/ports/devel/llvm10/work/llvm-10.0.1.src/lib/Target/AArch64/AArch64InstructionSelector.cpp:312:
>>> lib/Target/AArch64/AArch64GenGlobalISel.inc:1900:41: error: expected expression
>>> /*GIM_CheckRegBankForClass: @0*/, /*MI*/1, /*Op*/2, /*RC*//*AArch64::FPR64RegClassID: @0*/,
>>> ^
>>> lib/Target/AArch64/AArch64GenGlobalISel.inc:1900:99: error: expected expression
>>> /*GIM_CheckRegBankForClass: @0*/, /*MI*/1, /*Op*/2, /*RC*//*AArch64::FPR64RegClassID: @0*/,
>>> ^
>>> 2 errors generated.
>>> [ 25% 1396/5364]
>>>
>>> The last line is included as a fiducial indicator. Two errors instead of
>>> four, nothing about AMDGPU.
>>
>> You have a prior run that also showed only 2 errors:
>>
>> http://www.zefox.org/~bob/poudriere/data/logs/bulk/main-default/2021-06-21_12h55m51s/logs/errors/llvm10-10.0.1_5.log
>>
>> has:
>>
>> lib/Target/AMDGPU/AMDGPUGenGlobalISel.inc:15822:50: error: expected expression
>> /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/0, /*RC*//*AMDGPU::VGPR_32RegClassID: @2779096485*/,
>> ^
>> lib/Target/AMDGPU/AMDGPUGenGlobalISel.inc:15822:118: error: expected expression
>> /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/0, /*RC*//*AMDGPU::VGPR_32RegClassID: @2779096485*/,
>> ^
>> 2 errors generated.
>>
>> And a prior one that shows 6 errors but for AArch64 instead of AMDGPU:
>>
>> http://www.zefox.org/~bob/poudriere/data/logs/bulk/main-default/2021-06-18_19h00m47s/logs/errors/llvm10-10.0.1_5.log
>>
>> has:
>>
>> lib/Target/AArch64/AArch64GenGlobalISel.inc:3760:50: error: expected expression
>> /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/1, /*Op*/1, /*RC*//*AArch64::FPR64RegClassID: @2779096485*/,
>> ^
>> lib/Target/AArch64/AArch64GenGlobalISel.inc:3760:117: error: expected expression
>> /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/1, /*Op*/1, /*RC*//*AArch64::FPR64RegClassID: @2779096485*/,
>> ^
>> lib/Target/AArch64/AArch64GenGlobalISel.inc:5735:50: error: expected expression
>> /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64RegClassID: @2779096485*/,
>> ^
>> lib/Target/AArch64/AArch64GenGlobalISel.inc:5735:117: error: expected expression
>> /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64RegClassID: @2779096485*/,
>> ^
>> lib/Target/AArch64/AArch64GenGlobalISel.inc:22981:50: error: expected expression
>> /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64spRegClassID: @2779096485*/,
>> ^
>> lib/Target/AArch64/AArch64GenGlobalISel.inc:22981:119: error: expected expression
>> /*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64spRegClassID: @2779096485*/,
>> ^
>> 6 errors generated.
>> ninja: build stopped: subcommand failed.
>> *** Error code 1
>>
>> It appears that the bug does not have reproducible details
>> but all of the examples that do not have junk:false show
>> @2779096485 . (And the only junk:false tried so far has @0
>> instead.)
>>
>> Something is providing and/or using uninitialized memory.
>>
>> There is the possibility that swapping out and back in is
>> sometimes not providing pages with the intended content.
>> I state that as an example that we really can not claim
>> to know that llvm-tblgen itself is doing something wrong.
>> I'm not claiming to know what is actually happening. But
>> such would fit with contexts that have more RAM, and so
>> avoid much of the paging/swapping, also not seeing the
>> problem.
>>
>> But as in some past examples, you may have exposed a
>> problem with FreeBSD.
>>
>>>> Interesting. I'm unable to see a:
>>>>
>>>> /usr/local/poudriere/poudriere-system/etc/malloc.conf
>>>>
>>>> via what you have published. But I've no clue if such
>>>> an odd symbolic link would be expected to show up.
>>
>> Still true, but . . .
>>
>> Well, now: http://www.zefox.org/~bob/poudriere/
>> shows a: junk:false
>>
>> Note that this is at the same level as poudriere-system/
>> is shown. You might want to look and see if the file
>> system shows such a file at that level as well.
>>
>> This did not show up until after the build attempt had
>> finished from what I can tell.
>>
>>> The link seems visible to find and ls:
>>> root@www:/usr/local/poudriere # find . -name malloc.conf
>>> ./poudriere-system/etc/malloc.conf
>>> root@www:/usr/local/poudriere # more ./poudriere-system/etc/malloc.conf
>>> ./poudriere-system/etc/malloc.conf: No such file or directory
>>> root@www:/usr/local/poudriere # ls -l ./poudriere-system/etc/malloc.conf
>>> lrwxr-xr-x 1 root wheel 10 Jun 23 14:27 ./poudriere-system/etc/malloc.conf -> junk:false
>>> root@www:/usr/local/poudriere #
>>>
>>> The link seems invisible to cat and more, reporting "No such file...."
>>
>> The link is looking for a file called junk:false in the same
>> directory. It is not expected to find such a file.
>>
>>> I'm not sure what might be profitably tried next..... Suggestions welcome!
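The ls and more output quoted above is what a dangling symbolic link looks like: the link exists, but its text junk:false names no file. That is expected for malloc.conf, since jemalloc's documentation describes reading the string the link points to, not a target file. A minimal sketch, assuming a scratch directory in place of the real poudriere jail path (the demo path and variable names are hypothetical), of checking the link text with readlink instead of cat or more:

```shell
#!/bin/sh
# Sketch only: uses a throwaway directory, not the real jail path.
demo_dir=$(mktemp -d "${TMPDIR:-/tmp}/malloc-conf-demo.XXXXXX")
conf="$demo_dir/malloc.conf"

# Same form as the ln -s command suggested earlier in the thread.
ln -s 'junk:false' "$conf"

# readlink prints the link text itself; no target file needs to exist.
link_text=$(readlink "$conf")
echo "link text: $link_text"

# test -L is true for the link itself; test -e follows the link and
# fails, which is why cat and more report "No such file or directory".
if [ -L "$conf" ] && [ ! -e "$conf" ]; then
    link_state="dangling"
else
    link_state="resolves"
fi
echo "link state: $link_state"

rm -rf "$demo_dir"
```

On the real system, the equivalent check would be readlink on the jail's etc/malloc.conf path shown in the quoted find output.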
>>
>> First off, if the point is to get the RPi3B+ going
>> more than it is to get evidence about the problem,
>> I'd suggest booting an RPi4B with the same media
>> (adjusting config.txt as necessary) and trying the
>> build from that boot. If it builds, the media can
>> be moved back to the RPi3B+ for other activity.
>> The failed vs. built status does give some
>> information about the problem. Built would suggest
>> that paging/swapping was involved in the problem.
>> Failed might suggest otherwise. (I do not know
>> if there would be much paging/swapping, depending on
>> how much RAM the RPi4B had.)
>>
>> One experiment would be to use the same boot media on
>> an RPi4B that had been told in config.txt to limit
>> itself to 1 GiByte of RAM --and to also try with all
>> the RAM being allowed. If the first fails but the
>> second works, that is probably nice evidence. If both
>> fail, that also is probably nice evidence. For the other
>> two combinations, it is less clear what any implications
>> would be.
>>
>> (I'm not claiming that you have such a RPi4B that can
>> be made available for the duration of such experiments.)
>>
>> Another direction is messy: testing under stable/13 and/or
>> releng/13.0 vintages to see if it is somehow specific
>> to main [so: 14], having an analogous context to what is
>> known to fail under main (as much as reasonable). The
>> RPi4B two-RAM-sizes comparison/contrast type of test could
>> also be used.
>>
>> There is also just repeating with junk:false a couple of
>> times to see if there is evidence of variability like
>> there is without junk:false. Simplest of the
>> suggested tests, but likely the least informative.
>>
>> None of this would be likely to get close to a short,
>> small test that shows the problem. I've no clue how
>> to target that at this point.
>>
> How about booting an older kernel to see if that makes a difference?

An interesting point that I'd not thought about was that
if paging/swapping (or other I/O) was a source of the
problem, then, not only world, but also kernel code would
have to be tracking the status of /etc/malloc.conf .

It is not obvious to me that the kernel would directly
track that. But if the kernel was not replacing the
content of some pages like it should, it might be that
we are just seeing the world code's prior initialization
of the memory.

> ls -dl /boot/kernel* reports
> drwxr-xr-x 2 root wheel 13824 Jun 18 18:15 /boot/kernel
> drwxr-xr-x 2 root wheel 13312 Jan 9 15:57 /boot/kernel.main-c255664-g4d64c7243d26
> drwxr-xr-x 2 root wheel 13312 Aug 29 2020 /boot/kernel.mmccam
> drwxr-xr-x 2 root wheel 13824 Jun 9 18:52 /boot/kernel.old
> drwxr-xr-x 2 root wheel 13312 Aug 27 2020 /boot/kernel.r364346
> drwxr-xr-x 2 root wheel 13312 Aug 29 2020 /boot/kernel.r364895
> drwxr-xr-x 2 root wheel 13312 Sep 7 2020 /boot/kernel.r365355
>
> Most of these are probably too old to work at all, but Jun 9 and Jan 9
> might possibly work, I'd expect kernel.old to work as well. ISTR the
> previous success building chromium was early 2021 or before.
>

I'll note that:

QUOTE (from 2021-06-12 01:53:02 +0000 commit)
param.h: Bump __FreeBSD_version to 1400022

Commit e1a907a25cfa changed the internal KAPI between the krpc
and nfsserver. As such, both modules must be rebuilt from sources.
Bump __FreeBSD_version to 1400022.
END QUOTE

So: Even going back to June 9 may mess up nfs use.
(I've no clue what services you depend on or in what
contexts.) You might need to keep nfs from even trying
to start at the next boot before booting into such an
older kernel.

Jan 9 predates 14 and 13.0-RELEASE: sys/sys/param.h got

#define __FreeBSD_version 1400000

back on Jan-22.

Running newer worlds on older kernels is not supported.
Generally folks do not track the KBI changes vs. the
consequences of not having the right KBI.
This makes interpreting results difficult even when it
appears to work. There can be mixes like NFS not working
but other things working. There could be corruptions but
such may not be likely.

Do you have what you consider sufficient backups in case
things get messed up? (That might be the status of being
okay with starting over if something really bad happens.)

If you try the combination you might want to review the
boot messages for any evidence of problems to worry about
before starting a poudriere run or otherwise causing the
system to be busy (or even, just leaving it running but
basically idle).

If the world/kernel combination happened to work well for
the specific activity, I do think the experiment could be
useful. But, if it were me, I'd not want to run that way
beyond the experiment(s), even if the specific problem
seems to go away. If anything else odd happens with an
old kernel in use, interpreting the result usefully will
be unlikely.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)