Re: llvm10 build failure on Rpi3
- Reply: bob prohaska : "Re: llvm10 build failure on Rpi3"
- In reply to: bob prohaska : "Re: llvm10 build failure on Rpi3"
Date: Thu, 24 Jun 2021 06:02:02 UTC
On 2021-Jun-23, at 21:30, bob prohaska <fbsd at www.zefox.net> wrote:

> On Wed, Jun 23, 2021 at 04:22:35PM -0700, Mark Millard wrote:
>> On 2021-Jun-23, at 15:28, bob prohaska <fbsd at www.zefox.net> wrote:
>> . . .
>
>> > [snipped for brevity]
>>
>>>> For example, 0xA5u byte values might be the value that newly
>>>> allocated memory is initialized to. Looking . . . man jemalloc
>>>> (the memory allocator implementation used by FreeBSD) reports:
>>>>
>>>> opt.junk (const char *) r- [--enable-fill]
>>>> Junk filling. If set to "alloc", each byte of uninitialized
>>>> allocated memory will be initialized to 0xa5. If set to "free", all
>>>> deallocated memory will be initialized to 0x5a. If set to "true",
>>>> both allocated and deallocated memory will be initialized, and if
>>>> set to "false", junk filling be disabled entirely. This is intended
>>>> for debugging and will impact performance negatively. This option
>>>> is "false" by default unless --enable-debug is specified during
>>>> configuration, in which case it is "true" by default.
>>>>
>>>> So, if you have junk filling enabled, I expect that you ran
>>>> into a legitimate defect in the llvm-tblgen in use. Having
>>>> junk filling disabled might be a workaround.
>>>>
>>>> There is /etc/malloc.conf as a way of controlling the behavior:
>>>>
>>>> ln -s 'junk:false' /usr/local/poudriere/poudriere-system/etc/malloc.conf
>>>>
>>>> I suggest you retry building after getting the above in place.
>>>> If it does not get the 0xA5A5A5A5u value, that would be
>>>> more evidence of an uninitialized-memory defect in the llvm-tblgen
>>>> involved.
>>>>
>>> Done and running now. In the interim I tried building llvm10 using
>>> make in /usr/ports, but it failed with another python conflict.
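Aside on the malloc.conf link quoted above: on FreeBSD, jemalloc reads its options from the name that the /etc/malloc.conf symbolic link points to, not from the contents of a file, so the link is expected to dangle. The following is a minimal sketch that reproduces this in a throwaway directory made by mktemp (the scratch location is an illustrative assumption; nothing under /etc or /usr/local/poudriere is touched).

```shell
#!/bin/sh
# Sketch: a malloc.conf-style link whose *target name* is the option string.
# Uses a scratch directory only; it is removed when the script exits.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

ln -s 'junk:false' "$tmp/malloc.conf"

# The options live in the link target, which readlink reports directly:
readlink "$tmp/malloc.conf"    # prints: junk:false

# Following the link looks for a file literally named "junk:false",
# which does not exist, so cat fails. That is the intended state.
if cat "$tmp/malloc.conf" >/dev/null 2>&1; then
    echo "unexpected: link resolved to a real file"
else
    echo "cat fails: dangling link, as intended"
fi
```

ls -l shows the link and its target, while anything that follows the link (cat, more, test -e) sees no file.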
>>
> The poudriere session just ended, with a somewhat different error:
>
> In file included from /wrkdirs/usr/ports/devel/llvm10/work/llvm-10.0.1.src/lib/Target/AArch64/AArch64InstructionSelector.cpp:312:
> lib/Target/AArch64/AArch64GenGlobalISel.inc:1900:41: error: expected expression
> /*GIM_CheckRegBankForClass: @0*/, /*MI*/1, /*Op*/2, /*RC*//*AArch64::FPR64RegClassID: @0*/,
> ^
> lib/Target/AArch64/AArch64GenGlobalISel.inc:1900:99: error: expected expression
> /*GIM_CheckRegBankForClass: @0*/, /*MI*/1, /*Op*/2, /*RC*//*AArch64::FPR64RegClassID: @0*/,
> ^
> 2 errors generated.
> [ 25% 1396/5364]
>
> The last line is included as a fiducial indicator. Two errors instead of
> four, nothing about AMDGPU.

You have a prior run that also showed only 2 errors:

http://www.zefox.org/~bob/poudriere/data/logs/bulk/main-default/2021-06-21_12h55m51s/logs/errors/llvm10-10.0.1_5.log

has:

lib/Target/AMDGPU/AMDGPUGenGlobalISel.inc:15822:50: error: expected expression
/*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/0, /*RC*//*AMDGPU::VGPR_32RegClassID: @2779096485*/,
^
lib/Target/AMDGPU/AMDGPUGenGlobalISel.inc:15822:118: error: expected expression
/*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/0, /*RC*//*AMDGPU::VGPR_32RegClassID: @2779096485*/,
^
2 errors generated.
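The @ values in these logs can be decoded from a shell. This sketch assumes each value is a 32-bit quantity printed in unsigned decimal (an assumption, not something the logs state). It shows that 2779096485 is 0xA5A5A5A5, exactly what four bytes of jemalloc's 0xa5 "alloc" junk fill read as, while @0 matches zeroed memory.

```shell
#!/bin/sh
# Sketch: decode the "@N" values seen in the GlobalISel errors.
# Assumption: N is a 32-bit value printed as unsigned decimal.
for n in 0 2779096485; do
    printf '@%s = 0x%08X\n' "$n" "$n"
done
# @0 = 0x00000000
# @2779096485 = 0xA5A5A5A5

# Four 0xa5 bytes (octal \245, the jemalloc "alloc" junk byte),
# read back by od as one unsigned 4-byte integer.
junk_u32=$(printf '\245\245\245\245' | od -An -tu4 | tr -d ' \n')
echo "$junk_u32"    # 2779096485
```

Because all four bytes are equal, the result does not depend on byte order.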
And a prior one that shows 6 errors but for AArch64 instead of AMDGPU:

http://www.zefox.org/~bob/poudriere/data/logs/bulk/main-default/2021-06-18_19h00m47s/logs/errors/llvm10-10.0.1_5.log

has:

lib/Target/AArch64/AArch64GenGlobalISel.inc:3760:50: error: expected expression
/*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/1, /*Op*/1, /*RC*//*AArch64::FPR64RegClassID: @2779096485*/,
^
lib/Target/AArch64/AArch64GenGlobalISel.inc:3760:117: error: expected expression
/*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/1, /*Op*/1, /*RC*//*AArch64::FPR64RegClassID: @2779096485*/,
^
lib/Target/AArch64/AArch64GenGlobalISel.inc:5735:50: error: expected expression
/*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64RegClassID: @2779096485*/,
^
lib/Target/AArch64/AArch64GenGlobalISel.inc:5735:117: error: expected expression
/*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64RegClassID: @2779096485*/,
^
lib/Target/AArch64/AArch64GenGlobalISel.inc:22981:50: error: expected expression
/*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64spRegClassID: @2779096485*/,
^
lib/Target/AArch64/AArch64GenGlobalISel.inc:22981:119: error: expected expression
/*GIM_CheckRegBankForClass: @2779096485*/, /*MI*/0, /*Op*/1, /*RC*//*AArch64::GPR64spRegClassID: @2779096485*/,
^
6 errors generated.
ninja: build stopped: subcommand failed.
*** Error code 1

It appears that the bug does not have reproducible details, but all of
the examples that do not have junk:false show @2779096485, which is
0xA5A5A5A5: four copies of jemalloc's 0xa5 junk byte. (And the only
junk:false run tried so far has @0 instead.) Something is providing
and/or using uninitialized memory.

There is the possibility that swapping out and back in sometimes does
not provide pages with the intended content. I state that as an example
of why we really cannot claim to know that llvm-tblgen itself is doing
something wrong. I'm not claiming to know what is actually happening.
But such a cause would also fit with contexts that have more RAM, and
so avoid much of the paging/swapping, not seeing the problem. But, as
in some past examples, you may have exposed a problem with FreeBSD.

>> Interesting. I'm unable to see a:
>>
>> /usr/local/poudriere/poudriere-system/etc/malloc.conf
>>
>> via what you have published. But I've no clue if such
>> an odd symbolic link would be expected to show up.

Still true, but . . . Well, now:

http://www.zefox.org/~bob/poudriere/

shows a:

junk:false

Note that this is at the same level as poudriere-system/ is shown.
You might want to look and see if the file system shows such a file
at that level as well. From what I can tell, this did not show up
until after the build attempt had finished.

> The link seems visible to find and ls:
> root@www:/usr/local/poudriere # find . -name malloc.conf
> ./poudriere-system/etc/malloc.conf
> root@www:/usr/local/poudriere # more ./poudriere-system/etc/malloc.conf
> ./poudriere-system/etc/malloc.conf: No such file or directory
> root@www:/usr/local/poudriere # ls -l ./poudriere-system/etc/malloc.conf
> lrwxr-xr-x 1 root wheel 10 Jun 23 14:27 ./poudriere-system/etc/malloc.conf -> junk:false
> root@www:/usr/local/poudriere #
>
> The link seems invisible to cat and more, reporting "No such file...."

That is expected: cat and more follow the link, looking for a file
called junk:false in the same directory, and no such file is expected
to exist. The link's target name is itself the option string.

> I'm not sure what might be profitably tried next..... Suggestions welcome!

First off, if the point is to get the RPi3B+ going more than it is to
get evidence about the problem, I'd suggest booting an RPi4B with the
same media (adjusting config.txt as necessary) and trying the build
from that boot. If it builds, the media can be moved back to the
RPi3B+ for other activity.

The failed vs. built status does give some information about the
problem. Built would suggest that paging/swapping was involved in the
problem. Failed might suggest otherwise.
(I do not know if there would be much paging/swapping, depending on
how much RAM the RPi4B had.)

One experiment would be to use the same boot media on an RPi4B that
has been told in config.txt to limit itself to 1 GiByte of RAM, and
also to try with all the RAM allowed. If the first fails but the
second works, that is probably nice evidence. If both fail, that also
is probably nice evidence. The implications of the other two
combinations are less clear. (I'm not claiming that you have such an
RPi4B that can be made available for the duration of such experiments.)

Another direction is messy: testing under stable/13 and/or releng/13.0
vintages to see if the problem is somehow specific to main [so: 14],
using a context analogous to what is known to fail under main (as much
as reasonable). The RPi4B two-RAM-sizes comparison/contrast type of
test could also be used there.

There is also just repeating with junk:false a couple of times to see
if there is evidence of variability like there is without junk:false.
This is the simplest of the suggested tests, but likely the least
informative.

None of this would be likely to get close to a short, small test that
shows the problem. I've no clue how to target that at this point.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)