Re: llvm10 build failure on Rpi3
- In reply to: bob prohaska : "Re: llvm10 build failure on Rpi3"
Date: Sun, 04 Jul 2021 00:43:51 UTC
On 2021-Jul-3, at 14:54, bob prohaska <fbsd at www.zefox.net> wrote:

> On Sat, Jul 03, 2021 at 01:15:19PM -0700, Mark Millard wrote:
>>
>> So you still have not tried an artifacts or snapshot kernel+world?
>>
> Not yet.
>
>>> Eventually I resorted to running make in devel/llvm10, to my surprise it
>>> ran to completion.
>>
>> Interesting.
>>
>> Was this -j4? -j1? -j2? Any other interesting characteristics
>> for how it was run?
>>
> Nothing special was done. IIRC, it was make -DBATCH > make.log in
> the background. From top's screen it looked like -j4.
>
>> It would be interesting to see if building in a chroot
>> in that make style also worked (or a non-poudriere jail).
>>
>
> Can you point me to instructions for doing the experiment?

I'll deal with this in a separate reply.

>>> It also ran make package successfully. Again I tried to
>>> build just devel/llvm10 using poudriere, again getting "expected expression".
>>>
>>> At that point I resized the swap partitions to 1 GB each and tried poudriere
>>> on devel/llvm10. That got rid of the excessive swap warnings, but didn't help.
>>> Finally I placed
>>> MAKE_JOBS_NUMBER=2
>>> in /usr/local/etc/poudriere.d/make.conf and tried again. That still failed,
>>> still with "expected expression".
>>
>> I'll note that the running build shows Load Averages
>> of under 3. So the MAKE_JOBS_NUMBER=2 seems to be working.
>>
>>> Since devel/llvm10 had created a package successfully, I tried slipping a copy
>>> into poudriere's package directory, hoping it would find and use the package
>>> to make further progress. Unfortunately, poudriere seems to remember the failure
>>> and won't use the proffered package.
>>
> [large snip which convinced me to give up on tricking poudriere into
> using a package constructed by make]
>>
>> Going in a different direction, one way to force a build to
>> start over after a failure is to: rm -fr PATH/.building
>> before starting a new bulk build.
>> This might be appropriate
> I'm missing something here: what does PATH represent? There's
> nothing called .building under /usr/local/poudriere, at least
> after the run finishes.

Part of how this works is that .building/ is initially
populated with a shadow copy of the already existing .latest/,
mostly via hard links, with some top-level files actually
copied. If the status of the bulk run reaches stopped:done:
then the .building/ is mv'd (renamed) to be of the form
.real_*/ with a new match for the * and then the links are
adjusted to point to the new .real_*/ and the old .real_*/
is removed.

In your context, this happens inside:

/usr/local/poudriere/data/packages/main-default/

So, yes, your run that reached stopped:done: no longer has
a .building/

By contrast, say you ^C the bulk run or that it reaches the
stopped:crashed: state instead of stopped:done: . Then the
.building/ would still be present, as would the pre-existing
.real_*/ and the links that use it. This is the context for
the next bulk run reporting:

"Using packages from previously failed build: ${PACKAGES}/.building"

>> if one suspects a problem of a kind that did not stop a
>> build but produced something for a build that fails to operate
>> correctly.
>>
> Such as a corrupt llvm-tblgen?

Yep, possibly via it depending on something else that
has problems.

>> So lang/rust finished. That is interesting because it includes an
>> llvm build internally.
>>
>
> Does that build invoke the same llvm-tblgen?

Every devel/llvm* build builds its own llvm-tblgen .
lang/rust would build its own too. And the system
llvm support builds its own as well.

> [snip]
>> Again, poudriere does not control memory initialization in
>> the processes in the builders.
>>
>
> For some reason I got the idea that whatever asked for memory to use
> was responsible for initializing it.
Part of the point of having memory management libraries
have a way to be told to fill in things like 0xA5u bytes is
to get hints about contexts that end up with memory not
explicitly initialized by the requesting program. That is
why I had you try the contrasting junk:false case in
/etc/malloc.conf . The results showed what the memory
allocation library initialized with instead of something
specific to the code requesting the allocation.

> Certainly not the kernel.....

The kernel fills in bytes in some user-space memory as
part of doing various requested operations. In such cases
it is possible for the kernel to not have filled in the
memory like it should have. It is also possible for the
kernel to replace bytes in user-space memory that it should
not touch. There is an example ongoing issue with this for
the 32-bit powerpc kernels that cover using old PowerMacs.

>>> The fact that the stoppage reported looks like
>>> a syntax error specific to devel/llvm10 which is unaffected by swap pressure
>>> makes it seem unrelated to kernel or swap constraints.
>>
>> The files with the syntax errors are ones generated by llvm-tblgen
>> during the build and it is the output of llvm-tblgen that is corrupt,
>> showing evidence of having used memory not initialized like it should
>> have been.
>>
>
> Wouldn't that point suspicion at llvm-tblgen, of whatever version
> LLVM is actually doing the work?

It points at llvm-tblgen and/or something(s) that llvm-tblgen
depends on. Either way, the observed failure is from the
llvm-tblgen output being incorrect and later complained
about.

devel/llvm10 builds its own llvm-tblgen for its own use.
Each devel/llvm* does. (As does the system's llvm*.)

There is also the variability in which llvm-tblgen output
is messed up: it is always some example of:

lib/Target/*/*GenGlobalISel.inc

but which value for the *'s tends to vary from build attempt
to build attempt. It suggests that some sort of race
condition is involved.
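
One way to make that variability visible is to tally which
generated file each failed attempt complained about. The
following is only a minimal sketch under assumptions: it
supposes the logs of the failed attempts were copied by hand
into one directory as attempt-*.log (LOGDIR and that naming
are hypothetical, not something poudriere produces) and that
the error lines in those logs mention the
lib/Target/*/*GenGlobalISel.inc path.

```shell
#!/bin/sh
# Sketch: count how often each lib/Target/*/*GenGlobalISel.inc path
# appears in a set of saved build logs.
# LOGDIR and the attempt-*.log file names are assumptions made for
# illustration; adjust them to wherever the logs were actually saved.
LOGDIR="${LOGDIR:-./llvm10-logs}"

tally_inc_files() {
    # -h: omit file names, -o: print only the matching part.
    # Output is "count path", most frequent path first.
    grep -ho 'lib/Target/[^/ ]*/[^/ ]*GenGlobalISel\.inc' "$@" 2>/dev/null |
        sort | uniq -c | sort -rn
}

tally_inc_files "$LOGDIR"/attempt-*.log
```

If the counts end up spread across several targets rather than
concentrated on one, that would be consistent with the
attempt-to-attempt variation described above, though it would
not by itself identify the cause.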
>>> AIUI, the hardware of the Pi4 is considerably different from the Pi3 in terms
>>> of memory management, noted from an interview with Eben Upton on YouTube.
>>
>> Why would Eben Upton be talking about FreeBSD's memory management?
>>
> He was talking about the Pi4 hardware and how it differed from the Pi3

Which is not memory management as such.

>> I suspect that the talk is not about what you think it is about,
>> but some narrower aspects than the overall memory management.
>>
>
> I thought it had something to do with added DMA capability. The video is at
> https://www.youtube.com/watch?v=hyj-7mTnumI
> In light of the discussion about llvm-tblgen I'm doubtful it's relevant,
> but it's not the worst way to waste an hour.
>
>>
>>> Is there any sort of sanity test for the poudriere system? If I delete and
>>> re-create the existing jail can the existing package library be preserved
>>> and re-used? If not, that's OK, I'd just like to know beforehand.
>>>
>>
>> # poudriere jail -jNAME -d
>> # poudriere jail -c -jNAME -m null -M /WORLDPATH -S /SRCPATH -v 14.0-CURRENT
>>
>> should work fine. But really all that you are
>> doing (using an example from my environment)
>> is deleting and rewriting a few very small files
>> in a directory with the jail's name:
>>
> So, in my case /usr/local/poudriere/poudriere-system?

After the delete would be:

poudriere jail -c -jNAME -m null -M /usr/local/poudriere/poudriere-system -S /usr/src -v 14.0-CURRENT

Same as in your:

http://www.zefox.org/~bob/readme

> (using the nomenclature in your sample instructions).
> That would leave /usr/local/poudriere/data intact....

Yep. The delete does have an option (-C ???) for
causing more to be deleted under /usr/local/poudriere/data/ .
(Despite documentation claims otherwise, it did not seem
to delete packages when requested.)

> I'm starting to understand why you think it unlikely
> to help.
>
>> The deletion/replacement of timestamp may have rebuild
>> consequences from appearing to have changed (or just
>> being missing).
>>
> If timestamps guide decisions on what to make and when,
> that might be significant. Not sure how I might've screwed
> them up, but in my hands anything is possible 8-)

I took a quick look and did not notice any timestamp
comparisons controlling anything.

>> Nothing about any of those is going to change how memory
>> initialization is working in llvm-tblgen's operation
>> for generating any *GenGlobalISel.inc files, other than
>> if the timestamp forces some sort of rebuild from scratch
>> of some build dependencies first.
>>
> Maybe this should be obvious, but which llvm-tblgen is in
> action? the one from the system, (12.0.1) or something
> else?
>

devel/llvm10 builds its own llvm-tblgen and uses it.
Every devel/llvm* build builds its own llvm-tblgen .

Looking in the .log file for a build there are lines
containing commands that start out with (from my
example devel/llvm10 build context):

/wrkdirs/usr/ports/devel/llvm10/work/.build/bin/llvm-tblgen

Before any of those, there are commands associated
with building that bin/llvm-tblgen .

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)