Re: armv7 lang/gcc12 "no bootstrap" build via system clang 15.0.7 based poudriere build ends up stuck in a small loop

From: Mark Millard <marklmi_at_yahoo.com>
Date: Mon, 06 Mar 2023 18:13:42 UTC
[Backtrace added.]

> On Mar 6, 2023, at 09:46, Mark Millard <marklmi@yahoo.com> wrote:
> 
> [Some more context notes.]
> 
> On Mar 6, 2023, at 09:12, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> On Mar 6, 2023, at 08:37, Lorenzo Salvadore <developer@lorenzosalvadore.it> wrote:
>> 
>>> ------- Original Message -------
>>> On Monday, March 6th, 2023 at 9:46 AM, Mark Millard <marklmi@yahoo.com> wrote:
>>> 
>>> 
>>>> 
>>>> 
>>>> Under main that has clang 15.0.7, I've had to locally
>>>> switch to using the likes of:
>>>> 
>>>> OPTIONS_DEFAULT_armv7=STANDARD_BOOTSTRAP
>>>> 
>>>> (to express it in Makefile terms) for lang/gcc12 in order
>>>> to avoid the following.
>>>> 
>>>> The no bootstrap build ends up stuck in a small loop in partition_union
>>>> (in cc1):
>>>> 
>>>> (gdb) info threads
>>>> Id Target Id Frame
>>>> * 1 LWP 632886 of process 27787 0x016eb82c in partition_union ()
>>>> (gdb) bt
>>>> #0 0x016eb82c in partition_union ()
>>>> #1 0x0133e6ec in var_union(_var_map*, tree_node*, tree_node*) ()
>>>> #2 0x013218e4 in attempt_coalesce(_var_map*, ssa_conflicts*, int, int, __sFILE*) ()
>>>> #3 0x013203d0 in coalesce_ssa_name(_var_map*) ()
>>>> #4 0x012c66b4 in rewrite_out_of_ssa(ssaexpand*) ()
>>>> #5 0x0082c094 in (anonymous namespace)::pass_expand::execute(function*) ()
>>>> #6 0x00fd6ff0 in execute_one_pass(opt_pass*) ()
>>>> #7 0x00fd8380 in execute_pass_list_1(opt_pass*) ()
>>>> #8 0x00fc6df0 in execute_pass_list(function*, opt_pass*) ()
>>>> #9 0x00880c20 in cgraph_node::expand() ()
>>>> #10 0x00882d10 in symbol_table::compile() ()
>>>> #11 0x00883454 in symbol_table::finalize_compilation_unit() ()
>>>> #12 0x0120e204 in compile_file() ()
>>>> #13 0x0120d9d4 in toplev::main(int, char**) ()
>>>> #14 0x01646c28 in main ()
>>>> (gdb) finish
>>>> Run till exit from #0 0x016eb82c in partition_union ()
>>>> 
>>>> It never exits. I've walked through the short loop, which ends
>>>> up with data that leads to no progress: the bne is always taken
>>>> and the loop reaches a state in which none of the values
>>>> involved change.
>>>> 
>>>> truss shows no output, and no subroutines are called in the
>>>> few-instruction loop.
>>>> 
>>>> I ran multiple tests of "no bootstrap" and all failed the
>>>> same way.
>>>> 
>>>> Such would not be a good thing for the FreeBSD armv7 package
>>>> build server.
>>>> 
>>>> Also seen via lldb:
>>>> 
>>>> (lldb) bt
>>>> * thread #1, name = 'cc1', stop reason = signal SIGSTOP
>>>>   * frame #0: 0x016eb82c cc1`partition_union + 152
>>>>     frame #1: 0x0133e6ec cc1`var_union(_var_map*, tree_node*, tree_node*) + 104
>>>>     frame #2: 0x013218e4 cc1`attempt_coalesce(_var_map*, ssa_conflicts*, int, int, __sFILE*) + 508
>>>>     frame #3: 0x013203d0 cc1`coalesce_ssa_name(_var_map*) + 7240
>>>>     frame #4: 0x012c66b4 cc1`rewrite_out_of_ssa(ssaexpand*) + 2020
>>>>     frame #5: 0x0082c094 cc1`(anonymous namespace)::pass_expand::execute(function*) + 68
>>>>     frame #6: 0x00fd6ff0 cc1`execute_one_pass(opt_pass*) + 616
>>>>     frame #7: 0x00fd8380 cc1`execute_pass_list_1(opt_pass*) + 44
>>>>     frame #8: 0x00fc6df0 cc1`execute_pass_list(function*, opt_pass*) + 40
>>>>     frame #9: 0x00880c20 cc1`cgraph_node::expand() + 324
>>>>     frame #10: 0x00882d10 cc1`symbol_table::compile() + 3860
>>>>     frame #11: 0x00883454 cc1`symbol_table::finalize_compilation_unit() + 300
>>>>     frame #12: 0x0120e204 cc1`compile_file() + 236
>>>>     frame #13: 0x0120d9d4 cc1`toplev::main(int, char**) + 7028
>>>>     frame #14: 0x01646c28 cc1`main + 48
>>>>     frame #15: 0x004ad3f0 cc1`__start(argc=31, argv=0xffffadec, env=0xffffae6c, ps_strings=<unavailable>, obj=0x4181e004, cleanup=0x417ed4d8) at crt1_c.c:92:7
>>>> 
>>>> 
>>>> 
>>>> The armv7 STANDARD_BOOTSTRAP change led to it reaching completion.
>>>> 
>>>> But the "no bootstrap" issue suggests that system-clang 15.0.7
>>>> has a problem for armv7 targeting. (I've not seen problems for
>>>> targeting aarch64 or amd64.)
>>>> 
>>>> 
>>>> For reference:
>>>> 
>>>> # uname -apKU
>>>> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #88 main-n261230-e78dc78e517a-dirty: Wed Mar 1 16:17:45 PST 2023 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm armv7 1400081 1400081
>>>> 
>>>> via:
>>>> 
>>>> # poudriere jail -l
>>>> JAILNAME VERSION ARCH METHOD TIMESTAMP PATH
>>>> . . .
>>>> main-CA7 14.0-CURRENT arm.armv7 null 2021-06-27 17:58:33 /usr/obj/DESTDIRs/main-CA7-poud
>>>> . . .
>>>> 
>>>> on an aarch64 system, no qemu involved (or even installed):
>>>> 
>>>> # uname -apKU
>>>> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #88 main-n261230-e78dc78e517a-dirty: Wed Mar 1 16:17:45 PST 2023 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400081 1400081
>>>> 
>>>> (It is a 16 Cortex-A72 HoneyComb.)
>>> 
>>> Thanks Mark.
>>> 
>>> I guess cases like this are one of the reasons bootstrapping exists:
>>> compilation with clang on armv7 is probably not the typical case, so it
>>> does not work as easily as using GCC on amd64. Good that it works at least
>>> with bootstrapping.
>>> 
>>> Now, I would like to suggest a few more experiments:
>> 
>> Some of the below have a partial answer from the fact that
>> the FreeBSD package builder system for armv7 is still
>> running system-clang 14 (main) or 13 (13.1-RELEASE) and
>> does not yet see the problem. (The build server's actual
>> kernel vintage should not be an issue to worry about.)
>> 
>> Nor did it have problems in the past building lang/gcc12.
>> 
>> This is a new issue.
>> 
>>> - does the compilation work without bootstrapping with lang/gcc13-devel?
> 
> I started a lang/gcc13-devel build attempt. It got stuck
> as well, while I was composing this message. I'll recreate it
> and look at the backtrace later.

The rerun reproduced the problem. The backtrace was:

(lldb) bt
* thread #1, name = 'cc1', stop reason = signal SIGSTOP
  * frame #0: 0x01f560cc cc1`partition_union + 152
    frame #1: 0x01a17e20 cc1`var_union(_var_map*, tree_node*, tree_node*) + 104
    frame #2: 0x019ecaa4 cc1`attempt_coalesce(_var_map*, ssa_conflicts*, int, int, __sFILE*) + 624
    frame #3: 0x019ea91c cc1`coalesce_ssa_name(_var_map*) + 8100
    frame #4: 0x019609ac cc1`rewrite_out_of_ssa(ssaexpand*) + 2052
    frame #5: 0x00a1f334 cc1`(anonymous namespace)::pass_expand::execute(function*) + 68
    frame #6: 0x01583044 cc1`execute_one_pass(opt_pass*) + 664
    frame #7: 0x015842bc cc1`execute_pass_list_1(opt_pass*) + 44
    frame #8: 0x01572368 cc1`execute_pass_list(function*, opt_pass*) + 40
    frame #9: 0x00a8efa8 cc1`cgraph_node::expand() + 364
    frame #10: 0x00a91404 cc1`symbol_table::compile() + 3244
    frame #11: 0x00a91e20 cc1`symbol_table::finalize_compilation_unit() + 300
    frame #12: 0x0183530c cc1`compile_file() + 236
    frame #13: 0x01834acc cc1`toplev::main(int, char**) + 6716
    frame #14: 0x01e7c998 cc1`main + 48
    frame #15: 0x005a35b0 cc1`__start(argc=31, argv=0xffffabfc, env=0xffffac7c, ps_strings=<unavailable>, obj=0x42094004, cleanup=0x420634d8) at crt1_c.c:92:7

This confirms a context similar to the hang seen when building gcc12.
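
One way to check that two hangs share a call chain is to compare the
symbol names frame by frame, ignoring addresses and offsets. The
following is a minimal Python sketch, assuming lldb "bt" output with
one frame per line. The helper name call_chain and the regular
expression are illustrative only and are not part of lldb or any other
tool. Only the innermost five frames of each backtrace are included.

```python
import re

# Matches lldb frames of the form:
#   frame #1: 0x0133e6ec cc1`var_union(...) + 104
# Frames that carry source info instead of "+ offset" are skipped.
FRAME_RE = re.compile(r"frame #(\d+): 0x[0-9a-fA-F]+ [^`]+`(.+) \+ (\d+)\s*$")


def call_chain(bt_text):
    """Return the symbol names, innermost first, from lldb 'bt' text."""
    chain = []
    for line in bt_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            chain.append(match.group(2))
    return chain


# Innermost frames from the lang/gcc12 hang.
GCC12_BT = """\
* frame #0: 0x016eb82c cc1`partition_union + 152
  frame #1: 0x0133e6ec cc1`var_union(_var_map*, tree_node*, tree_node*) + 104
  frame #2: 0x013218e4 cc1`attempt_coalesce(_var_map*, ssa_conflicts*, int, int, __sFILE*) + 508
  frame #3: 0x013203d0 cc1`coalesce_ssa_name(_var_map*) + 7240
  frame #4: 0x012c66b4 cc1`rewrite_out_of_ssa(ssaexpand*) + 2020
"""

# Innermost frames from the lang/gcc13-devel hang.
GCC13_BT = """\
* frame #0: 0x01f560cc cc1`partition_union + 152
  frame #1: 0x01a17e20 cc1`var_union(_var_map*, tree_node*, tree_node*) + 104
  frame #2: 0x019ecaa4 cc1`attempt_coalesce(_var_map*, ssa_conflicts*, int, int, __sFILE*) + 624
  frame #3: 0x019ea91c cc1`coalesce_ssa_name(_var_map*) + 8100
  frame #4: 0x019609ac cc1`rewrite_out_of_ssa(ssaexpand*) + 2052
"""

gcc12_chain = call_chain(GCC12_BT)
gcc13_chain = call_chain(GCC13_BT)
same_chain = gcc12_chain == gcc13_chain
print("same call chain:", same_chain)
```

Comparing names rather than addresses matters here because the two cc1
binaries are laid out differently, so the addresses and most offsets
differ even where the functions match.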

>>> - does the compilation work without bootstrapping with a higher version
>>> of clang (we have devel/llvm16 in the ports tree, which tracks a pre-release)?

I'll see about forcing lang/gcc13-devel to use devel/llvm16 instead
of system-clang. (Not something I've done before, at least that I
remember.) I do already have devel/llvm16 (rc3) built.

>>> - does the compilation work without bootstrapping on a release version of
>>> FreeBSD?
>> 
>> That is an example where the 13.1-based package builds on the
>> system used for armv7 builds did/do not have the problem.
>> 
>> Nor do the main system-clang 14 based builds.
>> 
>>> - does the compilation work without bootstrapping using Linux instead
>>> of FreeBSD?
>> 
>> I'm not well set up for that kind of experiment.
>> 
>>> You might want to open a bug report, but you should first try to understand
>>> which component causes the issue and whether replacing anything
>>> with something newer (where the bug might already be fixed) or with
>>> something supported (since FreeBSD CURRENT is under development, we
>>> might have regressions) solves the problem.
>> 
>> It is already known to be a regression compared to
>> system-clang 14 and 13 based builds. No clue yet
>> for llvm16 based. (I've separate notes out about
>> building llvm16 for aarch64 needing a Makefile
>> fix.)
>> 
>>> If you find that the cause is in the FreeBSD GCC port(s), then please
>>> open a bug report on bugzilla so that I can keep track of it and other
>>> users with the same problem can find it there as well.
>>> 
>> 
> 
> When I can, I reference FreeBSD package builder results
> instead of build attempts from my personal environment.
> (But I'd previously used main with system-clang 14 and
> still use releng/13.1 with its system-clang 13 and had
> no problems, just like the armv7 package builder.)
> 
> I've not been building any lang/gcc* for * < 12 in
> a long time. For all I know, gcc11 and before could
> run into the problem. At this stage, the armv7
> package builder never uses system-clang 15 and so
> gives no evidence for any lang/gcc* .
> 




===
Mark Millard
marklmi at yahoo.com