Re: armv7 lang/gcc12 "no bootstrap" build via system clang 15.0.7 based poudriere build ends up stuck in a small loop
Date: Tue, 07 Mar 2023 11:43:53 UTC
On Mar 7, 2023, at 03:12, Lorenzo Salvadore <developer@lorenzosalvadore.it> wrote:

> ------- Original Message -------
> On Tuesday, March 7th, 2023 at 11:26 AM, Mark Millard <marklmi@yahoo.com> wrote:
>
>> Below is a small example C source showing the clang 15+ armv7
>> problem that leads to the unbounded looping in later code in
>> the lang/gcc12+ builds: a data structure is mis-initialized,
>> breaking its invariant properties used by the later code
>> structure.
>>
>> # more partition.c
>> // Minor variation of part of some gcc source code!
>>
>> // For system-clang 15: cc -g -O2 partition.c ; ./a.out
>> // For devel/llvm16: clang16 -g -O2 partition.c ; ./a.out
>>
>> #include <stdio.h>
>>
>> #define NUM_ELEMENTS 32
>>
>> struct partition_elem
>> {
>>   struct partition_elem* next;
>>   int class_element;
>>   unsigned class_count;
>> };
>>
>> typedef struct partition_def
>> {
>>   int num_elements;
>>   struct partition_elem elements[NUM_ELEMENTS];
>> } *partition;
>>
>> struct partition_def partition_storage;
>>
>> partition
>> partition_new (int num_elements)
>> {
>>   int e;
>>
>>   if (NUM_ELEMENTS < num_elements) num_elements = NUM_ELEMENTS;
>>
>>   partition part= &partition_storage;
>>   part->num_elements = num_elements;
>>
>>   for (e = 0; e < num_elements; ++e)
>>     {
>>       part->elements[e].class_element = e;
>>
>>       part->elements[e].next = &(part->elements[e]);
>>
>>       part->elements[e].class_count = 1;
>>
>>     }
>>
>>   for (e = 0; e < num_elements; ++e)
>>     printf("%d: %p : next?: %p\n",e,(void*)&part->elements[e],(void*)part->elements[e].next);
>>
>>   return part;
>> }
>>
>> int main(void)
>> {
>>   partition part;
>>   part= partition_new(NUM_ELEMENTS);
>>
>>   return !part;
>> }
>>
>> In the output below, note the blocks of 4 "next"
>> values that do not change. Each should match the
>> earlier hexadecimal value on the same line: point
>> back to same element of the array. 3 of 4 do not.
>>
>> # cc -g -O2 partition.c
>> # ./a.out
>> 0: 0x40a84 : next?: 0x40a84
>> 1: 0x40a90 : next?: 0x40a84
>> 2: 0x40a9c : next?: 0x40a84
>> 3: 0x40aa8 : next?: 0x40a84
>> 4: 0x40ab4 : next?: 0x40ab4
>> 5: 0x40ac0 : next?: 0x40ab4
>> 6: 0x40acc : next?: 0x40ab4
>> 7: 0x40ad8 : next?: 0x40ab4
>> 8: 0x40ae4 : next?: 0x40ae4
>> 9: 0x40af0 : next?: 0x40ae4
>> 10: 0x40afc : next?: 0x40ae4
>> 11: 0x40b08 : next?: 0x40ae4
>> 12: 0x40b14 : next?: 0x40b14
>> 13: 0x40b20 : next?: 0x40b14
>> 14: 0x40b2c : next?: 0x40b14
>> 15: 0x40b38 : next?: 0x40b14
>> 16: 0x40b44 : next?: 0x40b44
>> 17: 0x40b50 : next?: 0x40b44
>> 18: 0x40b5c : next?: 0x40b44
>> 19: 0x40b68 : next?: 0x40b44
>> 20: 0x40b74 : next?: 0x40b74
>> 21: 0x40b80 : next?: 0x40b74
>> 22: 0x40b8c : next?: 0x40b74
>> 23: 0x40b98 : next?: 0x40b74
>> 24: 0x40ba4 : next?: 0x40ba4
>> 25: 0x40bb0 : next?: 0x40ba4
>> 26: 0x40bbc : next?: 0x40ba4
>> 27: 0x40bc8 : next?: 0x40ba4
>> 28: 0x40bd4 : next?: 0x40bd4
>> 29: 0x40be0 : next?: 0x40bd4
>> 30: 0x40bec : next?: 0x40bd4
>> 31: 0x40bf8 : next?: 0x40bd4
>>
>> Turns out that the -O2 is important: no other that I
>> tried got the problem, including -O3 not getting the
>> problem. lang/gcc12+ builds happen to use -O2 , at
>> least in my environment.
>>
>> -g is not required for the problem.
>
> This last point about optimization is interesting.
> It is just a guess, but maybe when you enable bootstrap
> in lang/gcc12 you build the first compiler without
> optimization, while if you disable it you do use -O2.

The bootstrap sequence does not build a full, general-purpose
C compiler via clang (or whatever), just something simpler
that is enough to build the next stage. So more than just the
optimization level likely contributes to why bootstrap builds
still work.

> I have taken your example C code and tested it in
> FreeBSD amd64 and in a virtual machine running Linux
> (OpenSuse) amd64: I have got the same failure
> in both cases. I used clang15. So the bug does not
> depend on the OS nor on the architecture.

Thanks for the Linux tests. While I'm not well set up for
building gcc (much less in unusual ways), I do have enough
context/knowledge to run my simple test on aarch64 Fedora.
You saved me the effort. Although, maybe I should check
independently, given the below.

On FreeBSD, for targets other than armv7, I see no failure.

aarch64 FreeBSD system-clang 15 worked fine:

cc -g -O2 partition.c ; ./a.out
0: 0x230d00 : next?: 0x230d00
1: 0x230d10 : next?: 0x230d10
2: 0x230d20 : next?: 0x230d20
3: 0x230d30 : next?: 0x230d30
4: 0x230d40 : next?: 0x230d40
5: 0x230d50 : next?: 0x230d50
6: 0x230d60 : next?: 0x230d60
7: 0x230d70 : next?: 0x230d70
8: 0x230d80 : next?: 0x230d80
9: 0x230d90 : next?: 0x230d90
10: 0x230da0 : next?: 0x230da0
11: 0x230db0 : next?: 0x230db0
12: 0x230dc0 : next?: 0x230dc0
13: 0x230dd0 : next?: 0x230dd0
14: 0x230de0 : next?: 0x230de0
15: 0x230df0 : next?: 0x230df0
16: 0x230e00 : next?: 0x230e00
17: 0x230e10 : next?: 0x230e10
18: 0x230e20 : next?: 0x230e20
19: 0x230e30 : next?: 0x230e30
20: 0x230e40 : next?: 0x230e40
21: 0x230e50 : next?: 0x230e50
22: 0x230e60 : next?: 0x230e60
23: 0x230e70 : next?: 0x230e70
24: 0x230e80 : next?: 0x230e80
25: 0x230e90 : next?: 0x230e90
26: 0x230ea0 : next?: 0x230ea0
27: 0x230eb0 : next?: 0x230eb0
28: 0x230ec0 : next?: 0x230ec0
29: 0x230ed0 : next?: 0x230ed0
30: 0x230ee0 : next?: 0x230ee0
31: 0x230ef0 : next?: 0x230ef0

amd64 FreeBSD system-clang 15 worked fine:

# cc -g -O2 partition.c ; ./a.out
0: 0x203ca0 : next?: 0x203ca0
1: 0x203cb0 : next?: 0x203cb0
2: 0x203cc0 : next?: 0x203cc0
3: 0x203cd0 : next?: 0x203cd0
4: 0x203ce0 : next?: 0x203ce0
5: 0x203cf0 : next?: 0x203cf0
6: 0x203d00 : next?: 0x203d00
7: 0x203d10 : next?: 0x203d10
8: 0x203d20 : next?: 0x203d20
9: 0x203d30 : next?: 0x203d30
10: 0x203d40 : next?: 0x203d40
11: 0x203d50 : next?: 0x203d50
12: 0x203d60 : next?: 0x203d60
13: 0x203d70 : next?: 0x203d70
14: 0x203d80 : next?: 0x203d80
15: 0x203d90 : next?: 0x203d90
16: 0x203da0 : next?: 0x203da0
17: 0x203db0 : next?: 0x203db0
18: 0x203dc0 : next?: 0x203dc0
19: 0x203dd0 : next?: 0x203dd0
20: 0x203de0 : next?: 0x203de0
21: 0x203df0 : next?: 0x203df0
22: 0x203e00 : next?: 0x203e00
23: 0x203e10 : next?: 0x203e10
24: 0x203e20 : next?: 0x203e20
25: 0x203e30 : next?: 0x203e30
26: 0x203e40 : next?: 0x203e40
27: 0x203e50 : next?: 0x203e50
28: 0x203e60 : next?: 0x203e60
29: 0x203e70 : next?: 0x203e70
30: 0x203e80 : next?: 0x203e80
31: 0x203e90 : next?: 0x203e90

(The systems were all built from copies of the same FreeBSD
source code.)

> However, my results have a difference from yours:
> in my case tests fail with any level of optimization.

I get the same sort of aarch64 and amd64 results for the
other optimization levels that I tried: no problems.

> At this point, I would say that the issue is in clang.

Yep, but I have no evidence of problems except when targeting
armv7 with -O2, and I have only tested on FreeBSD. You may have
more general information than I do at this point.

> I think you should file a bug upstream.

I'll leave it up to Brooks whether he wants to do the initial
upstream activity. He should be well recognized and would
likely be the one dealing with the later activity tied to
getting a fix in place for FreeBSD.

===
Mark Millard
marklmi at yahoo.com
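
Since the results above are compared across targets by reading two columns of pointers, a self-checking variant of partition.c might make that comparison mechanical. What follows is only a minimal sketch under stated assumptions: it keeps the same types and initialization loop as the test program, drops the printing, and adds count_bad_next, a hypothetical helper name that is not in the gcc source. It has not been run on an affected armv7 system.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ELEMENTS 32

struct partition_elem
{
  struct partition_elem* next;
  int class_element;
  unsigned class_count;
};

typedef struct partition_def
{
  int num_elements;
  struct partition_elem elements[NUM_ELEMENTS];
} *partition;

struct partition_def partition_storage;

/* Same initialization as in partition.c above, minus the printing. */
partition
partition_new (int num_elements)
{
  int e;

  if (NUM_ELEMENTS < num_elements) num_elements = NUM_ELEMENTS;

  partition part= &partition_storage;
  part->num_elements = num_elements;

  for (e = 0; e < num_elements; ++e)
    {
      part->elements[e].class_element = e;
      part->elements[e].next = &(part->elements[e]);
      part->elements[e].class_count = 1;
    }

  return part;
}

/* Hypothetical helper (an assumption for illustration, not gcc code):
   count the elements whose next pointer does not point back at the
   element itself, i.e. the invariant the printed output checks by eye. */
int
count_bad_next (partition part)
{
  int e, bad = 0;

  assert (part != NULL);

  for (e = 0; e < part->num_elements; ++e)
    if (part->elements[e].next != &(part->elements[e]))
      ++bad;

  return bad;
}
```

Called from main as "return count_bad_next (partition_new (NUM_ELEMENTS));", a correct build would exit with status 0. Going by the armv7 output above (3 of every 4 elements wrong), an affected build would presumably exit with 24, unless the optimizer also disturbs the check itself, so the printing version remains the reference.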