Re: armv7 lang/gcc12 "no bootstrap" build via system clang 15.0.7 based poudriere build ends up stuck in a small loop

From: Lorenzo Salvadore <developer_at_lorenzosalvadore.it>
Date: Tue, 07 Mar 2023 11:12:34 UTC
------- Original Message -------
On Tuesday, March 7th, 2023 at 11:26 AM, Mark Millard <marklmi@yahoo.com> wrote:


> 
> 
> Below is a small example C source showing the clang 15+ armv7
> problem that leads to the unbounded looping in later code in
> the lang/gcc12+ builds: a data structure is mis-initialized,
> breaking its invariant properties used by the later code
> structure.
> 
> # more partition.c
> // Minor variation of part of some gcc source code!
> 
> // For system-clang 15: cc -g -O2 partition.c ; ./a.out
> // For devel/llvm16: clang16 -g -O2 partition.c ; ./a.out
> 
> #include <stdio.h>
> 
> 
> #define NUM_ELEMENTS 32
> 
> struct partition_elem
> {
> struct partition_elem* next;
> int class_element;
> unsigned class_count;
> };
> 
> typedef struct partition_def
> {
> int num_elements;
> struct partition_elem elements[NUM_ELEMENTS];
> } *partition;
> 
> struct partition_def partition_storage;
> 
> partition
> partition_new (int num_elements)
> {
> int e;
> 
> if (NUM_ELEMENTS < num_elements) num_elements = NUM_ELEMENTS;
> 
> partition part= &partition_storage;
> part->num_elements = num_elements;
> 
> for (e = 0; e < num_elements; ++e)
> {
> part->elements[e].class_element = e;
> 
> part->elements[e].next = &(part->elements[e]);
> 
> part->elements[e].class_count = 1;
> 
> }
> 
> for (e = 0; e < num_elements; ++e)
> printf("%d: %p : next?: %p\n",e,(void*)&part->elements[e],(void*)part->elements[e].next);
> 
> 
> return part;
> }
> 
> int main(void)
> {
> partition part;
> part= partition_new(NUM_ELEMENTS);
> 
> return !part;
> }
> 
> In the output below, note the blocks of 4 "next"
> values that do not change. Each should match the
> earlier hexadecimal value on the same line, that is,
> point back to the same element of the array. 3 of
> every 4 do not.
> 
> # cc -g -O2 partition.c
> # ./a.out
> 0: 0x40a84 : next?: 0x40a84
> 1: 0x40a90 : next?: 0x40a84
> 2: 0x40a9c : next?: 0x40a84
> 3: 0x40aa8 : next?: 0x40a84
> 4: 0x40ab4 : next?: 0x40ab4
> 5: 0x40ac0 : next?: 0x40ab4
> 6: 0x40acc : next?: 0x40ab4
> 7: 0x40ad8 : next?: 0x40ab4
> 8: 0x40ae4 : next?: 0x40ae4
> 9: 0x40af0 : next?: 0x40ae4
> 10: 0x40afc : next?: 0x40ae4
> 11: 0x40b08 : next?: 0x40ae4
> 12: 0x40b14 : next?: 0x40b14
> 13: 0x40b20 : next?: 0x40b14
> 14: 0x40b2c : next?: 0x40b14
> 15: 0x40b38 : next?: 0x40b14
> 16: 0x40b44 : next?: 0x40b44
> 17: 0x40b50 : next?: 0x40b44
> 18: 0x40b5c : next?: 0x40b44
> 19: 0x40b68 : next?: 0x40b44
> 20: 0x40b74 : next?: 0x40b74
> 21: 0x40b80 : next?: 0x40b74
> 22: 0x40b8c : next?: 0x40b74
> 23: 0x40b98 : next?: 0x40b74
> 24: 0x40ba4 : next?: 0x40ba4
> 25: 0x40bb0 : next?: 0x40ba4
> 26: 0x40bbc : next?: 0x40ba4
> 27: 0x40bc8 : next?: 0x40ba4
> 28: 0x40bd4 : next?: 0x40bd4
> 29: 0x40be0 : next?: 0x40bd4
> 30: 0x40bec : next?: 0x40bd4
> 31: 0x40bf8 : next?: 0x40bd4
> 
> Turns out that the -O2 is important: no other
> optimization level that I tried showed the problem,
> not even -O3. lang/gcc12+ builds happen to use -O2,
> at least in my environment.
> 
> -g is not required for the problem.

This last point about optimization is interesting.
It is just a guess, but maybe when bootstrap is enabled
in lang/gcc12 the first-stage compiler is built without
optimization, whereas with bootstrap disabled -O2 is used.

I have taken your example C code and tested it on
FreeBSD amd64 and in a virtual machine running Linux
(openSUSE) amd64: I got the same failure in both
cases, using clang 15. So the bug depends neither on
the OS nor on the architecture.

However, my results differ from yours in one respect:
in my case the test fails at every optimization level.

At this point, I would say that the issue is in clang.
I think you should file a bug upstream.

Thanks,

Lorenzo Salvadore