Re: armv7-on-aarch64 stuck at urdlck: I got a replication of the "ampere2" bulk build hangup problem on a Windows DevKit 2023

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sat, 20 Jul 2024 13:12:56 UTC
On Jul 20, 2024, at 01:57, Konstantin Belousov <kostikbel@gmail.com> wrote:

> [Everything and everybody in Cc: are stripped for good].
> 
> On Fri, Jul 19, 2024 at 10:38:36PM -0700, Mark Millard wrote:
>> 0x201375c0 - 0x2014092c is .bss in /lib/libthr.so.3
>> 
>> (gdb) bt
>> #0  0x201aeec0 in __pthread_map_stacks_exec () from /lib/libc.so.7
>> #1  0x2005d1e4 in ?? () from /libexec/ld-elf.so.1
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> (gdb) disass
>> Dump of assembler code for function __pthread_map_stacks_exec:
>> => 0x201aeec0 <+0>: ldr r0, [pc, #8] @ 0x201aeed0 <__pthread_map_stacks_exec+16>
>>   0x201aeec4 <+4>: add r0, pc, r0
>>   0x201aeec8 <+8>: ldr r0, [r0, #156] @ 0x9c
>>   0x201aeecc <+12>: bx r0
>>   0x201aeed0 <+16>: andseq r6, r7, r4, lsr #12
>> End of assembler dump.
>> 
> 
> Do the following:
> 1. Rebuild rtld/libc/libthr with the debugging info and no optimization,
>   i.e. ensure that flags are "-O0 -g" or "-Og -g" and not -O2.  See
>   the first comment in libexec/rtld-elf/Makefile for the hint how to
>   do it.
> 2. Reproduce the issue under gdb, and backtrace all threads from userspace.
>   I only need userspace backtrace, not either kernel-side stacks nor
>   the syscall history.

The above will not happen for a while. It will be based on my
personal world/kernel build context that is not a clean context.

> Are you sure that the issue is specific to armv7, might be it takes more
> efforts to reproduce on host native?

I do not claim to know what to vary in aarch64-as-aarch64 testing to
make it a good basis for concluding that failure is unlikely there. I
only know that, for the armv7 contexts identified as failing, the
matching aarch64-as-aarch64 context has not failed in any testing so far.

For a native armv7 example context, using /usr/local/lib/libcairo.so.2
from after installing cairo and testing on an Orange Pi+ 2ed Cortex-A7
system:

cc -g -std=c11 -pedantic -Wall -pthread dlopen_test.c ; ./a.out

fails as well (a.out hangs in urdlck STATE). The context was:

# uname -apKU
FreeBSD OPiP2E-RPi2v1p1 15.0-CURRENT FreeBSD 15.0-CURRENT main-n270963-609cdb12b962 GENERIC arm armv7 1500019 1500019

from a PkgBase based installation.

Note: Of the 3 .so libraries referenced in dlopen_test.c, the
/usr/local/lib/libcairo.so.2 one indirectly loads the smallest
number of other libraries. So I tend to prefer to test just it
when that case fails.


The original problem has never been observed on ampere2 for main-arm64-default.
(So: aarch64 as aarch64.) Nor in my testing on various aarch64 systems used as
aarch64 (Cortex-A72, Cortex-A76, Cortex-A78C and Cortex-X1C mix). Nor has
aarch64 dlopen_test.c ever failed in such testing contexts.

The original problem always reproduced on ampere2 for main-armv7-default.
(So: aarch64 as armv7.) That has been true from late Feb onward. It also
always reproduces in my chroot-to-armv7 testing on various aarch64 systems
(Cortex-A72, Cortex-A76, Cortex-A78C and Cortex-X1C mix). That includes
the armv7 dlopen_test.c testing with one of the 3 .so's listed in the
source code.

The original problem is seen via "dot -c" during graphviz installation.
dlopen_test.c gets the same type of failure so far for all armv7 execution
contexts that I've tried.

For reference:

# more dlopen_test.c 
// FAILS:
// cc -g -std=c11 -pedantic -Wall -pthread dlopen_test.c ; ./a.out

// Works:
// cc -g -std=c11 -pedantic -Wall          dlopen_test.c ; ./a.out

#include <dlfcn.h>

int main(void)
{
    // ANY OF THE FOLLOWING FAIL with -pthread specified:
    // dlopen("/usr/local/lib/graphviz/libgvplugin_gd.so.6.0.0",RTLD_LAZY);
    // dlopen("/usr/local/lib/libpangocairo-1.0.so.0",RTLD_LAZY);
    dlopen("/usr/local/lib/libcairo.so.2",RTLD_LAZY);
}

Note: Successful "dot -c" activity during graphviz install activity
includes loading those 3 libraries, possibly indirectly for some of
the 3. The failing armv7 examples hang during a dlopen of:

/usr/local/lib/graphviz/libgvplugin_gd.so.6.0.0


===
Mark Millard
marklmi at yahoo.com