Re: armv7-on-aarch64 stuck at urdlck

From: <meloun.michal_at_gmail.com>
Date: Mon, 22 Jul 2024 16:41:40 UTC

On 22.07.2024 18:26, Mark Millard wrote:
> On Jul 22, 2024, at 06:40, Michal Meloun <meloun.michal@gmail.com> wrote:
> 
>> On 22.07.2024 13:46, Mark Millard wrote:
>>> On Jul 21, 2024, at 22:59, Michal Meloun <meloun.michal@gmail.com> wrote:
>>>> I don't want to hijack the original thread, so I'm replying in a new one.
>>>>
>>>> My Tegra tracks current and has been running 24/7, building kernel/world and kde5 in a loop, for a few years now. I have never encountered the aforementioned lockup on native armv7.
>>>>
>>>> I have seen a usermode mutex lockup in an arm32 jail on aarch64, but only very rarely (once a month or so), and all my attempts to reproduce it in a more deterministic way have failed. Also, I don't think I've ever seen this with the debug version of libc.
>>>>
>>>> Unfortunately, I also failed to reproduce the given lockup using dlopen_test.c, either on native armv7 or in an arm32 jail.
>>>>
>>>> Michal Meloun
>>> What is the output of:
>>> # readelf -a /libexec/ld-elf.so.1 | grep -E "(^[^ 0-9]|.*_rtld_get_stack_prot)"
>>> in your armv7 context(s)? Does it include the likes of:
>>> QUOTE
>>> Symbol table '.symtab' contains 911 entries:
>>>   903: 000000000001b9ac    16 FUNC    GLOBAL DEFAULT   11 _rtld_get_stack_prot
>>> END QUOTE
>>> vs. not?
>>> Note that the "debug version of libc" being involved likely means that
>>> DEBUG_FLAGS was defined. That in turn likely means that strip is not
>>> being used. In such a case, I expect that the .symtab entry for
>>> _rtld_get_stack_prot (and more) exists for such a context.
>> At this time, I have the standard (thus stripped, non-debug) version of the runtime linker library installed. Thus it has only a dynamic relocation record for _rtld_get_stack_prot:
>>
>> root@tegra124:~/dlopen_test # readelf -a /libexec/ld-elf.so.1 | grep -E "(^[^ 0-9]|.*_rtld_get_stack_prot)"
>> ELF Header:
>> Elf file type is DYN (Shared object file)
>> Entry point 0x1449c
>> There are 10 program headers, starting at offset 52
>> Program Headers:
>> There are 23 section headers, starting at offset 0x1a448:
>> Section Headers:
>> Key to Flags:
>> Dynamic section at offset 0x19fa4 contains 15 entries:
>> Relocation section (.rel.dyn):
>> r_offset r_info   r_type              st_value st_name
>> Symbol table '.dynsym' contains 27 entries:
>>      5: 000000000001ba0c    16 FUNC    GLOBAL DEFAULT   12 _rtld_get_stack_prot@@FBSDprivate_1.0 (11)
>> Notes at offset 0x00000174 with length 0x00000018:
>> Histogram for bucket list length (total of 6 buckets):
>> Histogram for bucket list length (total of 27 buckets):
>> Version symbol section (.gnu.version):
>> Version definition section (.gnu.version_d):
>> Attribute Section: aeabi
>>
>> ------
>>
>> root@tegra124:~/dlopen_test # ./dlopen_test
>> root@tegra124:~/dlopen_test #
> 
> Just to be sure . . .
> 
> Did you at some point "pkg install cairo" (or analogous) so that
> the following (or some vintage) were in place?
> 
> # ls -lodT /usr/local/lib/libcairo.so*
> lrwxr-xr-x  1 root wheel -      21 Apr 29 19:45:15 2024 /usr/local/lib/libcairo.so -> libcairo.so.2.11704.0
> lrwxr-xr-x  1 root wheel -      21 Apr 29 19:45:15 2024 /usr/local/lib/libcairo.so.2 -> libcairo.so.2.11704.0
> -rwxr-xr-x  1 root wheel - 1118272 Apr 29 19:45:15 2024 /usr/local/lib/libcairo.so.2.11704.0
> 
> # file /usr/local/lib/libcairo.so.2.11704.0
> /usr/local/lib/libcairo.so.2.11704.0: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (FreeBSD), dynamically linked, for FreeBSD 15.0 (1500018), stripped
> 
> (Installing cairo would also install other things it needs.)
> 
> For the failing contexts, the a.out from dlopen_test.c will only
> hang if the library (and what it requires) is actually there to
> load.
> 
Yep, I have cairo installed (but compiled from source, not installed by
pkg), and I have verified that dlopen() returns success.
In the meantime I tried all combinations (debug/stripped) of ld_elf and
libthr. All combinations work without problems on the native system and
in an arm32 jail.
Btw, gdb has long had problems with stepping inside ld_elf. It's better
to run the test program without it and then attach to the running
program to get the "correct" stack trace.

Michal Meloun