Re: armv7-on-aarch64 stuck at urdlck

From: mmel@freebsd.org <meloun.michal_at_gmail.com>
Date: Wed, 24 Jul 2024 10:34:57 UTC

On 24.07.2024 12:24, Konstantin Belousov wrote:
> On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote:
>> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.michal@gmail.com> wrote:
>>>
>>> On 23.07.2024 11:36, Konstantin Belousov wrote:
>>>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
>>>>> The good news is that I'm finally able to generate a working/locking
>>>>> test case.  The culprit (at least for me) is if "-mcpu" is used when
>>>>> compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf).
>>>>> If it is not used, libthr is broken (regardless of -O level or debug/normal
>>>>> build), but -mcpu=cortex-a15 will always produce a working libthr.
>>>> I think this is very significant progress.
>>>> Do you plan to drill down more to see what is going on?
>>>
>>> So the problem is now clear, and I fear it may apply to other architectures as well.
>>> dlopen_object() (from rtld_elf),
>>> https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
>>> holds the rtld_bind_lock write lock for almost the entire time a new library is loaded.
>>> If the code uses a yet unresolved symbol to load the library, the rtl_bind() function attempts to get read lock of  rtld_bind_lock and a deadlock occurs.
>>>
>>> In this case, it round_up() in _thr_stack_fix_protection,
>>> https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
>>> Issued by __aeabi_uidiv (since not all armv7 processors support HW divide).
>>>
>>> Unfortunately, I'm not sure how to fix it.  The compiler can emit __aeabi_<> in any place, and I'm not sure if it can resolve all the symbols used by rtld_eld and libthr beforehand.
>>>
>>>
>>> Michal
>>>
>>
>> In this case (but not for all _aeabi_ functions) we can avoid division
>> as long as page size is a power of 2.
>>
>> The function is
>>
>>    static inline size_t
>>    round_up(size_t size)
>>    {
>>    	if (size % _thr_page_size != 0)
>>    		size = ((size / _thr_page_size) + 1) *
>>    		    _thr_page_size;
>>    	return size;
>>    }
>>
>> The body can be condensed to
>>
>>    return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
>>
>> This is shorter in both lines of code and instruction bytes.
> 
> Lets not allow this to be lost.  Could anybody confirm that the patch
> below fixes the issue?
> 
> commit d560f4f6690a48476565278fd07ca131bf4eeb3c
> Author: Konstantin Belousov <kib@FreeBSD.org>
> Date:   Wed Jul 24 13:17:55 2024 +0300
> 
>      rtld: avoid division in __thr_map_stacks_exec()
>      
>      The function is called by rtld with the rtld bind lock write-locked,
>      when fixing the stack permission during dso load.  Not every ARMv7 CPU
>      supports the div, which causes the recursive entry into rtld to resolve
>      the  __aeabi_uidiv symbol, causing self-lock.
>      
>      Workaround the problem by using roundup2() instead of open-coding less
>      efficient formula.
>      
>      Diagnosed by:   mmel
>      Based on submission by: John F Carr <jfc@mit.edu>
>      Sponsored by:   The FreeBSD Foundation
>      MFC after:      1 week
> 




For final resolving of deadlocks, after a full day of digging, I'm very 
much incline  of adding -znow to the linker flags for libthr.so (and 
maybe also for ld-elf.so). The runtime cost of resolving all symbols at 
startup is very low. Direct pre-solving in _thr_rtld_init() is 
problematic for the _aeabi_* symbols, since they don't have an official 
C prototypes, and some are not compatible with C calling conventions.

Warner, Konstantin, could you please comment on this?


Michal