Re: armv7-on-aarch64 stuck at urdlck
- Reply: Konstantin Belousov : "Re: armv7-on-aarch64 stuck at urdlck"
- In reply to: Konstantin Belousov : "Re: armv7-on-aarch64 stuck at urdlck"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Wed, 24 Jul 2024 10:34:57 UTC
On 24.07.2024 12:24, Konstantin Belousov wrote: > On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote: >> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.michal@gmail.com> wrote: >>> >>> On 23.07.2024 11:36, Konstantin Belousov wrote: >>>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote: >>>>> The good news is that I'm finally able to generate a working/locking >>>>> test case. The culprit (at least for me) is if "-mcpu" is used when >>>>> compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf). >>>>> If it is not used, libthr is broken (regardless of -O level or debug/normal >>>>> build), but -mcpu=cortex-a15 will always produce a working libthr. >>>> I think this is very significant progress. >>>> Do you plan to drill down more to see what is going on? >>> >>> So the problem is now clear, and I fear it may apply to other architectures as well. >>> dlopen_object() (from rtld_elf), >>> https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766, >>> holds the rtld_bind_lock write lock for almost the entire time a new library is loaded. >>> If the code uses a yet unresolved symbol to load the library, the rtl_bind() function attempts to get read lock of rtld_bind_lock and a deadlock occurs. >>> >>> In this case, it round_up() in _thr_stack_fix_protection, >>> https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136. >>> Issued by __aeabi_uidiv (since not all armv7 processors support HW divide). >>> >>> Unfortunately, I'm not sure how to fix it. The compiler can emit __aeabi_<> in any place, and I'm not sure if it can resolve all the symbols used by rtld_eld and libthr beforehand. >>> >>> >>> Michal >>> >> >> In this case (but not for all _aeabi_ functions) we can avoid division >> as long as page size is a power of 2. >> >> The function is >> >> static inline size_t >> round_up(size_t size) >> { >> if (size % _thr_page_size != 0) >> size = ((size / _thr_page_size) + 1) * >> _thr_page_size; >> return size; >> } >> >> The body can be condensed to >> >> return (size + _thr_page_size - 1) & ~(_thr_page_size - 1); >> >> This is shorter in both lines of code and instruction bytes. > > Lets not allow this to be lost. Could anybody confirm that the patch > below fixes the issue? > > commit d560f4f6690a48476565278fd07ca131bf4eeb3c > Author: Konstantin Belousov <kib@FreeBSD.org> > Date: Wed Jul 24 13:17:55 2024 +0300 > > rtld: avoid division in __thr_map_stacks_exec() > > The function is called by rtld with the rtld bind lock write-locked, > when fixing the stack permission during dso load. Not every ARMv7 CPU > supports the div, which causes the recursive entry into rtld to resolve > the __aeabi_uidiv symbol, causing self-lock. > > Workaround the problem by using roundup2() instead of open-coding less > efficient formula. > > Diagnosed by: mmel > Based on submission by: John F Carr <jfc@mit.edu> > Sponsored by: The FreeBSD Foundation > MFC after: 1 week > For final resolving of deadlocks, after a full day of digging, I'm very much incline of adding -znow to the linker flags for libthr.so (and maybe also for ld-elf.so). The runtime cost of resolving all symbols at startup is very low. Direct pre-solving in _thr_rtld_init() is problematic for the _aeabi_* symbols, since they don't have an official C prototypes, and some are not compatible with C calling conventions. Warner, Konstantin, could you please comment on this? Michal