Re: armv7-on-aarch64 stuck at urdlck

From: Konstantin Belousov <kib_at_freebsd.org>
Date: Wed, 24 Jul 2024 10:24:09 UTC
On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote:
> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.michal@gmail.com> wrote:
> > 
> > On 23.07.2024 11:36, Konstantin Belousov wrote:
> >> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
> >>> The good news is that I'm finally able to generate a working/locking
> >>> test case.  The culprit (at least for me) is if "-mcpu" is used when
> >>> compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf).
> >>> If it is not used, libthr is broken (regardless of -O level or debug/normal
> >>> build), but -mcpu=cortex-a15 will always produce a working libthr.
> >> I think this is very significant progress.
> >> Do you plan to drill down more to see what is going on?
> > 
> > So the problem is now clear, and I fear it may apply to other architectures as well.
> > dlopen_object() (from rtld_elf),
> > https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
> > holds the rtld_bind_lock write lock for almost the entire time a new library is loaded.
> > If the code uses a yet unresolved symbol to load the library, the rtl_bind() function attempts to get read lock of  rtld_bind_lock and a deadlock occurs.
> > 
> > In this case, it round_up() in _thr_stack_fix_protection,
> > https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
> > Issued by __aeabi_uidiv (since not all armv7 processors support HW divide).
> > 
> > Unfortunately, I'm not sure how to fix it.  The compiler can emit __aeabi_<> in any place, and I'm not sure if it can resolve all the symbols used by rtld_eld and libthr beforehand.
> > 
> > 
> > Michal
> > 
> 
> In this case (but not for all _aeabi_ functions) we can avoid division
> as long as page size is a power of 2.
> 
> The function is
> 
>   static inline size_t
>   round_up(size_t size)
>   {
>   	if (size % _thr_page_size != 0)
>   		size = ((size / _thr_page_size) + 1) *
>   		    _thr_page_size;
>   	return size;
>   }
> 
> The body can be condensed to
> 
>   return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
> 
> This is shorter in both lines of code and instruction bytes.

Lets not allow this to be lost.  Could anybody confirm that the patch
below fixes the issue?

commit d560f4f6690a48476565278fd07ca131bf4eeb3c
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Wed Jul 24 13:17:55 2024 +0300

    rtld: avoid division in __thr_map_stacks_exec()
    
    The function is called by rtld with the rtld bind lock write-locked,
    when fixing the stack permission during dso load.  Not every ARMv7 CPU
    supports the div, which causes the recursive entry into rtld to resolve
    the  __aeabi_uidiv symbol, causing self-lock.
    
    Workaround the problem by using roundup2() instead of open-coding less
    efficient formula.
    
    Diagnosed by:   mmel
    Based on submission by: John F Carr <jfc@mit.edu>
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week

diff --git a/lib/libthr/thread/thr_stack.c b/lib/libthr/thread/thr_stack.c
index f576a2c04104..d249bb5606fd 100644
--- a/lib/libthr/thread/thr_stack.c
+++ b/lib/libthr/thread/thr_stack.c
@@ -126,10 +126,7 @@ static char *last_stack = NULL;
 static inline size_t
 round_up(size_t size)
 {
-	if (size % _thr_page_size != 0)
-		size = ((size / _thr_page_size) + 1) *
-		    _thr_page_size;
-	return size;
+	return (roundup2(size, _thr_page_size));
 }
 
 void