Re: armv7-on-aarch64 stuck at urdlck

From: Warner Losh <imp_at_bsdimp.com>
Date: Tue, 23 Jul 2024 20:54:46 UTC
On Tue, Jul 23, 2024 at 2:11 PM John F Carr <jfc@mit.edu> wrote:

> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.michal@gmail.com> wrote:
> >
> > On 23.07.2024 11:36, Konstantin Belousov wrote:
> >> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
> >>> The good news is that I'm finally able to generate a working/locking
> >>> test case.  The deciding factor (at least for me) is whether "-mcpu"
> >>> is used when compiling libthr (e.g. indirectly injected via CPUTYPE
> >>> in /etc/make.conf).
> >>> If it is not used, libthr is broken (regardless of -O level or
> >>> debug/normal build), but -mcpu=cortex-a15 will always produce a
> >>> working libthr.
> >> I think this is very significant progress.
> >> Do you plan to drill down more to see what is going on?
> >
> > So the problem is now clear, and I fear it may apply to other
> > architectures as well.
> > dlopen_object() (from rtld_elf),
> > https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
> > holds the rtld_bind_lock write lock for almost the entire time a new
> > library is loaded.
> > If the code that loads the library uses a not-yet-resolved symbol,
> > the _rtld_bind() function attempts to take a read lock on
> > rtld_bind_lock and a deadlock occurs.
> >
> > In this case, it is round_up() in _thr_stack_fix_protection,
> > https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136,
> > which ends up calling __aeabi_uidiv (since not all armv7 processors
> > support HW divide).
> >
> > Unfortunately, I'm not sure how to fix it.  The compiler can emit
> > __aeabi_<> calls in any place, and I'm not sure if it can resolve all
> > the symbols used by rtld_elf and libthr beforehand.
> >
> >
> > Michal
> >
>
> In this case (but not for all __aeabi_ functions) we can avoid division
> as long as page size is a power of 2.
>
> The function is
>
>   static inline size_t
>   round_up(size_t size)
>   {
>         if (size % _thr_page_size != 0)
>                 size = ((size / _thr_page_size) + 1) *
>                     _thr_page_size;
>         return size;
>   }
>
> The body can be condensed to
>
>   return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
>
> This is shorter in both lines of code and instruction bytes.
>

I like this change...
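
For anyone who wants to convince themselves the two forms agree, here is
a minimal sketch. It is not the libthr code: page_size is a stand-in for
_thr_page_size, the value 4096 is an assumption, and the helper names
round_up_div, round_up_mask and round_up_forms_agree are made up for
illustration. The mask form is only valid when the page size is a power
of 2.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for libthr's _thr_page_size; 4096 is an assumption. */
static size_t page_size = 4096;

/*
 * Original form: the % and / operators may be compiled into calls to
 * __aeabi_uidivmod / __aeabi_uidiv on armv7 cores without HW divide.
 */
static size_t
round_up_div(size_t size)
{
	if (size % page_size != 0)
		size = ((size / page_size) + 1) * page_size;
	return (size);
}

/* Mask form: no division, but requires a power-of-2 page size. */
static size_t
round_up_mask(size_t size)
{
	assert((page_size & (page_size - 1)) == 0);
	return ((size + page_size - 1) & ~(page_size - 1));
}

/* Returns 1 if both forms agree for every size in [0, limit). */
static int
round_up_forms_agree(size_t limit)
{
	for (size_t s = 0; s < limit; s++)
		if (round_up_div(s) != round_up_mask(s))
			return (0);
	return (1);
}
```

Calling round_up_forms_agree() over a few pages' worth of sizes is enough
to cover both the exact-multiple and the round-up cases.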

But we do need to fix the deadlocks... They seem to be more likely
when building in bsd-user emulation...

Warner