Re: armv7-on-aarch64 stuck at urdlck

From: Warner Losh <imp_at_bsdimp.com>
Date: Wed, 24 Jul 2024 20:20:23 UTC
On Wed, Jul 24, 2024 at 11:34 AM mmel@freebsd.org <meloun.michal@gmail.com>
wrote:

>
>
> On 24.07.2024 17:47, Konstantin Belousov wrote:
> > On Wed, Jul 24, 2024 at 01:07:39PM +0000, John F Carr wrote:
> >>
> >>
> >>> On Jul 24, 2024, at 06:50, Konstantin Belousov <kib@freebsd.org>
> wrote:
> >>>
> >>> On Wed, Jul 24, 2024 at 12:34:57PM +0200, mmel@freebsd.org wrote:
> >>>>
> >>>>
> >>>> On 24.07.2024 12:24, Konstantin Belousov wrote:
> >>>>> On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote:
> >>>>>> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.michal@gmail.com>
> wrote:
> >>>>>>>
> >>>>>>> On 23.07.2024 11:36, Konstantin Belousov wrote:
> >>>>>>>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
> >>>>>>>>> The good news is that I'm finally able to generate a
> working/locking
> >>>>>>>>> test case.  The culprit (at least for me) is if "-mcpu" is used
> when
> >>>>>>>>> compiling libthr (e.g. indirectly injected via CPUTYPE in
> /etc/make.conf).
> >>>>>>>>> If it is not used, libthr is broken (regardless of -O level or
> debug/normal
> >>>>>>>>> build), but -mcpu=cortex-a15 will always produce a working
> libthr.
> >>>>>>>> I think this is very significant progress.
> >>>>>>>> Do you plan to drill down more to see what is going on?
> >>>>>>>
> >>>>>>> So the problem is now clear, and I fear it may apply to other
> architectures as well.
> >>>>>>> dlopen_object() (from rtld_elf),
> >>>>>>> https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
> >>>>>>> holds the rtld_bind_lock write lock for almost the entire time a
> new library is loaded.
> >>>>>>> If the code uses a yet unresolved symbol to load the library, the
> rtl_bind() function attempts to get read lock of  rtld_bind_lock and a
> deadlock occurs.
> >>>>>>>
> >>>>>>> In this case, it round_up() in _thr_stack_fix_protection,
> >>>>>>>
> https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
> >>>>>>> Issued by __aeabi_uidiv (since not all armv7 processors support HW
> divide).
> >>>>>>>
> >>>>>>> Unfortunately, I'm not sure how to fix it.  The compiler can emit
> __aeabi_<> in any place, and I'm not sure if it can resolve all the symbols
> used by rtld_eld and libthr beforehand.
> >>>>>>>
> >>>>>>>
> >>>>>>> Michal
> >>>>>>>
> >>>>>>
> >>>>>> In this case (but not for all _aeabi_ functions) we can avoid
> division
> >>>>>> as long as page size is a power of 2.
> >>>>>>
> >>>>>> The function is
> >>>>>>
> >>>>>>    static inline size_t
> >>>>>>    round_up(size_t size)
> >>>>>>    {
> >>>>>>     if (size % _thr_page_size != 0)
> >>>>>>     size = ((size / _thr_page_size) + 1) *
> >>>>>>         _thr_page_size;
> >>>>>>     return size;
> >>>>>>    }
> >>>>>>
> >>>>>> The body can be condensed to
> >>>>>>
> >>>>>>    return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
> >>>>>>
> >>>>>> This is shorter in both lines of code and instruction bytes.
> >>>>>
> >>>>> Lets not allow this to be lost.  Could anybody confirm that the patch
> >>>>> below fixes the issue?
> >>>>>
> >>>>> commit d560f4f6690a48476565278fd07ca131bf4eeb3c
> >>>>> Author: Konstantin Belousov <kib@FreeBSD.org>
> >>>>> Date:   Wed Jul 24 13:17:55 2024 +0300
> >>>>>
> >>>>>      rtld: avoid division in __thr_map_stacks_exec()
> >>>>>      The function is called by rtld with the rtld bind lock
> write-locked,
> >>>>>      when fixing the stack permission during dso load.  Not every
> ARMv7 CPU
> >>>>>      supports the div, which causes the recursive entry into rtld to
> resolve
> >>>>>      the  __aeabi_uidiv symbol, causing self-lock.
> >>>>>      Workaround the problem by using roundup2() instead of
> open-coding less
> >>>>>      efficient formula.
> >>>>>      Diagnosed by:   mmel
> >>>>>      Based on submission by: John F Carr <jfc@mit.edu>
> >>>>>      Sponsored by:   The FreeBSD Foundation
> >>>>>      MFC after:      1 week
> >>>>>
> >>> Just realized that it is wrong.  Stack size is user-controlled and it
> does
> >>> not need to be power of two.
> >>
> >> Your change is correct.  _thr_page_size is set to getpagesize(),
> >> which is a power of 2.   The call to roundup2 takes a user-provided
> >> size and rounds it up to a multiple of the system page size.
> >>
> >> I tested the change and it works.  My change also works and
> >> should compile to identical code.  I forgot there was a standard
> >> function to do the rounding.
> > Right, my bad, thank you for correcting my thinko.
> >
> >>
> >>> For final resolving of deadlocks, after a full day of digging, I'm
> very much
> >>>> incline  of adding -znow to the linker flags for libthr.so (and maybe
> also
> >>>> for ld-elf.so). The runtime cost of resolving all symbols at startup
> is very
> >>>> low. Direct pre-solving in _thr_rtld_init() is problematic for the
> _aeabi_*
> >>>> symbols, since they don't have an official C prototypes, and some are
> not
> >>>> compatible with C calling conventions.
> >>> I do not like it. `-z now' changes (breaks) the ABI and makes some
> symbols
> >>> not preemtible.
> >>>
> >>> In the worst case, we would need a call to the asm routine which
> causes the
> >>> resolution of the _eabi_* symbols on arm.
> >>>
> >>
> >> It would also be possible to link libthr with libgcc.a and use a linker
> map
> >> to hide the _eabi_ symbols.
> > In principle yes, but if the ARM ABI states that _eabi symbols must be
> used,
> > and exported from libc, then this is also some form of ABI breakage.
>
> I hope that https://reviews.freebsd.org/D46104 is acceptable :)
>

Can't speak for kib, but it looks good to my eye (though I agree with his
naming
quibble). And helps avoid -znow, though I could have gone either way on
that.
It's also simple enough not to be a burden.

Warner