From: Warner Losh
Date: Tue, 23 Jul 2024 14:54:46 -0600
Subject: Re: armv7-on-aarch64 stuck at urdlck
To: John F Carr
Cc: "mmel@freebsd.org", Konstantin Belousov, Mark Millard, FreeBSD Current, "freebsd-arm@freebsd.org"
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm

On Tue, Jul 23, 2024 at 2:11 PM John F Carr wrote:

> On Jul 23, 2024, at 13:46, Michal Meloun wrote:
> >
> > On 23.07.2024 11:36, Konstantin Belousov wrote:
> >> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
> >>> The good news is that I'm finally able to generate a working/locking
> >>> test case.  The culprit (at least for me) is if "-mcpu" is used when
> >>> compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf).
> >>> If it is not used, libthr is broken (regardless of -O level or debug/normal
> >>> build), but -mcpu=cortex-a15 will always produce a working libthr.
> >> I think this is very significant progress.
> >> Do you plan to drill down more to see what is going on?
> >
> > So the problem is now clear, and I fear it may apply to other architectures as well.
> > dlopen_object() (from rtld-elf),
> > https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
> > holds the rtld_bind_lock write lock for almost the entire time a new library is loaded.
> > If the code that loads the library uses a not yet resolved symbol, the _rtld_bind() function attempts to get a read lock on rtld_bind_lock and a deadlock occurs.
> >
> > In this case, it is round_up() in _thr_stack_fix_protection,
> > https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
> > The binding is triggered by __aeabi_uidiv (since not all armv7 processors support HW divide).
> >
> > Unfortunately, I'm not sure how to fix it.  The compiler can emit __aeabi_<> in any place, and I'm not sure if it can resolve all the symbols used by rtld-elf and libthr beforehand.
> >
> > Michal
> >
>
> In this case (but not for all __aeabi_ functions) we can avoid division
> as long as page size is a power of 2.
>
> The function is
>
>   static inline size_t
>   round_up(size_t size)
>   {
>         if (size % _thr_page_size != 0)
>                 size = ((size / _thr_page_size) + 1) *
>                     _thr_page_size;
>         return size;
>   }
>
> The body can be condensed to
>
>   return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
>
> This is shorter in both lines of code and instruction bytes.
>

I like this change...

But we do need to fix the deadlocks... They seem to be more likely
when building in bsd-user emulation...

Warner


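
For anyone who wants to convince themselves that the two round_up() forms
quoted in this thread agree, here is a minimal standalone sketch. It is not
the libthr code: round_up_div(), round_up_mask(), is_power_of_2() and
count_mismatches() are hypothetical names, and the page size is passed as a
parameter instead of being read from _thr_page_size. The mask form is only
expected to match when the page size is a power of 2, which the sketch
asserts.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Division-based form, as quoted in the thread.  On armv7 targets built
 * without hardware divide, the % and / here may become calls to
 * __aeabi_uidiv.
 */
static size_t
round_up_div(size_t size, size_t page_size)
{
	if (size % page_size != 0)
		size = ((size / page_size) + 1) * page_size;
	return (size);
}

/* Mask-based form.  Only valid when page_size is a power of 2. */
static size_t
round_up_mask(size_t size, size_t page_size)
{
	return ((size + page_size - 1) & ~(page_size - 1));
}

/* Returns 1 if page_size is a nonzero power of 2, 0 otherwise. */
static int
is_power_of_2(size_t page_size)
{
	return (page_size != 0 && (page_size & (page_size - 1)) == 0);
}

/*
 * Compare both forms for every size in [0, limit] and return the number
 * of sizes where they disagree.
 */
static size_t
count_mismatches(size_t page_size, size_t limit)
{
	size_t size, bad = 0;

	assert(is_power_of_2(page_size));
	for (size = 0; size <= limit; size++)
		if (round_up_div(size, page_size) !=
		    round_up_mask(size, page_size))
			bad++;
	return (bad);
}
```

Calling count_mismatches(4096, 20000) from a small main() should report no
disagreements. Note that this only removes one division from one function;
it does not address the underlying rtld_bind_lock deadlock, since the
compiler may still emit other __aeabi_ helper calls elsewhere.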