From: Mark Millard
To: Michal Meloun
Cc: mmel@freebsd.org, FreeBSD Current, "freebsd-arm@freebsd.org", "kib@freebsd.org >> Konstantin Belousov"
Subject: Re: armv7-on-aarch64 stuck at urdlck
Date: Mon, 22 Jul 2024 20:49:08 -0700
Message-Id: <0DD19771-3AAB-469E-981B-1203F1C28233@yahoo.com>
In-Reply-To: <33251aa3-681f-4d17-afe9-953490afeaf0@gmail.com>
References: <724db42b-5550-4381-8277-2971e6b3e8f1@freebsd.org> <86185657-e521-466b-89e2-f291aaac10a6@freebsd.org> <0EF18174-8735-46A4-BD71-FFA3472B319F@yahoo.com> <33251aa3-681f-4d17-afe9-953490afeaf0@gmail.com>
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm

On Jul 22, 2024, at 12:36, Michal Meloun wrote:

> On 22. 7. 2024 19:27, Mark Millard wrote:
>> On Jul 22, 2024, at 09:41, meloun.michal@gmail.com wrote:
>>
>>> On 22.07.2024 18:26, Mark Millard wrote:
>>>
>>>> On Jul 22, 2024, at 06:40, Michal Meloun wrote:
>>>>
>>>>> On 22.07.2024 13:46, Mark Millard wrote:
>>>>>
>>>>>> On Jul 21, 2024, at 22:59, Michal Meloun wrote:
>>>>>>
>>>>>>> I don't want to hijack the original thread, so I'm replying in a new one.
>>>>>>>
>>>>>>> My Tegra tracks current and has been running 24/7, building kernel/world and kde5 in a loop, for a few years now. But I have never encountered the aforementioned lockup in native armv7.
>>>>>>>
>>>>>>> I have seen a usermode mutex lockup in an arm32 jail on aarch64, but only very rarely (once a month or so), and all my attempts to reproduce it in a more deterministic way have failed. Also, I don't think I've ever seen this with the debug version of libc.
>>>>>>>
>>>>>>> Unfortunately I also failed to reproduce the given lockup using dlopen_test.c, on either native armv7 or an arm32 jail.
>>>>>>>
>>>>>>> Michal Meloun
>>>>>>>
>>>>>> What is the output of:
>>>>>> # readelf -a /libexec/ld-elf.so.1 | grep -E "(^[^ 0-9]|.*_rtld_get_stack_prot)"
>>>>>> in your armv7 context(s)? Does it include the likes of:
>>>>>> QUOTE
>>>>>> Symbol table '.symtab' contains 911 entries:
>>>>>> 903: 000000000001b9ac 16 FUNC GLOBAL DEFAULT 11 _rtld_get_stack_prot
>>>>>> END QUOTE
>>>>>> vs. not?
>>>>>> Note that the "debug version of libc" being involved likely means that
>>>>>> DEBUG_FLAGS was defined. That in turn likely means that strip is not
>>>>>> being used. In such a case, I expect that the .symtab entry for
>>>>>> _rtld_get_stack_prot (and more) exists for such a context.
>>>>>>
>>>>> At this time, I have the standard (thus stripped, non-debug) version of the runtime linker library installed. Thus it has only a dynamic symbol table (.dynsym) entry for _rtld_get_stack_prot:
>>>>>
>>>>> root@tegra124:~/dlopen_test # readelf -a /libexec/ld-elf.so.1 | grep -E "(^[^ 0-9]|.*_rtld_get_stack_prot)"
>>>>> ELF Header:
>>>>> Elf file type is DYN (Shared object file)
>>>>> Entry point 0x1449c
>>>>> There are 10 program headers, starting at offset 52
>>>>> Program Headers:
>>>>> There are 23 section headers, starting at offset 0x1a448:
>>>>> Section Headers:
>>>>> Key to Flags:
>>>>> Dynamic section at offset 0x19fa4 contains 15 entries:
>>>>> Relocation section (.rel.dyn):
>>>>> r_offset r_info r_type st_value st_name
>>>>> Symbol table '.dynsym' contains 27 entries:
>>>>> 5: 000000000001ba0c 16 FUNC GLOBAL DEFAULT 12 _rtld_get_stack_prot@@FBSDprivate_1.0 (11)
>>>>> Notes at offset 0x00000174 with length 0x00000018:
>>>>> Histogram for bucket list length (total of 6 buckets):
>>>>> Histogram for bucket list length (total of 27 buckets):
>>>>> Version symbol section (.gnu.version):
>>>>> Version definition section (.gnu.version_d):
>>>>> Attribute Section: aeabi
>>>>>
>>>>> ------
>>>>>
>>>>> root@tegra124:~/dlopen_test # ./dlopen_test
>>>>> root@tegra124:~/dlopen_test #
>>>>>
>>>> Just to be sure . . .
>>>> Did you at some point "pkg install cairo" (or analogous) so that
>>>> the following (or some vintage) were in place?
>>>> # ls -lodT /usr/local/lib/libcairo.so*
>>>> lrwxr-xr-x 1 root wheel - 21 Apr 29 19:45:15 2024 /usr/local/lib/libcairo.so -> libcairo.so.2.11704.0
>>>> lrwxr-xr-x 1 root wheel - 21 Apr 29 19:45:15 2024 /usr/local/lib/libcairo.so.2 -> libcairo.so.2.11704.0
>>>> -rwxr-xr-x 1 root wheel - 1118272 Apr 29 19:45:15 2024 /usr/local/lib/libcairo.so.2.11704.0
>>>> # file /usr/local/lib/libcairo.so.2.11704.0
>>>> /usr/local/lib/libcairo.so.2.11704.0: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (FreeBSD), dynamically linked, for FreeBSD 15.0 (1500018), stripped
>>>> (Installing cairo would also install other things it needs.)
>>>> For the failing contexts, the a.out from dlopen_test.c will only
>>>> hang if the library (and what it requires) is actually there to
>>>> load.
>>>>
>>> Yep, I have cairo installed (but compiled from sources, not installed by pkg), and I have verified that dlopen() returns success.
>>> In the meantime I tried all combinations (debug/stripped) of ld_elf and libthr. All combinations work without problems on the native system and in the arm32 jail.
>>>
>> Thanks for the information. My personal builds, which are the
>> ones that work in my testing, are built on aarch64 as armv7
>> instead of on amd64. The known failing ones are built on amd64.
>> But I've no more specific information suggesting a tie to the
>> type of build host for the world used.
>>
>>> Btw, gdb has long had problems with stepping inside ld_elf. It's better to run the test program without it and then attach to the test program to get the "correct" stack trace.
>>>
>> In part I was deliberately exploring what sequence leads to the
>> hangups vs. lack of hangups and the like: more context than a
>> backtrace of the stuck state can provide.
>>
>> But doing "./a.out &" and then "gdb -p..." to attach to it:
>>
>> _umtx_op () at _umtx_op.S:4
>>
>> warning: 4 _umtx_op.S: No such file or directory
>> (gdb) bt
>> #0 _umtx_op () at _umtx_op.S:4
>> #1 0x2036845c in _umtx_op_err (obj=0x4, op=12, val=0, uaddr=0x0, uaddr2=0x0) at /home/pkgbuild/worktrees/main/lib/libsys/_umtx_op_err.c:36
>> #2 0x20115da8 in __thr_rwlock_rdlock (rwlock=0x4, rwlock@entry=0x20137c40, flags=3, tsp=) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_umtx.c:294
>> #3 0x2010ebf4 in _thr_rwlock_rdlock (rwlock=0x20137c40, flags=0, tsp=0x0) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_umtx.h:229
>> #4 _thr_rtld_rlock_acquire (lock=0x20137c40) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_rtld.c:121
>> #5 0x20060788 in rlock_acquire (lock=0x2008af10 , lockstate=lockstate@entry=0xffffd114) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld_lock.c:259
>> #6 0x20059098 in _rtld_bind (obj=0x2008f404, reloff=496) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:1035
>> #7 0x2005483c in _rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
>> #8 0x2005483c in _rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
>> #9 0x2005483c in _rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
>> . . .
>>
>> It does not seem significantly different than I'd reported
>> for the hungup state.
>>
>> An issue here is that the pkgbase world possibly is -O2 based
>> despite having debug information (but is stripped). This can
>> make details less reliable. So, for example, the rwlock=0x4
>> vs. rwlock@entry=0x20137c40 for __thr_rwlock_rdlock could well
>> be suspect.
>>
>
> IMHO, -O2 shouldn't be able to modify function arguments for public functions, so this memory corruption fits perfectly with the observed behavior.

It is not a memory corruption.
r0 is "argument 1/scratch register/result" and the code in question
in my example is (__thr_rwlock_rdlock via disass /s use):

280     {
   0x20115d50 <+0>:     push    {r11, lr}
   0x20115d54 <+4>:     mov     r11, sp
   0x20115d58 <+8>:     sub     sp, sp, #32
   0x20115d5c <+12>:    mov     r12, r1
. . .
291                     tm_p = &timeout;
292                     tm_size = sizeof(timeout);
293             }
294             return (_umtx_op_err(rwlock, UMTX_OP_RW_RDLOCK, flags,
   0x20115d98 <+72>:    str     r1, [sp]
   0x20115d9c <+76>:    mov     r1, #12
   0x20115da0 <+80>:    mov     r2, r12
   0x20115da4 <+84>:    bl      0x201167a0
=> 0x20115da8 <+88>:    mov     sp, r11
   0x20115dac <+92>:    pop     {r11, pc}

After the "bl 0x201167a0" the value of r0 is the return value from
0x201167a0, not the first argument value for 0x20115d50.

Better reporting would indicate that rwlock's value was unavailable
at that point: locally the value has not been preserved because
there is no more use of the value. But such is the kind of thing I
expect to run into for the likes of -O2 use with debug information.

Anyway, _umtx_op_err returned the 0x4 value that is shown for
rwlock.

> But, out of curiosity, a quick look at _thr_rwlock_tryrdlock() in thr_umtx.h:208 makes me wonder: How is the "state" variable inside the loop guaranteed to be updated? IMHO nothing inside the loop emits a global memory modification attribute, so the compiler is free to move the assignment to a "state" variable outside the loop.
> Kib, please, do you have any comment on this?
> Michal Meloun

===
Mark Millard
marklmi at yahoo.com
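
For reference on the "state" question quoted above, here is a minimal sketch of the read/compare-and-swap/retry pattern being discussed. It is not the FreeBSD source: it assumes C11 <stdatomic.h> rather than FreeBSD's own atomic primitives, and the names demo_rwlock, demo_tryrdlock and the DEMO_* bit layout are hypothetical. The point it illustrates is only that when the reload of "state" goes through an atomic operation, the compiler may not hoist it out of the loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit layout, NOT the FreeBSD struct urwlock layout. */
#define DEMO_WRITE_OWNER	0x80000000u
#define DEMO_MAX_READERS	0x1fffffffu
#define DEMO_READER_COUNT(s)	((s) & DEMO_MAX_READERS)

struct demo_rwlock {
	_Atomic uint32_t state;
};

/*
 * Try to take a read lock without blocking.
 * Returns 0 on success, EBUSY if a writer owns the lock,
 * EAGAIN if the reader count is already at its maximum.
 */
static int
demo_tryrdlock(struct demo_rwlock *l)
{
	uint32_t state;

	state = atomic_load_explicit(&l->state, memory_order_relaxed);
	while ((state & DEMO_WRITE_OWNER) == 0) {
		if (DEMO_READER_COUNT(state) == DEMO_MAX_READERS)
			return (EAGAIN);
		/*
		 * On failure the compare-exchange stores the value it
		 * observed into 'state', so every iteration works with a
		 * fresh value; the reload is part of the atomic operation
		 * and cannot be moved out of the loop.
		 */
		if (atomic_compare_exchange_weak_explicit(&l->state, &state,
		    state + 1, memory_order_acquire, memory_order_relaxed))
			return (0);
	}
	return (EBUSY);
}
```

A plain (non-atomic, non-volatile) reload with no compiler barrier anywhere in the loop would be a different matter, which is the concern raised in the quoted question. Whether the code at thr_umtx.h:208 has such a barrier depends on how the atomic primitive it calls is defined, which this sketch does not settle.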