Subject: Re: closefrom blocking, wchan urdlck
From: Jan Mikkelsen
To: Konstantin Belousov
Cc: freebsd-hackers@freebsd.org
Date: Mon, 27 Dec 2021 16:13:50 +0100
Message-Id: <9CB0803A-E15B-47F9-97A9-03597D41C01E@transactionware.com>
References: <2B3BA665-D42A-4B5F-AD2F-ED10E64A7276@transactionware.com>
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

> On 27 Dec 2021, at 16:03, Konstantin Belousov wrote:
>
> On Mon, Dec 27, 2021 at 03:54:57PM +0100, Jan Mikkelsen wrote:
>>
>>> On 27 Dec 2021, at 14:52, Konstantin Belousov wrote:
>>>
>>> On Mon, Dec 27, 2021 at 01:39:11PM +0100, Jan Mikkelsen wrote:
>>>> Hi,
>>>>
>>>> (On 11.2)
>>>>
>>>> I am occasionally seeing closefrom() block in a child process created by a call to pdfork().
>>>>
>>>> When this does happen, it is very early after the process has started, while other threads are being created elsewhere in the process. I cannot reproduce it after the thread creation is complete. According to the sigaction man page, this should be async signal safe.
>>>>
>>>> Stack trace from the call to closefrom():
>>>>
>>>> * frame #0: 0x000000080090276c libthr.so.3`_umtx_op_err at _umtx_op_err.S:37
>>>>   frame #1: 0x00000008008f6121 libthr.so.3`__thr_rwlock_rdlock(rwlock=, flags=, tsp=) at thr_umtx.c:307:10
>>>>   frame #2: 0x00000008008ff1ac libthr.so.3`_thr_rtld_rlock_acquire [inlined] _thr_rwlock_rdlock(rwlock=0x0000000800911600, flags=0, tsp=0x0000000000000000) at thr_umtx.h:232:10
>>>>   frame #3: 0x00000008008ff19b libthr.so.3`_thr_rtld_rlock_acquire(lock=0x0000000800911600) at thr_rtld.c:125
>>>>   frame #4: 0x000000080075332b ld-elf.so.1`rlock_acquire(lock=0x0000000800765270, lockstate=0x00007fffdfbfb8d0) at rtld_lock.c:208:2
>>>>   frame #5: 0x000000080074ba20 ld-elf.so.1`_rtld_bind(obj=0x0000000800769000, reloff=6072) at rtld.c:861:5
>>>>   frame #6: 0x0000000800747c7d ld-elf.so.1`_rtld_bind_start at rtld_start.S:121
>>>>   frame #7: 0x00000000006562d3 prog`Twio::ProcHandle::spawn(this=, command="/bin/echo", args=0x0000000800d7e000, descriptor_mapping=, descriptor_end=3) at prochandle_pdfork.cpp:308:2
>>> And where is the closefrom() call in the demonstrated trace?
>>>
>>> What version of the system do you use?
>>> You need at least cbdec8db18b533f6d7be (on HEAD) or a5659943e37a74c96e
>>> (stable/13) for pdfork() to behave sanely. But you are still not allowed to
>>> call non-async-signal-safe functions in the child before exec.
>>
>>
>> This is 12.2-p11. I just noticed that I wrote 11.2 above; that is incorrect.
>>
>> Frame 7 is a call to closefrom(). The child process calls dup2(), closefrom(), signal() and then execv(). No other calls are made, and I believe closefrom() is meant to be async signal safe.
>>
> Frame 7 cannot be a call to closefrom(); it would be resolved to the closefrom()
> symbol if it were.

From lldb, attached to the hung process:

(lldb)
frame #6: 0x0000000800748c7d ld-elf.so.1`_rtld_bind_start at rtld_start.S:121
   118  	leaq	(%rsi,%rsi,2),%rsi	# multiply by 3
   119  	leaq	(,%rsi,8),%rsi		# now 8, for 24 (sizeof Elf_Rela)
   120
-> 121  	call	_rtld_bind		# Transfer control to the binder
   122  	/* Now %rax contains the entry point of the function being called. */
   123
   124  	movq	%rax,0x60(%rsp)		# Store target over reloff argument
(lldb)
frame #7: 0x0000000000656813 amt5-chefd`Twio::ProcHandle::spawn(this=, command="/bin/date", args=0x0000000800d5f010, descriptor_mapping=, descriptor_end=3) at prochandle_pdfork.cpp:304:2
   301  		_exit(127);
   302  	}
   303
-> 304  	closefrom(descriptor_end);
   305
   306  	signal(SIGPIPE, SIG_DFL);
   307  	signal(SIGALRM, SIG_DFL);
(lldb)

>> The commit you mention applies cleanly to 12.2; I’m running a build now. Are there other issues with pdfork in 12.2?
>
> pdfork() with threaded processes requires 21f749da82e755aafab1276 and the
> followup cbdec8db18b533f6d7be. I do not believe any of this is in 12.3,
> and definitely not in 12.2.

Thanks, will check and apply.

Regards,

Jan M.
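
For readers following the thread, here is a minimal sketch of the child-side sequence described above (dup2(), closing descriptors from descriptor_end upward, signal(), then execv()). It is not the code from prochandle_pdfork.cpp. Assumptions: fork() stands in for pdfork() and a plain close() loop stands in for closefrom() so that the sketch also builds outside FreeBSD; the helper name spawn_capture and the 4096 descriptor cap are hypothetical. The descriptor limit is computed in the parent so that the child makes only async-signal-safe calls before execv().

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): run `path` with `argv`,
 * capture up to `cap` bytes of its stdout into `out`, and store the
 * wait status in `*status`. Returns bytes captured, or -1 on error.
 */
ssize_t
spawn_capture(const char *path, char *const argv[], char *out, size_t cap,
    int *status)
{
	int pfd[2];

	if (pipe(pfd) == -1)
		return -1;

	/* Compute the descriptor limit in the parent, before forking. */
	long max_fd = sysconf(_SC_OPEN_MAX);
	if (max_fd < 0 || max_fd > 4096)
		max_fd = 4096;	/* arbitrary cap, an assumption of this sketch */

	const int descriptor_end = 3;	/* keep 0, 1 and 2, as in the trace */

	pid_t pid = fork();	/* stand-in for pdfork() */
	if (pid == -1) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: only async-signal-safe calls from here to execv(). */
		if (dup2(pfd[1], STDOUT_FILENO) == -1)
			_exit(127);

		/* Portable stand-in for closefrom(descriptor_end). */
		for (int fd = descriptor_end; fd < max_fd; fd++)
			close(fd);

		signal(SIGPIPE, SIG_DFL);
		signal(SIGALRM, SIG_DFL);

		execv(path, argv);
		_exit(127);	/* reached only if execv() failed */
	}

	/* Parent: drop the write end, then read until EOF or buffer full. */
	close(pfd[1]);

	size_t total = 0;
	ssize_t n;
	while (total < cap &&
	    (n = read(pfd[0], out + total, cap - total)) > 0)
		total += (size_t)n;
	close(pfd[0]);

	if (waitpid(pid, status, 0) == -1)
		return -1;
	return (ssize_t)total;
}
```

Calling spawn_capture("/bin/echo", args, buf, sizeof buf, &status) with args set to { "echo", "hello", NULL } should leave the child's output in buf. As the trace in this thread shows, even an async-signal-safe function can pass through the run-time linker's binder (_rtld_bind) the first time it is called, so keeping to the safe list does not by itself avoid the lock seen in frames #0 to #5.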