From nobody Mon Dec 27 15:19:16 2021 X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 977721917B1F for ; Mon, 27 Dec 2021 15:19:23 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4JN1Y72Llpz4j50 for ; Mon, 27 Dec 2021 15:19:23 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.16.1/8.16.1) with ESMTPS id 1BRFJGg7036509 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Mon, 27 Dec 2021 17:19:19 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua 1BRFJGg7036509 Received: (from kostik@localhost) by tom.home (8.16.1/8.16.1/Submit) id 1BRFJGxp036508; Mon, 27 Dec 2021 17:19:16 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 27 Dec 2021 17:19:16 +0200 From: Konstantin Belousov To: Jan Mikkelsen Cc: freebsd-hackers@freebsd.org Subject: Re: closefrom blocking, wchan urdlck Message-ID: References: <2B3BA665-D42A-4B5F-AD2F-ED10E64A7276@transactionware.com> <9CB0803A-E15B-47F9-97A9-03597D41C01E@transactionware.com> List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <9CB0803A-E15B-47F9-97A9-03597D41C01E@transactionware.com> X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.5 X-Spam-Checker-Version: SpamAssassin 3.4.5 (2021-03-20) on tom.home X-Rspamd-Queue-Id: 4JN1Y72Llpz4j50 X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N On Mon, Dec 27, 2021 at 04:13:50PM +0100, Jan Mikkelsen wrote: > > > > On 27 Dec 2021, at 16:03, Konstantin Belousov wrote: > > > > On Mon, Dec 27, 2021 at 03:54:57PM +0100, Jan Mikkelsen wrote: > >> > >>> On 27 Dec 2021, at 14:52, Konstantin Belousov wrote: > >>> > >>> On Mon, Dec 27, 2021 at 01:39:11PM +0100, Jan Mikkelsen wrote: > >>>> Hi, > >>>> > >>>> (On 11.2) > >>>> > >>>> I am occasionally seeing closefrom() block in a child process created by a call to pdfork(). > >>>> > >>>> When this does happen, it is very early after the process has started, while other threads are being created elsewhere in the process. I cannot reproduce it after the thread creation is complete. According to the sigaction man page, this should be async signal safe. > >>>> > >>>> Stack trace from the call to closefrom(): > >>>> > >>>> * frame #0: 0x000000080090276c libthr.so.3`_umtx_op_err at _umtx_op_err.S:37 > >>>> frame #1: 0x00000008008f6121 libthr.so.3`__thr_rwlock_rdlock(rwlock=, flags=, tsp=) at thr_umtx.c:307:10 > >>>> frame #2: 0x00000008008ff1ac libthr.so.3`_thr_rtld_rlock_acquire [inlined] _thr_rwlock_rdlock(rwlock=0x0000000800911600, flags=0, tsp=0x0000000000000000) at thr_umtx.h:232:10 > >>>> frame #3: 0x00000008008ff19b libthr.so.3`_thr_rtld_rlock_acquire(lock=0x0000000800911600) at thr_rtld.c:125 > >>>> frame #4: 0x000000080075332b ld-elf.so.1`rlock_acquire(lock=0x0000000800765270, lockstate=0x00007fffdfbfb8d0) at rtld_lock.c:208:2 > >>>> frame #5: 0x000000080074ba20 ld-elf.so.1`_rtld_bind(obj=0x0000000800769000, reloff=6072) at rtld.c:861:5 > >>>> frame #6: 0x0000000800747c7d ld-elf.so.1`_rtld_bind_start at rtld_start.S:121 > >>>> frame #7: 0x00000000006562d3 prog`Twio::ProcHandle::spawn(this=, command="/bin/echo", args=0x0000000800d7e000, descriptor_mapping=, descriptor_end=3) at prochandle_pdfork.cpp:308:2 > >>> And where is the closefrom() call in the demonstrated trace? > >>> > >>> What version of the system do you use? > >>> You need at least cbdec8db18b533f6d7be (on HEAD) or a5659943e37a74c96e > >>> (stable/13) for pdfork() to behave sanely. But you still not allowed to > >>> call non-async signal safe functions in the child before exec. > >> > >> > >> This is 12.2-p11. I just noticed that I wrote 11.2 above, that is incorrect. > >> > >> Frame 7 is a call to closefrom(). The child process calls dup2(), closefrom(), signal() and then execv(). No other calls are made, and I believe closefrom() is meant to be async signal safe. > >> > > Frame 7 cannot be a call to closefrom(), it would be resolved to closefrom() > > symbol would it be. > > >From lldb, attached to the hung process: > > (lldb) > frame #6: 0x0000000800748c7d ld-elf.so.1`_rtld_bind_start at rtld_start.S:121 > 118 leaq (%rsi,%rsi,2),%rsi # multiply by 3 > 119 leaq (,%rsi,8),%rsi # now 8, for 24 (sizeof Elf_Rela) > 120 > -> 121 call _rtld_bind # Transfer control to the binder > 122 /* Now %rax contains the entry point of the function being called. */ > 123 > 124 movq %rax,0x60(%rsp) # Store target over reloff argument > (lldb) > frame #7: 0x0000000000656813 amt5-chefd`Twio::ProcHandle::spawn(this=, command="/bin/date", args=0x0000000800d5f010, descriptor_mapping=, descriptor_end=3) at prochandle_pdfork.cpp:304:2 > 301 _exit(127); > 302 } > 303 > -> 304 closefrom(descriptor_end); > 305 > 306 signal(SIGPIPE, SIG_DFL); > 307 signal(SIGALRM, SIG_DFL); > (lldb) Ah, yes, it is rtld trying to resolve the symbol. So yes, the fix is to have the revisions I listed below. I am not sure about prerequisites. > > > >> The commit you can apply cleanly to 12.2, I’m running a build now. Are there other issues with pdfork in 12.2? > > > > pdfork() with threading processes requires 21f749da82e755aafab1276 and the > > followup cbdec8db18b533f6d7be. I do not believe any of this is in 12.3, > > and definitely not in 12.2. > > Thanks, will check and apply. > > Regards, > > Jan M. >