Message-ID: <26259.12713.114036.564205@hergotha.csail.mit.edu>
Date: Sat, 13 Jul 2024 22:02:17 -0400
From: Garrett Wollman
To: freebsd-stable@freebsd.org
Subject: Possible bug in zfs send or pipe implementation?
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

I'm migrating an old file server to new hardware using syncoid.
Every so often, the `zfs send` process gets stuck with the following
kstacks:

 7960 108449 zfs - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep pipe_write zfs_file_write_impl zfs_file_write dump_record dmu_dump_write do_dump dmu_send_impl dmu_send_obj zfs_ioc_send zfsdev_ioctl_common zfsdev_ioctl devfs_ioctl vn_ioctl devfs_ioctl_f
 7960 126072 zfs send_traverse_threa mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_cb traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp
 7960 126074 zfs send_merge_thread mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_merge_thread fork_exit fork_trampoline
 7960 126075 zfs send_reader_thread mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_reader_thread fork_exit fork_trampoline

Near as I can tell, the first thread is trying to write serialized
data to the output pipe and is blocked. The other threads are stuck
because the writing thread isn't making progress. The process reading
from the pipe (which is just a progress meter) is sitting in select()
waiting for the pipe to become ready, so either zfs_file_write() is
doing something wrong, or the pipe implementation has lost a
selwakeup() somewhere. (Or, possibly but unlikely, the progress meter
has lost the read end of the pipe from its read fd_set.)

Unfortunately, neither fstat nor procstat prints any useful
information about the state of the pipe, so I can only try to deduce
what's going on from the observable behavior.

-GAWollman
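
For reference, here is a minimal userland sketch of the pipe behavior the reasoning above relies on: once a writer has filled a pipe, select() should report the read end as ready, and draining it should make the write end writable again. This is not the progress meter's code and does not reproduce the hang. It is written in Python for brevity, and the helper names `fill_pipe` and `drain_pipe` are hypothetical.

```python
import os
import select


def fill_pipe(wfd):
    """Write to a non-blocking pipe until the kernel refuses more data.

    Returns the number of bytes accepted. A blocking writer would go to
    sleep at this point instead of seeing BlockingIOError.
    """
    os.set_blocking(wfd, False)
    chunk = b"x" * 4096
    total = 0
    while True:
        try:
            total += os.write(wfd, chunk)
        except BlockingIOError:
            return total


def drain_pipe(rfd, nbytes):
    """Read exactly nbytes (or until EOF) from the pipe; return bytes read."""
    remaining = nbytes
    while remaining > 0:
        data = os.read(rfd, min(65536, remaining))
        if not data:
            break
        remaining -= len(data)
    return nbytes - remaining


rfd, wfd = os.pipe()
written = fill_pipe(wfd)

# With data queued, a zero-timeout select() on the read end should
# report it ready immediately.
readable, _, _ = select.select([rfd], [], [], 0)
reader_ready_when_full = rfd in readable

# With the pipe full, the write end should not be reported writable.
_, writable, _ = select.select([], [wfd], [], 0)
writer_ready_when_full = wfd in writable

# Draining the pipe should make the write end writable again.
drained = drain_pipe(rfd, written)
_, writable, _ = select.select([], [wfd], [], 0)
writer_ready_after_drain = wfd in writable

os.close(rfd)
os.close(wfd)
```

In the reported hang, the equivalent of `reader_ready_when_full` apparently never becomes true for the reader even though the writer is asleep in pipe_write, which is why a lost wakeup is one of the candidate explanations.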