From: Rick Macklem
Date: Sat, 13 Jul 2024 19:42:32 -0700
Subject: Re: Possible bug in zfs send or pipe implementation?
To: Garrett Wollman
Cc: freebsd-stable@freebsd.org
In-Reply-To: <26259.12713.114036.564205@hergotha.csail.mit.edu>
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

On Sat, Jul 13, 2024 at 7:02 PM Garrett Wollman wrote:
>
> I'm migrating an old file server to new hardware using syncoid.
> Every so often, the `zfs send` process gets stuck with the following
> kstacks:
>
>  7960 108449 zfs  -                    mi_switch sleepq_catch_signals sleepq_wait_sig _sleep pipe_write zfs_file_write_impl zfs_file_write dump_record dmu_dump_write do_dump dmu_send_impl dmu_send_obj zfs_ioc_send zfsdev_ioctl_common zfsdev_ioctl devfs_ioctl vn_ioctl devfs_ioctl_f
>  7960 126072 zfs  send_traverse_threa  mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_cb traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp
>  7960 126074 zfs  send_merge_thread    mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_merge_thread fork_exit fork_trampoline
>  7960 126075 zfs  send_reader_thread   mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_reader_thread fork_exit fork_trampoline
>
> Near as I can tell, the first thread is trying to write
> serialized data to the output pipe and is blocked.  The other
> threads are stuck because the write process isn't making progress.

# ps axHl
should show you which wchans the processes are waiting on, and that
might give you a clue w.r.t. what is happening.

If it is easy to build a kernel from sources and boot that, you could try
defining PIPE_NODIRECT in sys/kern/sys_pipe.c and see if that avoids the
hangs.

rick

>
> The process reading from the pipe (which is just a progress meter) is
> sitting in select() waiting for the pipe to become ready, so either
> zfs_file_write() is doing something wrong, or the pipe implementation
> has lost a selwakeup() somewhere.  (Or, possibly but unlikely, the
> progress meter has lost the read end of the pipe from its read
> fd_set.)
> Unfortunately, neither fstat nor procstat print any useful
> information about the state of the pipe, so I can only try to deduce
> what's going on from the observable behavior.
>
> -GAWollman
>
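
For reference, a minimal userland sketch (not from the thread, and not a reproduction of the hang) of the invariant the quoted analysis relies on: once a writer cannot push any more data into a pipe, the read end should be reported ready by select(), and FIONREAD should show the queued bytes. The helper names `fill_pipe`, `pending_bytes`, `is_readable`, and `demo` are hypothetical, and the script only exercises an ordinary pipe between two descriptors in one process; it says nothing about the in-kernel zfs_file_write() path.

```python
import fcntl
import os
import select
import struct
import termios


def fill_pipe(write_fd, chunk=4096):
    """Write until the pipe refuses more data; return bytes written."""
    # Non-blocking mode lets us observe "pipe full" as an error
    # instead of sleeping in the kernel like the stuck writer does.
    os.set_blocking(write_fd, False)
    total = 0
    payload = b"\0" * chunk
    while True:
        try:
            total += os.write(write_fd, payload)
        except BlockingIOError:
            return total


def pending_bytes(read_fd):
    """Ask the kernel how many bytes are queued on the read end."""
    buf = fcntl.ioctl(read_fd, termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]


def is_readable(read_fd):
    """Zero-timeout select(): True if the read end is reported ready."""
    ready, _, _ = select.select([read_fd], [], [], 0)
    return read_fd in ready


def demo():
    """Fill a fresh pipe and report (written, pending, readable)."""
    read_fd, write_fd = os.pipe()
    try:
        written = fill_pipe(write_fd)
        return written, pending_bytes(read_fd), is_readable(read_fd)
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    written, pending, readable = demo()
    print(f"wrote {written} bytes, FIONREAD={pending}, "
          f"select readable={readable}")
```

On a healthy pipe the pending count equals the bytes written and select() reports the descriptor readable. A reader that stays asleep in select() while the writer is blocked in pipe_write would contradict that, which is the situation described above.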