Date: Mon, 15 Jul 2024 10:50:12 -0400
From: Mark Johnston
To: Rick Macklem
Cc: Garrett Wollman, freebsd-stable@freebsd.org
Subject: Re: Possible bug in zfs send or pipe implementation?

On Sun, Jul 14, 2024 at 03:14:44PM -0700, Rick Macklem wrote:
> On Sun, Jul 14, 2024 at 10:32 AM Garrett Wollman wrote:
>
> > > Just to clarify it, are you saying zfs is sleeping on "pipewr"?
> > > (There is also a msleep() for "pipbww" in pipe_write().)
> >
> > It is sleeping on pipewr, yes.
> >
> > [wollman@nfs-prod-11 ~]$ sysctl kern.ipc.pipekva
> > kern.ipc.pipekva: 536576
> > [wollman@nfs-prod-11 ~]$ sysctl kern.ipc.maxpipekva
> > kern.ipc.maxpipekva: 2144993280
> >
> > It's not out of KVA, it's just waiting for the `pv` process to wake up
> > and read more data.  `pv` is single-threaded and blocked on "select".
> >
> > It doesn't always get stuck in the same place, which is why I'm
> > suspecting a lost wakeup somewhere.
> >
> This snippet from sys/kern/sys_pipe.c looks a little suspicious to me...
>         /*
>          * Direct copy, bypassing a kernel buffer.
>          */
>         } else if ((size = rpipe->pipe_pages.cnt) != 0) {
>                 if (size > uio->uio_resid)
>                         size = (u_int) uio->uio_resid;
>                 PIPE_UNLOCK(rpipe);
>                 error = uiomove_fromphys(rpipe->pipe_pages.ms,
>                     rpipe->pipe_pages.pos, size, uio);
>                 PIPE_LOCK(rpipe);
>                 if (error)
>                         break;
>                 nread += size;
>                 rpipe->pipe_pages.pos += size;
>                 rpipe->pipe_pages.cnt -= size;
>                 if (rpipe->pipe_pages.cnt == 0) {
>                         rpipe->pipe_state &= ~PIPE_WANTW;
>                         wakeup(rpipe);
>                 }
> If it reads uio_resid bytes which is less than pipe_pages.cnt, no
> wakeup() occurs.
> I'd be tempted to try getting rid of the "if (rpipe->pipe_pages.cnt == 0)"
> and do the wakeup() unconditionally, to see if it helps?

I don't think that can help.  pipe_write() will block if the "direct
write" buffer is non-empty.  See the comment in pipe_write(), "Pipe
buffered writes cannot be coincidental with direct writes".

select()/poll()/etc. should return an event if pipe_pages.cnt > 0 on the
read side, so I suspect that the problem is elsewhere, or else I'm
misunderstanding something.

> Because if the application ("pv" in this case) doesn't do another read() on
> the pipe before calling select(), no wakeup() is going to occur, because
> here's what pipe_write() does...
>         /*
>          * We have no more space and have something to offer,
>          * wake up select/poll.
>          */
>         pipeselwakeup(wpipe);
>
>         wpipe->pipe_state |= PIPE_WANTW;
>         pipeunlock(wpipe);
>         error = msleep(wpipe, PIPE_MTX(rpipe),
>             PRIBIO | PCATCH, "pipewr", 0);
>         pipelock(wpipe, 0);
>         if (error != 0)
>                 break;
>         continue;
> Note that, once in msleep(), no call to pipeselwakeup() will occur until
> it gets woken up.
>
> I think the current code assumes that the reader ("pv" in this case) will
> read all the data out of the pipe before calling select() again.
> Does it do that?
>
> rick
> ps: I've added markj@ as a cc, since he seems to have been the last guy
> involved in sys_pipe.c.
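
One way to check the select()/poll() expectation from userland is a small
test along the lines below. It is only a sketch under stated assumptions:
the function name, the sizes and the timeout are made up for illustration,
nothing here is taken from pv or from sys_pipe.c, and whether the large
write really takes the direct-write path depends on the kernel. A child
does one large blocking write, the parent reads only a few bytes and then
asks poll() whether the pipe is still readable.

```c
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the thread.
 * Returns 1 if poll() reports POLLIN after a partial read of the pipe,
 * 0 if it does not (for example, it times out), and -1 on a setup error.
 */
int
readable_after_partial_read(size_t write_len, size_t read_len, int timeout_ms)
{
	static char buf[256 * 1024];
	struct pollfd pfd;
	int fds[2], ready, result;
	ssize_t n;
	pid_t pid;

	/* The read must be strictly smaller than the write. */
	if (write_len > sizeof(buf) || read_len == 0 || read_len >= write_len)
		return (-1);
	if (pipe(fds) != 0)
		return (-1);

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		/* Writer: a single large blocking write, then exit. */
		close(fds[0]);
		memset(buf, 'x', write_len);
		if (write(fds[1], buf, write_len) < 0)
			_exit(1);
		_exit(0);
	}
	close(fds[1]);

	/* Reader: consume only part of what the writer offered. */
	n = read(fds[0], buf, read_len);
	if (n <= 0) {
		result = -1;
	} else {
		/* Unread data remains, so poll() should report POLLIN. */
		pfd.fd = fds[0];
		pfd.events = POLLIN;
		pfd.revents = 0;
		ready = poll(&pfd, 1, timeout_ms);
		result = (ready == 1 && (pfd.revents & POLLIN) != 0) ? 1 : 0;
	}

	/* Closing the read end unblocks a writer still stuck in write(). */
	close(fds[0]);
	(void)waitpid(pid, NULL, 0);
	return (result);
}
```

Calling, say, readable_after_partial_read(128 * 1024, 10, 1000) from a
small main() and getting 0 back would point at the poll side. A result of
1 only shows that this simple case works; it says nothing about a race
that needs a particular interleaving to trigger.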