From nobody Tue Apr 04 15:33:12 2023 X-Original-To: dev-commits-src-all@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4PrWxd0jRDz43kwP; Tue, 4 Apr 2023 15:33:25 +0000 (UTC) (envelope-from rick.macklem@gmail.com) Received: from mail-pf1-x433.google.com (mail-pf1-x433.google.com [IPv6:2607:f8b0:4864:20::433]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4PrWxc5jPJz3kV7; Tue, 4 Apr 2023 15:33:24 +0000 (UTC) (envelope-from rick.macklem@gmail.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-pf1-x433.google.com with SMTP id cm5so15883103pfb.0; Tue, 04 Apr 2023 08:33:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1680622402; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=snYCxlWANePZ5C9RTQNJcS2YicQnwJtDjvkRNjJTpt8=; b=fXMzX1rEf5GoFofAlObSu9Rx/W1lqD2n7VKTw9Wea9vYWy+jA/GFtRcUO0qJV+qlYG AOyWIgNyHSeMyMVB9IS+N+EWlymk+mLs/dhx0voDG6/EM1RD13daBTlf4wle3tV68NxA PWnOw5DPGxf1k1StgBCvHSKlOfh4DGekxFzNE2iWdSwLjxyunvloUmS0EVNx+HiwGMKL W74QgCdH7AbiOA9oYx414iCnhzXqq2PV3xxkom89IuesKs39IfiotyU92RHjluzjJQGQ X6taH5nx4L/5czfT3SEhNvSZjtLIMI84X5EJCwTZ8hz+48UPohn7f7avRPSYHqUZ9eIC m7BA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680622402; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=snYCxlWANePZ5C9RTQNJcS2YicQnwJtDjvkRNjJTpt8=; b=TqkuVr5pwE0XG10RBk0GaJjnW2ssIP0Gj9rbUVYYb6rlHuIU/LZSZeSH4AzzIWmYSM qQv3TGx4w1cE9SMU1YZqDJO168GO4+F/+yS0+p2HCYWOacFYT3Km4yvlmj1wVyJ7F+j5 s8PyfhHoCMZvb+f4gBSMzT+NFjuBJ+IDYNqp0S34FjMNDfWQi6a0rqR5hpdOHGw1kTTp qtj7oboyroRsV/fVWCjN5EGSGGPUtDW9Cprqabch5LvsYd0rXD848OppaDhyLKVHrD2Q OYrPOx3by/G0coHvJ75yhFu40sEca/yqdh+/m5T2GuV8J351Eo6MMKv/BGMGaxM6t7qP T45w== X-Gm-Message-State: AAQBX9eyiPIYxs1yaQ/gjeR0H0oioeb7StfihZfPSebaZwiqD5802Np6 toRi5q74ZGyCjINaVHNHVr0pw6MxcwH2RYkHmA== X-Google-Smtp-Source: AKy350Z0OdINMdpvUWPTBihmW45aBV/lBpQl4BTRoG3AVb/w3GoVJ/Fh6bffnmctA63Jb3wGIimny5gqgU8+T7DKb2Q= X-Received: by 2002:a05:6a00:2d27:b0:623:7446:7075 with SMTP id fa39-20020a056a002d2700b0062374467075mr1468792pfb.2.1680622401657; Tue, 04 Apr 2023 08:33:21 -0700 (PDT) List-Id: Commit messages for all branches of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-all@freebsd.org X-BeenThere: dev-commits-src-all@freebsd.org MIME-Version: 1.0 References: <202304041145.334Bjx6l035872@gitrepo.freebsd.org> <20230404141717.B976D31C@slippy.cwsent.com> <20230404150206.0A0512A7@slippy.cwsent.com> In-Reply-To: <20230404150206.0A0512A7@slippy.cwsent.com> From: Rick Macklem Date: Tue, 4 Apr 2023 08:33:12 -0700 Message-ID: Subject: Re: git: 8ee579abe09e - main - zfs: fall back if block_cloning feature is disabled To: Cy Schubert Cc: Mateusz Guzik , Martin Matuska , src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 4PrWxc5jPJz3kV7 X-Spamd-Bar: ---- X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; TAGGED_FROM(0.00)[] X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-ThisMailContainsUnwantedMimeParts: N On Tue, Apr 4, 2023 at 8:02=E2=80=AFAM Cy Schubert wrote: > > CAUTION: This email originated from outside of the University of Guelph. = Do not click links or open attachments unless you recognize the sender and = know the content is safe. If in doubt, forward suspicious emails to IThelp@= uoguelph.ca > > > In message om> > , Mateusz Guzik writes: > > On 4/4/23, Cy Schubert wrote: > > > In message <202304041145.334Bjx6l035872@gitrepo.freebsd.org>, Martin > > > Matuska wr > > > ites: > > >> The branch main has been updated by mm: > > >> > > >> URL: > > >> https://cgit.FreeBSD.org/src/commit/?id=3D8ee579abe09ec1fe15c588fc9a= 08370b > > >> 83b81cd6 > > >> > > >> commit 8ee579abe09ec1fe15c588fc9a08370b83b81cd6 > > >> Author: Martin Matuska > > >> AuthorDate: 2023-04-04 11:40:41 +0000 > > >> Commit: Martin Matuska > > >> CommitDate: 2023-04-04 11:43:34 +0000 > > >> > > >> zfs: fall back if block_cloning feature is disabled > > >> > > >> If block_cloning is disabled, or other errors from zfs_clone_ran= ge() > > >> return an EXDEV we should fall back to vn_generic_copy_file_rang= e(). > > >> > > >> This fixes issues when copying files on the same dataset with > > >> block_cloning disabled. > > >> > > >> Upstreamed as pull request to OpenZFS. > > >> > > >> Reviewed by: Mateusz Guzik > > >> OpenZFS pull request: 14713 > > >> --- > > >> .../openzfs/module/os/freebsd/zfs/zfs_vnops_os.c | 17 > > >> ++++++++++----- > > >> -- > > >> 1 file changed, 10 insertions(+), 7 deletions(-) > > >> > > >> diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.= c > > >> b/sys/c > > >> ontrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c > > >> index 97429b360a36..2cd1d27e37bc 100644 > > >> --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c > > >> +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c > > >> @@ -6243,13 +6243,6 @@ zfs_freebsd_copy_file_range(struct > > >> vop_copy_file_range > > >> _args *ap) > > >> int error; > > >> uint64_t len =3D *ap->a_lenp; > > >> > > >> - /* > > >> - * TODO: If offset/length is not aligned to recordsize, use > > >> - * vn_generic_copy_file_range() on this fragment. > > >> - * It would be better to do this after we lock the vnodes, but th= en we > > >> - * need something else than vn_generic_copy_file_range(). > > >> - */ > > >> - > > >> /* Lock both vnodes, avoiding risk of deadlock. */ > > >> do { > > >> mp =3D NULL; > > >> @@ -6300,6 +6293,16 @@ unlock: > > >> if (mp !=3D NULL) > > >> vn_finished_write(mp); > > >> > > >> + /* > > >> + * Fall back if block_cloning feature is disabled > > >> + * or other EXDEV failures from zfs_vnops.c > > >> + */ > > >> + if (error =3D=3D EXDEV) { > > >> + error =3D vn_generic_copy_file_range(ap->a_invp, ap->a_in= offp, > > >> + ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a= _flags > > >> , > > >> + ap->a_incred, ap->a_outcred, ap->a_fsizetd); > > >> + } > > >> + > > >> return (error); > > >> } > > >> > > >> > > > > > > This is too late to fall back. On Rick's suggestion the following mak= es the > > > > > > determination at > > > zfs_freebsd_copy_file_range() entry much earlier. > > > > > > > It's not too late, but I agree it is faster to bail out early. > > > > The proposed patch adds a condition which *differs* from the one in > > zfs_clone_range: > > if (dmu_objset_spa(inos) !=3D dmu_objset_spa(outos)) { > > zfs_exit_two(inzfsvfs, outzfsvfs, FTAG); > > return (SET_ERROR(EXDEV)); > > } > > > > ... meaning with the proposed patch the routine can still fail with > > EXDEV, making zfs_freebsd_copy_file_range also do it, which must not > > happen. > > > > That aside the code looks rather suspicious for the case where target > > and source vnode are the same. iow more work is needed here. > > > > As the vnode is unlocked, you *can't* safely access zfsvfs_t > > *outzfsvfs =3D ZTOZSB(outzp); in that spot in this manner -- a forced > > unmount at the same time can free it. > > > > iow this patch does *NOT* work. > > > > With the committed variant the situation is damage controlled enough > > that there is time to sort it out correctly. > > > > > diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c > > > b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c > [...] > > Gotcha. What you're suggesting is something more like this. Check for > block_cloning and also retry should zfs_clone_range() return EXDEV for an= y > other reason. > > diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c > b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c > index baa2ee5b3824..60916bfcfbc3 100644 > --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c > +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c > @@ -6239,6 +6239,9 @@ zfs_freebsd_copy_file_range(struct > vop_copy_file_range_args *ap) > struct vnode *invp =3D ap->a_invp; > struct vnode *outvp =3D ap->a_outvp; > struct mount *mp; > + znode_t *outzp; > + zfsvfs_t *outzfsvfs; > + objset_t *outos; > struct uio io; > int error; > uint64_t len =3D *ap->a_lenp; > @@ -6276,6 +6279,19 @@ zfs_freebsd_copy_file_range(struct > vop_copy_file_range_args *ap) > } while (error =3D=3D 0); > if (error !=3D 0) > return (error); > + > + outzp =3D VTOZ(ap->a_outvp); > + outzfsvfs =3D ZTOZSB(outzp); > + outos =3D outzfsvfs->z_os; > + > + if (!spa_feature_is_enabled(dmu_objset_spa(outos), > + SPA_FEATURE_BLOCK_CLONING)) { > + error =3D vn_generic_copy_file_range(ap->a_invp, ap->a_in= offp, > + ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_fla= gs, > + ap->a_incred, ap->a_outcred, ap->a_fsizetd); > + goto unlock; > + } The trouble with doing it here is that the code has already gone through all the vnode locking arm waving. It seems that it is just as easy (and avo= ids duplicating the test) to call zfs_clone_range() and let it do the test. I'd rather do this test after only locking outvp, which avoids looping when invp cannot be locked without waiting. (This loop can burn up a lot of cpu when invp is exclusively locked by something else.) The other tests cannot be done until both vnodes are locked, but I'm not sure duplicating the code to check them here instead of just letting zfs_clone_range() do them gains much? rick > + > #ifdef MAC > error =3D mac_vnode_check_write(curthread->td_ucred, ap->a_outcre= d, > outvp); > @@ -6291,6 +6307,11 @@ zfs_freebsd_copy_file_range(struct > vop_copy_file_range_args *ap) > > error =3D zfs_clone_range(VTOZ(invp), ap->a_inoffp, VTOZ(outvp), > ap->a_outoffp, &len, ap->a_outcred); > + > + if (error =3D=3D EXDEV) > + error =3D vn_generic_copy_file_range(ap->a_invp, ap->a_in= offp, > + ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_fla= gs, > + ap->a_incred, ap->a_outcred, ap->a_fsizetd); > *ap->a_lenp =3D (size_t)len; > > unlock: > > > -- > Cheers, > Cy Schubert > FreeBSD UNIX: Web: https://FreeBSD.org > NTP: Web: https://nwtime.org > > e^(i*pi)+1=3D0 > > > >