Re: git: 8ee579abe09e - main - zfs: fall back if block_cloning feature is disabled

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Tue, 04 Apr 2023 15:26:02 UTC
On Tue, Apr 4, 2023 at 7:38 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If in doubt, forward suspicious emails to IThelp@uoguelph.ca
>
>
> On 4/4/23, Cy Schubert <Cy.Schubert@cschubert.com> wrote:
> > In message <202304041145.334Bjx6l035872@gitrepo.freebsd.org>, Martin
> > Matuska wr
> > ites:
> >> The branch main has been updated by mm:
> >>
> >> URL:
> >> https://cgit.FreeBSD.org/src/commit/?id=8ee579abe09ec1fe15c588fc9a08370b
> >> 83b81cd6
> >>
> >> commit 8ee579abe09ec1fe15c588fc9a08370b83b81cd6
> >> Author:     Martin Matuska <mm@FreeBSD.org>
> >> AuthorDate: 2023-04-04 11:40:41 +0000
> >> Commit:     Martin Matuska <mm@FreeBSD.org>
> >> CommitDate: 2023-04-04 11:43:34 +0000
> >>
> >>     zfs: fall back if block_cloning feature is disabled
> >>
> >>     If block_cloning is disabled, or other errors from zfs_clone_range()
> >>     return an EXDEV we should fall back to vn_generic_copy_file_range().
> >>
> >>     This fixes issues when copying files on the same dataset with
> >>     block_cloning disabled.
> >>
> >>     Upstreamed as pull request to OpenZFS.
> >>
> >>     Reviewed by:    Mateusz Guzik <mjguzik@gmail.com>
> >>     OpenZFS pull request:   14713
> >> ---
> >>  .../openzfs/module/os/freebsd/zfs/zfs_vnops_os.c        | 17
> >> ++++++++++-----
> >> --
> >>  1 file changed, 10 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >> b/sys/c
> >> ontrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >> index 97429b360a36..2cd1d27e37bc 100644
> >> --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >> +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> >> @@ -6243,13 +6243,6 @@ zfs_freebsd_copy_file_range(struct
> >> vop_copy_file_range
> >> _args *ap)
> >>      int error;
> >>      uint64_t len = *ap->a_lenp;
> >>
> >> -    /*
> >> -     * TODO: If offset/length is not aligned to recordsize, use
> >> -     * vn_generic_copy_file_range() on this fragment.
> >> -     * It would be better to do this after we lock the vnodes, but then we
> >> -     * need something else than vn_generic_copy_file_range().
> >> -     */
> >> -
> >>      /* Lock both vnodes, avoiding risk of deadlock. */
> >>      do {
> >>              mp = NULL;
> >> @@ -6300,6 +6293,16 @@ unlock:
> >>      if (mp != NULL)
> >>              vn_finished_write(mp);
> >>
> >> +    /*
> >> +     * Fall back if block_cloning feature is disabled
> >> +     * or other EXDEV failures from zfs_vnops.c
> >> +     */
> >> +    if (error == EXDEV) {
> >> +            error = vn_generic_copy_file_range(ap->a_invp, ap->a_inoffp,
> >> +                        ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_flags
> >> ,
> >> +                        ap->a_incred, ap->a_outcred, ap->a_fsizetd);
> >> +    }
> >> +
> >>      return (error);
> >>  }
> >>
> >>
> >
> > This is too late to fall back. On Rick's suggestion the following makes the
> >
> > determination at
> > zfs_freebsd_copy_file_range() entry much earlier.
> >
>
> It's not too late, but I agree it is faster to bail out early.
>
> The proposed patch adds a condition which *differs* from the one in
> zfs_clone_range:
>         if (dmu_objset_spa(inos) != dmu_objset_spa(outos)) {
>                 zfs_exit_two(inzfsvfs, outzfsvfs, FTAG);
>                 return (SET_ERROR(EXDEV));
>         }
>
> ... meaning with the proposed patch the routine can still fail with
> EXDEV, making zfs_freebsd_copy_file_range also do it, which must not
> happen.
Since VOP_COPY_FILE_RANGE() is only called when invp and outvp
are on the same mount point, I don't think this can happen now.
However, there is a TO DO comment that suggests a call with invp and
outvp on different mount points may be in the future.

As such, leaving Martin's patch in so that it calls vn_generic_copy_file_range()
when zfs_clone_range() returns EXDEV seems like a good idea to me.

>
> That aside the code looks rather suspicious for the case where target
> and source vnode are the same. iow more work is needed here.
Definitely needs to be tested. I'll do that later to-day.

rick

>
> As the vnode is unlocked, you *can't* safely access zfsvfs_t
> *outzfsvfs = ZTOZSB(outzp); in that spot in this manner -- a forced
> unmount at the same time can free it.
>
> iow this patch does *NOT* work.
>
> With the committed variant the situation is damage controlled enough
> that there is time to sort it out correctly.
>
> > diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > index d41821ff67f1..e18dcca58192 100644
> > --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> > @@ -6243,6 +6243,18 @@ zfs_freebsd_copy_file_range(struct
> > vop_copy_file_range_args *ap)
> >       int error;
> >       uint64_t len = *ap->a_lenp;
> >
> > +     znode_t *outzp = VTOZ(ap->a_outvp);
> > +     zfsvfs_t *outzfsvfs = ZTOZSB(outzp);
> > +     objset_t *outos = outzfsvfs->z_os;
> > +
> > +        if (!spa_feature_is_enabled(dmu_objset_spa(outos),
> > +            SPA_FEATURE_BLOCK_CLONING)) {
> > +             error = vn_generic_copy_file_range(ap->a_invp, ap->a_inoffp,
> > +                     ap->a_outvp, ap->a_outoffp, ap->a_lenp, ap->a_flags,
> > +                     ap->a_incred, ap->a_outcred, ap->a_fsizetd);
> > +                return (error);
> > +        }
> > +
> >       /*
> >        * TODO: If offset/length is not aligned to recordsize, use
> >        * vn_generic_copy_file_range() on this fragment.
> >
> >
> > Can you revert your commit and commit this, please.
> >
> >
> > --
> > Cheers,
> > Cy Schubert <Cy.Schubert@cschubert.com>
> > FreeBSD UNIX:  <cy@FreeBSD.org>   Web:  https://FreeBSD.org
> > NTP:           <cy@nwtime.org>    Web:  https://nwtime.org
> >
> >                       e^(i*pi)+1=0
> >
> >
> >
> >
>
>
> --
> Mateusz Guzik <mjguzik gmail.com>