From: Rick Macklem
Date: Sat, 18 Nov 2023 15:23:29 -0800
Subject: Re: NFS exports of ZFS snapshots broken
To: Mike Karels, FreeBSD CURRENT
Cc: Alexander Motin, Martin Matuska, Garrett Wollman
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Sat, Nov 18, 2023 at 2:27 PM Mike Karels wrote:
>
> On 18 Nov 2023, at 15:58, Rick Macklem wrote:
>
> > On Sat, Nov 18, 2023 at 8:09 AM Rick Macklem wrote:
> >>
> >> On Fri, Nov 17, 2023 at 8:19 PM Mike Karels wrote:
> >>>
> >>> On 17 Nov 2023, at 22:14, Mike Karels wrote:
> >>>
> >>>> On 17 Nov 2023, at 21:24, Rick Macklem wrote:
> >>>>
> >>>>> Most of the changes in stable/13 that are not in releng/13.2
> >>>>> are the "make it work in a jail" stuff. Unfortunately, they are
> >>>>> a large # of changes (mostly trivial edits adding vnet macros),
> >>>>> but it also includes export check changes.
> >>>>>
> >>>>> I have attached a trivial patch that I think disables the export
> >>>>> checks for jails. If either of you can try it and see if it fixes
> >>>>> the problem, that would be great.
> >>>>> (Note that this is only for testing, although it probably does not
> >>>>> matter unless you are running nfsd(8) in vnet jails.)
> >>>>
> >>>> Yes, I can see snapshots with the patch. This system is just a test
> >>>> system that doesn't normally run ZFS or NFS, so no problem messing
> >>>> with permissions. It's a bhyve VM, so I just added a small disk and
> >>>> enabled ZFS for testing.
> >>>
> >>> btw, you might try to get mm@ or maybe mav@ to help out from the ZFS
> >>> side. It must be doing something differently inside a snapshot than
> >>> outside, maybe with file handles or something like that.
> >> Yes. I've added freebsd-current@ (although Garrett is not on it, he is
> >> cc'd) and these guys specifically...
> >>
> >> So, here's what appears to be the problem...
> >> Commit 88175af (in main and stable/13, but not 13.2) added checks for
> >> nfsd(8) running in jails by filling in mnt_exjail with a reference to the cred
> >> used when the file system is exported.
> >> When mnt_exjail is found NULL, the current nfsd code assumes that there
> >> is no access allowed for the mount.
> >>
> >> My vague understanding is that when a ZFS snapshot is accessed, it is
> >> "pseudo-mounted" by zfsctl_snapdir_lookup() and I am guessing that
> >> mnt_exjail is NULL as a result.
> >> Since I do not know the ZFS code and don't even have an easy way to
> >> test this (thankfully Mike can test easily), I do not know what to do from
> >> here?
> >>
> >> Is there a "struct mount" constructed for this pseudo mount
> >> (or it actually appears to be the lookup of ".." that fails, so it
> >> might be the parent of the snapshot subdir?)?
> >>
> >> One thought is that I can check to see if the mount pointer is in the
> >> mountlist (I don't think the snapshot's mount is in the mountlist) and
> >> avoid the jail test for this case. This would assume that snapshots are
> >> always within the file system(s) exported via that jail (which includes
> >> the case of prison0, of course), so that they do not need a separate
> >> jail check.
> >>
> >> If this doesn't work, there will need to be some sort of messing about
> >> in ZFS to set mnt_exjail for these.
> > Ok, so now onto the hard part...
> > Thanks to Mike and others, I did create a snapshot under .zfs and I can
> > see the problem. It is that mnt_exjail == NULL.
> > Now, is there a way that this "struct mount" can be recognized as "special"
> > for snapshots, so I can avoid the mnt_exjail == NULL test?
> > (I had hoped that "mp->mnt_list.tqe_prev" would be NULL, but that is not
> > the case.)
>
> Dumb question, is the mount point (mp presumably) different between the
> snapshot and the main file system?

Not a dumb question and the answer is rather interesting...
It is "sometimes" or "usually" according to my printf().
It seems that when you first "cd .
         *
         * This is where we lie about our v_vfsp in order to
         * make .zfs/snapshot/ accessible over NFS
         * without requiring manual mounts of .
         */
        ASSERT3P(VTOZ(*vpp)->z_zfsvfs, !=, zfsvfs);
        VTOZ(*vpp)->z_zfsvfs->z_parent = zfsvfs;

        /* Clear the root flag (set via VFS_ROOT) as well. */
        (*vpp)->v_vflag &= ~VV_ROOT;
which seems to set the mp to that of the parent, but it seems this does not
happen for the initial lookup of the ?

I'll note that there is code before this in zfsctl_snapdir_lookup()
for handling cases like "." and ".." that return without doing this.

Now, why does this work without the mnt_exjail check (as in 13.2)?
I am not quite sure, but there is this "cheat" in the NFS server
(it has been there for years, maybe decades):
        /*
         * Allow a Lookup, Getattr, GetFH, Secinfo on an
         * non-exported directory if
         * nfs_rootfhset. Do I need to allow any other Ops?
         * (You can only have a non-exported vpnes if
         * nfs_rootfhset is true. See nfsd_fhtovp())
         * Allow AUTH_SYS to be used for file systems
         * exported GSS only for certain Ops, to allow
         * clients to do mounts more easily.
         */
        if (nfsv4_opflag[op].needscfh && vp) {
                if (!NFSVNO_EXPORTED(&vpnes) &&
                    op != NFSV4OP_LOOKUP &&
                    op != NFSV4OP_GETATTR &&
                    op != NFSV4OP_GETFH &&
                    op != NFSV4OP_ACCESS &&
                    op != NFSV4OP_READLINK &&
                    op != NFSV4OP_SECINFO &&
                    op != NFSV4OP_SECINFONONAME)
                        nd->nd_repstat = NFSERR_NOFILEHANDLE;
This allows certain operations to be done on non-exported file
systems and I think that is enough to allow this to work when
mnt_exjail is not checked.
(Note that NFSV4OP_LOOKUPP is not in the list, which might
explain why it is the one that fails for Garrett. I don't think it
can be added to this list safely, since that would allow a client to
move above the exported file system into "uncharted territory".)

> Just curious. Also, what is mnt_exjail
> normally set to for file systems not in a jail?
mnt_exjail is set to the credentials of the thread/process that
exported the file system (usually mountd(8)).
When not in a jail, cr_prison for these credentials
points to prison0.

Btw, I checked and the "other mp" that has mnt_exjail == NULL
is in the mountlist, so the idea of checking "not in mountlist"
is a dead end.

I am looking for something "unique" about this other mp,
but haven't found anything yet.
Alternately, it might be necessary to add code to
zfsctl_snapdir_lookup() to "cheat and change the mp"
in more cases, such as "." and ".." lookups?

rick
ps: I added all the cc's back in because I want the
    ZFS folk to hopefully chime in.

>
> Mike
>
> > Do I need to search mountlist for it?
> >
> > rick
> > ps: The hack patch attached should fix the problem, but can only be
> > safely used if mountd/nfsd are not run in any jails.
> >
> >>
> >> I will try and get a test setup going here, which leads me to..
> >> how do I create a ZFS snapshot? (I do have a simple ZFS pool running
> >> on a test machine, but I've never done a snapshot.)
> >>
> >> Although this problem is not in 13.2, it will have shipped in 14.0.
> >>
> >> Any help will be appreciated, rick
> >>
> >>>
> >>> Mike
> >>>>
> >>>>> rick
> >>>>>
> >>>>> On Fri, Nov 17, 2023 at 6:14 PM Mike Karels wrote:
> >>>>>>
> >>>>>> Rick, have you been following this thread on freebsd-stable? I have been able
> >>>>>> to reproduce this using a 13-stable server from Oct 7 and a 15-current system
> >>>>>> that is up to date using NFSv3. I did not reproduce with a 13.2 server. The
> >>>>>> client was running 13.2. Any ideas? A full bisect seems fairly painful, but
> >>>>>> maybe you have an idea of points to try. Fortunately, these are all test
> >>>>>> systems that I can reboot at will.
> >>>>>>
> >>>>>> Mike
> >>>>>>
> >>>>>> Forwarded message:
> >>>>>>
> >>>>>>> From: Garrett Wollman
> >>>>>>> To: Mike Karels
> >>>>>>> Cc: freebsd-stable@freebsd.org
> >>>>>>> Subject: Re: NFS exports of ZFS snapshots broken
> >>>>>>> Date: Fri, 17 Nov 2023 17:35:04 -0500
> >>>>>>>
> >>>>>>> < said:
> >>>>>>>
> >>>>>>>> I have not run into this, so I tried it just now. I had no problem.
> >>>>>>>> The server is 13.2, fully patched, the client is up-to-date -current,
> >>>>>>>> and the mount is v4.
> >>>>>>>
> >>>>>>> On my 13.2 client and 13-stable server, I see:
> >>>>>>>
> >>>>>>>  25034 ls       CALL  open(0x237d32f9a000,0x120004)
> >>>>>>>  25034 ls       NAMI  "/mnt/tools/.zfs/snapshot/weekly-2023-45"
> >>>>>>>  25034 ls       RET   open 4
> >>>>>>>  25034 ls       CALL  fcntl(0x4,F_ISUNIONSTACK,0x0)
> >>>>>>>  25034 ls       RET   fcntl 0
> >>>>>>>  25034 ls       CALL  getdirentries(0x4,0x237d32faa000,0x1000,0x237d32fa7028)
> >>>>>>>  25034 ls       RET   getdirentries -1 errno 5 Input/output error
> >>>>>>>  25034 ls       CALL  close(0x4)
> >>>>>>>  25034 ls       RET   close 0
> >>>>>>>  25034 ls       CALL  exit(0)
> >>>>>>>
> >>>>>>> Certainly a libc bug here that getdirentries(2) returning [EIO]
> >>>>>>> results in ls(1) returning EXIT_SUCCESS, but the [EIO] error is
> >>>>>>> consistent across both FreeBSD and Linux clients.
> >>>>>>>
> >>>>>>> Looking at this from the RPC side:
> >>>>>>>
> >>>>>>> (PUTFH, GETATTR, LOOKUP(snapshotname), GETFH, GETATTR)
> >>>>>>> [NFS4_OK for all ops]
> >>>>>>> (PUTFH, GETATTR)
> >>>>>>> [NFS4_OK, NFS4_OK]
> >>>>>>> (PUTFH, ACCESS(0x3f), GETATTR)
> >>>>>>> [NFS4_OK, NFS4_OK, rights = 0x03, NFS4_OK]
> >>>>>>> (PUTFH, GETATTR, LOOKUPP, GETFH, GETATTR)
> >>>>>>> [NFS4_OK, NFS4_OK, NFS4ERR_NOFILEHANDLE]
> >>>>>>>
> >>>>>>> and at this point the [EIO] is returned.
> >>>>>>>
> >>>>>>> It seems that clients always do a LOOKUPP before calling READDIR, and
> >>>>>>> this is failing when the subject file handle is the snapshot. The
> >>>>>>> client is perfectly able to *traverse into* the snapshot: if I try to
> >>>>>>> list a subdirectory I know exists in the snapshot, the client is able to
> >>>>>>> LOOKUP(dirname) just fine, but LOOKUPP still fails with
> >>>>>>> NFS4ERR_NOFILEHANDLE *on the subdirectory*.
> >>>>>>>
> >>>>>>> -GAWollman
> >>>>>>
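
For anyone following the thread, the NFSv4 "cheat" quoted in the reply above can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not the nfsd code: the names model_op, model_status, model_op_in_allowlist(), model_check_op() and model_self_check() are hypothetical, the needscfh/vp condition from the quoted code is left out, and "exported" is just a boolean standing in for NFSVNO_EXPORTED(). It shows why, if the snapshot's mount ends up being treated as not exported (the hypothesis in the thread), LOOKUP would still succeed while LOOKUPP would get NFS4ERR_NOFILEHANDLE, as in the RPC trace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the NFSv4 operations named in the thread. */
enum model_op {
	MODEL_OP_LOOKUP,
	MODEL_OP_LOOKUPP,
	MODEL_OP_GETATTR,
	MODEL_OP_GETFH,
	MODEL_OP_ACCESS,
	MODEL_OP_READLINK,
	MODEL_OP_SECINFO,
	MODEL_OP_SECINFONONAME,
	MODEL_OP_READDIR
};

/* Hypothetical status values; the quoted code sets nd->nd_repstat instead. */
enum model_status {
	MODEL_OK,
	MODEL_ERR_NOFILEHANDLE
};

/*
 * Mirrors the list of operations in the quoted check: these are the
 * ones still permitted on a file system that is not exported.
 * LOOKUPP and READDIR are deliberately absent, as in the quoted code.
 */
static bool
model_op_in_allowlist(enum model_op op)
{
	switch (op) {
	case MODEL_OP_LOOKUP:
	case MODEL_OP_GETATTR:
	case MODEL_OP_GETFH:
	case MODEL_OP_ACCESS:
	case MODEL_OP_READLINK:
	case MODEL_OP_SECINFO:
	case MODEL_OP_SECINFONONAME:
		return (true);
	default:
		return (false);
	}
}

/*
 * Simplified model of the check: an operation on a non-exported file
 * system fails with "no file handle" unless it is in the allowlist.
 */
static enum model_status
model_check_op(bool exported, enum model_op op)
{
	if (!exported && !model_op_in_allowlist(op))
		return (MODEL_ERR_NOFILEHANDLE);
	return (MODEL_OK);
}

/* Usage: replay the last compound from the RPC trace, op by op. */
static void
model_self_check(void)
{
	/* GETATTR passes on a mount treated as not exported... */
	assert(model_check_op(false, MODEL_OP_GETATTR) == MODEL_OK);
	/* ...but LOOKUPP is rejected, which ends the compound. */
	assert(model_check_op(false, MODEL_OP_LOOKUPP) ==
	    MODEL_ERR_NOFILEHANDLE);
	/* On an exported file system the allowlist never comes into play. */
	assert(model_check_op(true, MODEL_OP_LOOKUPP) == MODEL_OK);
}
```

Calling model_self_check() from a test program exercises the three cases. The model says nothing about how the snapshot's mount comes to be seen as not exported; that part (mnt_exjail being NULL for the pseudo-mount) is still the open question in the thread.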