getdirentries cookies usage outside of UFS [PATCH]

Thu Jun 26 20:33:00 UTC 2014

On 2014-04-14 17:27, Kirk McKusick wrote:
>> Date: Fri, 11 Apr 2014 21:03:57 -0500
>> From: Bryan Drewery <bdrewery at freebsd.org>
>> To: freebsd-fs at freebsd.org
>> Subject: getdirentries cookies usage outside of UFS
>>
>> Recently I was working on a compat syscall for sys_getdirentries()
>> that
>> converts between our dirent and the FreeBSD dirent struct. We had
>> never
>> tried using this on TMPFS and when we did ran into weird issues (hence
>> my recent commits to TMPFS to clarify some of the getdirentries()
>> code).
>> We were not using cookies, so I referenced the Linux compat module
>> (linux_file.c getdents_common())
>>
>> I ran across this code:
>>
>>>     /*
>>>      * When using cookies, the vfs has the option of reading from
>>>      * a different offset than that supplied (UFS truncates the
>>>      * offset to a block boundary to make sure that it never reads
>>>      * partway through a directory entry, even if the directory
>>>      * has been compacted).
>>>      */
>>>     while (len > 0 && ncookies > 0 && *cookiep <= off) {
>>>             bdp = (struct dirent *) inp;
>>>             len -= bdp->d_reclen;
>>>             inp += bdp->d_reclen;
>>>             cookiep++;
>>>             ncookies--;
>>>     }=20
>>
>>
>> At first it looked innocuous but then it occurred to me it was the
>> root
>> of the issue I was having as it was eating my cookies based on their
>> value, despite tmpfs cookies being random hash values that have no
>> sequential relation. So I looked at how NFS was handling the same code
>> and found this lovely hack from r216691:
>>
>>> not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs");
>> =2E..
>>>  while (cpos < cend && ncookies > 0 &&
>>>      (dp->d_fileno == 0 || dp->d_type == DT_WHT ||
>>>       (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff))) {
>>>          cpos += dp->d_reclen;
>>>          dp = (struct dirent *)cpos;
>>>          cookiep++;
>>>          ncookies--;
>>>  }
>>
>> I ended up doing the opposite, only running the code if getting
>> dirents
>> from "ufs".
>>
>> So there's multiple issue here.
>>
>> 1. NFS is broken on TMPFS. I can see why it's gone so long unnoticed,
>> why would you do that? Still probably worth fixing.
>>
>> 2. Linux and SVR4 getdirentries() are both broken on TMPFS/ZFS. I am
>> surprised Linux+ZFS has not been noticed by now. I am aware the SVR4
>> is
>> full of other bugs too. I ran across many just reviewing the
>> getdirentries code alone.
>>
>> Do any other file systems besides UFS do this offset/cookie
>> truncation/rewind? If UFS is the only one it may be acceptable to
>> change
>> this zfs check to !ufs and add it to the other modules. If we don't
>> like
>> that, or there are potentially other file systems doing this too, how
>> about adding a flag to somewhere to indicate the file system has
>> monotonically increasing offsets and needs this rewind support. I'm
>> not
>> sure where that is best done, struct vfsconf?
>>
>> Regards,
>> Bryan Drewery
>
> This code is specific to UFS. I concur with your fix of making
> it conditionl on UFS. I feel guilty for putting that code in
> unconditionally in the first place. In my defense it was 1982
> and UFS was the only filesystem :-)
>
> 	Kirk McKusick

Based on this discussion I have made the following patch against NFS. If 
we're comfortable with this approach I will apply the same logic to the 
Linux and SVR4 modules.

Mirrored at http://people.freebsd.org/~bdrewery/patches/nfs-zfs-ufs.patch

The patch changes 'not_zfs' to 'is_ufs' in the NFS code. Some of the 
code actually is ZFS-specific in regards to snapshot handling. So I have 
also added a 'is_zfs' variable to compare against for those cases.

I've removed the comments about ZFS in the UFS-only cases as the 
existing comment seems to cover it fine.

(Unrelated) This code, from r259845, in sys/fs/nfsserver/nfs_nfsdport.c 
seems odd to me:

         /*
          * Check to see if entries in this directory can be safely acquired
          * via VFS_VGET() or if a switch to VOP_LOOKUP() is required.
          * ZFS snapshot directories need VOP_LOOKUP(), so that any
          * automount of the snapshot directory that is required will
          * be done.
          * This needs to be done here for NFSv4, since NFSv4 never does
          * a VFS_VGET() for "." or "..".
          */
-       if (not_zfs == 0) {
+       if (is_zfs == 1) {
                 r = VFS_VGET(mp, at.na_fileid, LK_SHARED, &nvp);
                 if (r == EOPNOTSUPP) {
                         usevget = 0;
                         cn.cn_nameiop = LOOKUP;
                         cn.cn_lkflags = LK_SHARED | LK_RETRY;
                         cn.cn_cred = nd->nd_cred;
                         cn.cn_thread = p;
                 } else if (r == 0)
                         vput(nvp);
         }

This fallback is also done later (from r199715), but not limited to 
ZFS. Would it make sense to not limit this first check to ZFS as well? I 
see that unionfs_vget also returns EOPNOTSUPP. A nullfs mount from ZFS 
served over NFS may also return EOPNOTSUPP, as odd as that is.


-- 
Regards,
Bryan Drewery
-------------- next part --------------

diff --git sys/fs/nfsserver/nfs_nfsdport.c sys/fs/nfsserver/nfs_nfsdport.c
index 0979864..c798e5d 100644
--- sys/fs/nfsserver/nfs_nfsdport.c
+++ sys/fs/nfsserver/nfs_nfsdport.c
@@ -1551,7 +1551,7 @@ nfsrvd_readdir(struct nfsrv_descript *nd, int isdgram,
 	u_long *cookies = NULL, *cookiep;
 	struct uio io;
 	struct iovec iv;
-	int not_zfs;
+	int is_ufs;
 
 	if (nd->nd_repstat) {
 		nfsrv_postopattr(nd, getret, &at);
@@ -1606,7 +1606,7 @@ nfsrvd_readdir(struct nfsrv_descript *nd, int isdgram,
 			nfsrv_postopattr(nd, getret, &at);
 		goto out;
 	}
-	not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs");
+	is_ufs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "ufs") == 0;
 	MALLOC(rbuf, caddr_t, siz, M_TEMP, M_WAITOK);
 again:
 	eofflag = 0;
@@ -1686,12 +1686,10 @@ again:
 	 * skip over the records that precede the requested offset. This
 	 * requires the assumption that file offset cookies monotonically
 	 * increase.
-	 * Since the offset cookies don't monotonically increase for ZFS,
-	 * this is not done when ZFS is the file system.
 	 */
 	while (cpos < cend && ncookies > 0 &&
 	    (dp->d_fileno == 0 || dp->d_type == DT_WHT ||
-	     (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff))) {
+	     (is_ufs == 1 && ((u_quad_t)(*cookiep)) <= toff))) {
 		cpos += dp->d_reclen;
 		dp = (struct dirent *)cpos;
 		cookiep++;
@@ -1804,7 +1802,7 @@ nfsrvd_readdirplus(struct nfsrv_descript *nd, int isdgram,
 	struct uio io;
 	struct iovec iv;
 	struct componentname cn;
-	int at_root, needs_unbusy, not_zfs, supports_nfsv4acls;
+	int at_root, is_ufs, is_zfs, needs_unbusy, supports_nfsv4acls;
 	struct mount *mp, *new_mp;
 	uint64_t mounted_on_fileno;
 
@@ -1884,7 +1882,8 @@ nfsrvd_readdirplus(struct nfsrv_descript *nd, int isdgram,
 			nfsrv_postopattr(nd, getret, &at);
 		goto out;
 	}
-	not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs");
+	is_ufs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "ufs") == 0;
+	is_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs") == 0;
 
 	MALLOC(rbuf, caddr_t, siz, M_TEMP, M_WAITOK);
 again:
@@ -1957,12 +1956,10 @@ again:
 	 * skip over the records that precede the requested offset. This
 	 * requires the assumption that file offset cookies monotonically
 	 * increase.
-	 * Since the offset cookies don't monotonically increase for ZFS,
-	 * this is not done when ZFS is the file system.
 	 */
 	while (cpos < cend && ncookies > 0 &&
 	  (dp->d_fileno == 0 || dp->d_type == DT_WHT ||
-	   (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff) ||
+	   (is_ufs == 1 && ((u_quad_t)(*cookiep)) <= toff) ||
 	   ((nd->nd_flag & ND_NFSV4) &&
 	    ((dp->d_namlen == 1 && dp->d_name[0] == '.') ||
 	     (dp->d_namlen==2 && dp->d_name[0]=='.' && dp->d_name[1]=='.'))))) {
@@ -2004,7 +2001,7 @@ again:
 	 * This needs to be done here for NFSv4, since NFSv4 never does
 	 * a VFS_VGET() for "." or "..".
 	 */
-	if (not_zfs == 0) {
+	if (is_zfs == 1) {
 		r = VFS_VGET(mp, at.na_fileid, LK_SHARED, &nvp);
 		if (r == EOPNOTSUPP) {
 			usevget = 0;
@@ -2153,7 +2150,7 @@ again:
 					if (!r)
 					    r = nfsvno_getattr(nvp, nvap,
 						nd->nd_cred, p, 1);
-					if (r == 0 && not_zfs == 0 &&
+					if (r == 0 && is_zfs == 1 &&
 					    nfsrv_enable_crossmntpt != 0 &&
 					    (nd->nd_flag & ND_NFSV4) != 0 &&
 					    nvp->v_type == VDIR &&
diff --git sys/nfsserver/nfs_serv.c sys/nfsserver/nfs_serv.c
index de58421..1010da6 100644
--- sys/nfsserver/nfs_serv.c
+++ sys/nfsserver/nfs_serv.c
@@ -2627,7 +2627,7 @@ nfsrv_readdir(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 	int v3 = (nfsd->nd_flag & ND_NFSV3);
 	u_quad_t off, toff, verf;
 	u_long *cookies = NULL, *cookiep; /* needs to be int64_t or off_t */
-	int not_zfs;
+	int is_ufs;
 
 	nfsdbprintf(("%s %d\n", __FILE__, __LINE__));
 	fhp = &nfh.fh_generic;
@@ -2690,7 +2690,7 @@ nfsrv_readdir(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 		error = 0;
 		goto nfsmout;
 	}
-	not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs") != 0;
+	is_ufs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "ufs") == 0;
 	VOP_UNLOCK(vp, 0);
 
 	/*
@@ -2777,12 +2777,10 @@ again:
 	 * skip over the records that precede the requested offset. This
 	 * requires the assumption that file offset cookies monotonically
 	 * increase.
-	 * Since the offset cookies don't monotonically increase for ZFS,
-	 * this is not done when ZFS is the file system.
 	 */
 	while (cpos < cend && ncookies > 0 &&
 		(dp->d_fileno == 0 || dp->d_type == DT_WHT ||
-		 (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff))) {
+		 (is_ufs == 1 && ((u_quad_t)(*cookiep)) <= toff))) {
 		cpos += dp->d_reclen;
 		dp = (struct dirent *)cpos;
 		cookiep++;
@@ -2928,7 +2926,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 	int usevget = 1;
 	struct componentname cn;
 	struct mount *mntp = NULL;
-	int not_zfs;
+	int is_ufs;
 
 	nfsdbprintf(("%s %d\n", __FILE__, __LINE__));
 	vp_locked = 0;
@@ -2988,7 +2986,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
 		error = 0;
 		goto nfsmout;
 	}
-	not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs") != 0;
+	is_ufs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "ufs") == 0;
 	VOP_UNLOCK(vp, 0);
 	vp_locked = 0;
 	rbuf = malloc(siz, M_TEMP, M_WAITOK);
@@ -3068,12 +3066,10 @@ again:
 	 * skip over the records that precede the requested offset. This
 	 * requires the assumption that file offset cookies monotonically
 	 * increase.
-	 * Since the offset cookies don't monotonically increase for ZFS,
-	 * this is not done when ZFS is the file system.
 	 */
 	while (cpos < cend && ncookies > 0 &&
 		(dp->d_fileno == 0 || dp->d_type == DT_WHT ||
-		 (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff))) {
+		 (is_ufs == 1 && ((u_quad_t)(*cookiep)) <= toff))) {
 		cpos += dp->d_reclen;
 		dp = (struct dirent *)cpos;
 		cookiep++;