getdirentries cookies usage outside of UFS [PATCH]
Bryan Drewery
bdrewery at FreeBSD.org
Thu Jun 26 20:33:00 UTC 2014
On 2014-04-14 17:27, Kirk McKusick wrote:
>> Date: Fri, 11 Apr 2014 21:03:57 -0500
>> From: Bryan Drewery <bdrewery at freebsd.org>
>> To: freebsd-fs at freebsd.org
>> Subject: getdirentries cookies usage outside of UFS
>>
>> Recently I was working on a compat syscall for sys_getdirentries()
>> that
>> converts between our dirent and the FreeBSD dirent struct. We had
>> never
>> tried using this on TMPFS and when we did ran into weird issues (hence
>> my recent commits to TMPFS to clarify some of the getdirentries()
>> code).
>> We were not using cookies, so I referenced the Linux compat module
>> (linux_file.c getdents_common())
>>
>> I ran across this code:
>>
>>> /*
>>> * When using cookies, the vfs has the option of reading from
>>> * a different offset than that supplied (UFS truncates the
>>> * offset to a block boundary to make sure that it never reads
>>> * partway through a directory entry, even if the directory
>>> * has been compacted).
>>> */
>>> while (len > 0 && ncookies > 0 && *cookiep <= off) {
>>> bdp = (struct dirent *) inp;
>>> len -= bdp->d_reclen;
>>> inp += bdp->d_reclen;
>>> cookiep++;
>>> ncookies--;
>>> }=20
>>
>>
>> At first it looked innocuous but then it occurred to me it was the
>> root
>> of the issue I was having as it was eating my cookies based on their
>> value, despite tmpfs cookies being random hash values that have no
>> sequential relation. So I looked at how NFS was handling the same code
>> and found this lovely hack from r216691:
>>
>>> not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs");
>> =2E..
>>> while (cpos < cend && ncookies > 0 &&
>>> (dp->d_fileno == 0 || dp->d_type == DT_WHT ||
>>> (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff))) {
>>> cpos += dp->d_reclen;
>>> dp = (struct dirent *)cpos;
>>> cookiep++;
>>> ncookies--;
>>> }
>>
>> I ended up doing the opposite, only running the code if getting
>> dirents
>> from "ufs".
>>
>> So there's multiple issue here.
>>
>> 1. NFS is broken on TMPFS. I can see why it's gone so long unnoticed,
>> why would you do that? Still probably worth fixing.
>>
>> 2. Linux and SVR4 getdirentries() are both broken on TMPFS/ZFS. I am
>> surprised Linux+ZFS has not been noticed by now. I am aware the SVR4
>> is
>> full of other bugs too. I ran across many just reviewing the
>> getdirentries code alone.
>>
>> Do any other file systems besides UFS do this offset/cookie
>> truncation/rewind? If UFS is the only one it may be acceptable to
>> change
>> this zfs check to !ufs and add it to the other modules. If we don't
>> like
>> that, or there are potentially other file systems doing this too, how
>> about adding a flag to somewhere to indicate the file system has
>> monotonically increasing offsets and needs this rewind support. I'm
>> not
>> sure where that is best done, struct vfsconf?
>>
>> Regards,
>> Bryan Drewery
>
> This code is specific to UFS. I concur with your fix of making
> it conditionl on UFS. I feel guilty for putting that code in
> unconditionally in the first place. In my defense it was 1982
> and UFS was the only filesystem :-)
>
> Kirk McKusick
Based on this discussion I have made the following patch against NFS. If
we're comfortable with this approach I will apply the same logic to the
Linux and SVR4 modules.
Mirrored at http://people.freebsd.org/~bdrewery/patches/nfs-zfs-ufs.patch
The patch changes 'not_zfs' to 'is_ufs' in the NFS code. Some of the
code actually is ZFS-specific in regards to snapshot handling. So I have
also added a 'is_zfs' variable to compare against for those cases.
I've removed the comments about ZFS in the UFS-only cases as the
existing comment seems to cover it fine.
(Unrelated) This code, from r259845, in sys/fs/nfsserver/nfs_nfsdport.c
seems odd to me:
/*
* Check to see if entries in this directory can be safely acquired
* via VFS_VGET() or if a switch to VOP_LOOKUP() is required.
* ZFS snapshot directories need VOP_LOOKUP(), so that any
* automount of the snapshot directory that is required will
* be done.
* This needs to be done here for NFSv4, since NFSv4 never does
* a VFS_VGET() for "." or "..".
*/
- if (not_zfs == 0) {
+ if (is_zfs == 1) {
r = VFS_VGET(mp, at.na_fileid, LK_SHARED, &nvp);
if (r == EOPNOTSUPP) {
usevget = 0;
cn.cn_nameiop = LOOKUP;
cn.cn_lkflags = LK_SHARED | LK_RETRY;
cn.cn_cred = nd->nd_cred;
cn.cn_thread = p;
} else if (r == 0)
vput(nvp);
}
This fallback is also done later (from r199715), but not limited to
ZFS. Would it make sense to not limit this first check to ZFS as well? I
see that unionfs_vget also returns EOPNOTSUPP. A nullfs mount from ZFS
served over NFS may also return EOPNOTSUPP, as odd as that is.
--
Regards,
Bryan Drewery
-------------- next part --------------
diff --git sys/fs/nfsserver/nfs_nfsdport.c sys/fs/nfsserver/nfs_nfsdport.c
index 0979864..c798e5d 100644
--- sys/fs/nfsserver/nfs_nfsdport.c
+++ sys/fs/nfsserver/nfs_nfsdport.c
@@ -1551,7 +1551,7 @@ nfsrvd_readdir(struct nfsrv_descript *nd, int isdgram,
u_long *cookies = NULL, *cookiep;
struct uio io;
struct iovec iv;
- int not_zfs;
+ int is_ufs;
if (nd->nd_repstat) {
nfsrv_postopattr(nd, getret, &at);
@@ -1606,7 +1606,7 @@ nfsrvd_readdir(struct nfsrv_descript *nd, int isdgram,
nfsrv_postopattr(nd, getret, &at);
goto out;
}
- not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs");
+ is_ufs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "ufs") == 0;
MALLOC(rbuf, caddr_t, siz, M_TEMP, M_WAITOK);
again:
eofflag = 0;
@@ -1686,12 +1686,10 @@ again:
* skip over the records that precede the requested offset. This
* requires the assumption that file offset cookies monotonically
* increase.
- * Since the offset cookies don't monotonically increase for ZFS,
- * this is not done when ZFS is the file system.
*/
while (cpos < cend && ncookies > 0 &&
(dp->d_fileno == 0 || dp->d_type == DT_WHT ||
- (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff))) {
+ (is_ufs == 1 && ((u_quad_t)(*cookiep)) <= toff))) {
cpos += dp->d_reclen;
dp = (struct dirent *)cpos;
cookiep++;
@@ -1804,7 +1802,7 @@ nfsrvd_readdirplus(struct nfsrv_descript *nd, int isdgram,
struct uio io;
struct iovec iv;
struct componentname cn;
- int at_root, needs_unbusy, not_zfs, supports_nfsv4acls;
+ int at_root, is_ufs, is_zfs, needs_unbusy, supports_nfsv4acls;
struct mount *mp, *new_mp;
uint64_t mounted_on_fileno;
@@ -1884,7 +1882,8 @@ nfsrvd_readdirplus(struct nfsrv_descript *nd, int isdgram,
nfsrv_postopattr(nd, getret, &at);
goto out;
}
- not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs");
+ is_ufs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "ufs") == 0;
+ is_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs") == 0;
MALLOC(rbuf, caddr_t, siz, M_TEMP, M_WAITOK);
again:
@@ -1957,12 +1956,10 @@ again:
* skip over the records that precede the requested offset. This
* requires the assumption that file offset cookies monotonically
* increase.
- * Since the offset cookies don't monotonically increase for ZFS,
- * this is not done when ZFS is the file system.
*/
while (cpos < cend && ncookies > 0 &&
(dp->d_fileno == 0 || dp->d_type == DT_WHT ||
- (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff) ||
+ (is_ufs == 1 && ((u_quad_t)(*cookiep)) <= toff) ||
((nd->nd_flag & ND_NFSV4) &&
((dp->d_namlen == 1 && dp->d_name[0] == '.') ||
(dp->d_namlen==2 && dp->d_name[0]=='.' && dp->d_name[1]=='.'))))) {
@@ -2004,7 +2001,7 @@ again:
* This needs to be done here for NFSv4, since NFSv4 never does
* a VFS_VGET() for "." or "..".
*/
- if (not_zfs == 0) {
+ if (is_zfs == 1) {
r = VFS_VGET(mp, at.na_fileid, LK_SHARED, &nvp);
if (r == EOPNOTSUPP) {
usevget = 0;
@@ -2153,7 +2150,7 @@ again:
if (!r)
r = nfsvno_getattr(nvp, nvap,
nd->nd_cred, p, 1);
- if (r == 0 && not_zfs == 0 &&
+ if (r == 0 && is_zfs == 1 &&
nfsrv_enable_crossmntpt != 0 &&
(nd->nd_flag & ND_NFSV4) != 0 &&
nvp->v_type == VDIR &&
diff --git sys/nfsserver/nfs_serv.c sys/nfsserver/nfs_serv.c
index de58421..1010da6 100644
--- sys/nfsserver/nfs_serv.c
+++ sys/nfsserver/nfs_serv.c
@@ -2627,7 +2627,7 @@ nfsrv_readdir(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
int v3 = (nfsd->nd_flag & ND_NFSV3);
u_quad_t off, toff, verf;
u_long *cookies = NULL, *cookiep; /* needs to be int64_t or off_t */
- int not_zfs;
+ int is_ufs;
nfsdbprintf(("%s %d\n", __FILE__, __LINE__));
fhp = &nfh.fh_generic;
@@ -2690,7 +2690,7 @@ nfsrv_readdir(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
error = 0;
goto nfsmout;
}
- not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs") != 0;
+ is_ufs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "ufs") == 0;
VOP_UNLOCK(vp, 0);
/*
@@ -2777,12 +2777,10 @@ again:
* skip over the records that precede the requested offset. This
* requires the assumption that file offset cookies monotonically
* increase.
- * Since the offset cookies don't monotonically increase for ZFS,
- * this is not done when ZFS is the file system.
*/
while (cpos < cend && ncookies > 0 &&
(dp->d_fileno == 0 || dp->d_type == DT_WHT ||
- (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff))) {
+ (is_ufs == 1 && ((u_quad_t)(*cookiep)) <= toff))) {
cpos += dp->d_reclen;
dp = (struct dirent *)cpos;
cookiep++;
@@ -2928,7 +2926,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
int usevget = 1;
struct componentname cn;
struct mount *mntp = NULL;
- int not_zfs;
+ int is_ufs;
nfsdbprintf(("%s %d\n", __FILE__, __LINE__));
vp_locked = 0;
@@ -2988,7 +2986,7 @@ nfsrv_readdirplus(struct nfsrv_descript *nfsd, struct nfssvc_sock *slp,
error = 0;
goto nfsmout;
}
- not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs") != 0;
+ is_ufs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "ufs") == 0;
VOP_UNLOCK(vp, 0);
vp_locked = 0;
rbuf = malloc(siz, M_TEMP, M_WAITOK);
@@ -3068,12 +3066,10 @@ again:
* skip over the records that precede the requested offset. This
* requires the assumption that file offset cookies monotonically
* increase.
- * Since the offset cookies don't monotonically increase for ZFS,
- * this is not done when ZFS is the file system.
*/
while (cpos < cend && ncookies > 0 &&
(dp->d_fileno == 0 || dp->d_type == DT_WHT ||
- (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff))) {
+ (is_ufs == 1 && ((u_quad_t)(*cookiep)) <= toff))) {
cpos += dp->d_reclen;
dp = (struct dirent *)cpos;
cookiep++;
More information about the freebsd-fs
mailing list