NFS readdirplus on ZFS with > 1 billion files
Josh Paetzel
jpaetzel at FreeBSD.org
Sat Dec 31 18:08:48 UTC 2016
On Sat, Dec 31, 2016, at 07:33 AM, Konstantin Belousov wrote:
> On Sat, Dec 31, 2016 at 04:26:11AM -0600, Josh Paetzel wrote:
> > We've been chasing this bug for a very long time and finally managed to
> > pin it down. When a ZFS dataset has more than 1 billion files on it and
> > an NFS client does a readdirplus the file handles for files with high
> > znode/inode numbers gets truncated due to a 64 -> 32 bit conversion.
> >
> > https://reviews.freebsd.org/D9009
> >
> > This isn't a fix so much as a workaround. From a performance standpoint
> > it's the same as if the client mounts with noreaddirplus; sometimes it's
> > a win, sometimes it's a lose. CPU usage does go up on the server a bit.
> >
>
> Can you point to the places in ZFS code where the truncation occur ?
> I have no idea about ZFS code, and my question is mainly is the truncation
> just occurs due to different types of ino_t and zfs node id, or some code
> actively does the range reduction.
>
> My question is in the context of the long-dragging ino64 work, which might
> be finished in some visible future. In particular, I am curious if just
> using the patched kernel fixes your issue. See
> https://github.com/FreeBSDFoundation/freebsd/tree/ino64
> although I do not make any claim about the state of the code yet.
>
> Your patch, after a review, might be still useful for stable/10 and 11,
> since I do not think that ino64 has any bits which could be merged.
That's a great question, and I will answer it as best I can. However, I
am cc'ing Ash Gokhale and Rick Macklem here because they understand the
issue better and might be able to provide a better answer.
My understanding is the issue occurs here:
http://fxr.watson.org/fxr/source/fs/nfsserver/nfs_nfsdport.c?v=FREEBSD10#L2090
This code path widens the dirent's d_fileno from 32 to 64 bits to fill in
the NFS fileno, but the legacy struct dirent d_fileno is still 32 bits
wide, so any high bits have already been lost by that point.
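To illustrate the kind of truncation described above, here is a minimal
user-space sketch in C. It is not the NFS server code: struct
legacy_dirent, fill_dirent, fileid_from_dirent, survives_round_trip and
check_examples are hypothetical names made up for this example, and the
only assumption carried over from the real code is a 32-bit d_fileno
that gets widened to 64 bits afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a legacy directory entry with a 32-bit file number. */
struct legacy_dirent {
	uint32_t d_fileno;
};

/* Storing a 64-bit object id in the 32-bit field keeps only the low 32 bits. */
struct legacy_dirent
fill_dirent(uint64_t object_id)
{
	struct legacy_dirent de;

	de.d_fileno = (uint32_t)object_id;
	return (de);
}

/* Widening back to 64 bits zero-extends; the lost high bits do not come back. */
uint64_t
fileid_from_dirent(const struct legacy_dirent *de)
{
	return ((uint64_t)de->d_fileno);
}

/* Returns 1 if the id survives the 64 -> 32 -> 64 round trip, 0 otherwise. */
int
survives_round_trip(uint64_t object_id)
{
	struct legacy_dirent de = fill_dirent(object_id);

	return (fileid_from_dirent(&de) == object_id);
}

/* Small self-check: ids that fit in 32 bits survive, larger ones do not. */
void
check_examples(void)
{
	assert(survives_round_trip(0xFFFFFFFFULL));
	assert(!survives_round_trip(0x100000001ULL));
}
```

Calling check_examples() from a main() exercises both cases. The point of
the sketch is only that the later widening cast cannot undo the earlier
narrowing, so the damage is done before the widening ever runs.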
I'm not entirely sure this is a ZFS-specific issue at all. I've never
tried to put 1 billion files on a UFS filesystem to see what would
happen. (I suspect this issue with the NFS server would be the least of
your problems.)
I agree the correct solution is the ino64 work. I'm fine with this hack
going directly into 11-STABLE and 10-STABLE. (In fact, I think that's
the best solution.)
Another thing we kicked around was putting this hack behind a sysctl, so
that you could manually activate it if a filesystem went over the
threshold for the bug to occur. No one is completely convinced that we
fully understand the performance implications of this patch.
--
Thanks,
Josh Paetzel