Race in NFS lookup can result in stale namecache entries

John Baldwin jhb at freebsd.org
Thu Jan 19 15:50:48 UTC 2012

On Thursday, January 19, 2012 9:06:13 am Kostik Belousov wrote:
> On Wed, Jan 18, 2012 at 05:07:21PM -0500, John Baldwin wrote:
> ...
> > What I concluded is that it would really be far simpler and more
> > obvious if the cached timestamps were stored in the namecache entry
> > directly rather than having multiple name cache entries validated by
> > shared state in the nfsnode.  This does mean allowing the name cache
> > to hold some filesystem-specific state.  However, I felt this was much
> > cleaner than adding a lot more complexity to nfs_lookup().  Also, this
> > turns out to be fairly non-invasive to implement since nfs_lookup()
> > calls cache_lookup() directly, but other filesystems only call it
> > indirectly via vfs_cache_lookup().  I considered letting filesystems
> > store a void * cookie in the name cache entry and having them provide
> > a destructor, etc.  However, that would require extra allocations for
> > NFS lookups.  Instead, I just adjusted the name cache API to
> > explicitly allow the filesystem to store a single timestamp in a name
> > cache entry by adding a new 'cache_enter_time()' that accepts a struct
> > timespec that is copied into the entry.  'cache_enter_time()' also
> > saves the current value of 'ticks' in the entry.  'cache_lookup()' is
> > modified to add two new arguments used to return the timespec and
> > ticks value used for a namecache entry when a hit in the cache occurs.
> > 
> > One wrinkle with this is that the name cache does not create actual
> > entries for ".", and thus it would not store any timestamps for those
> > lookups.  To fix this I changed the NFS client to explicitly fast-path
> > lookups of "." by always returning the current directory as set up by
> > cache_lookup() and never bothering to do a LOOKUP or check for stale
> > attributes in that case.
> > 
> > The current patch against 8 is at
> > http://www.FreeBSD.org/~jhb/patches/nfs_lookup.patch
> ...
> So now you add 8*2+4 bytes to each namecache entry on amd64 unconditionally.
> The current size of the invariant part of struct namecache on amd64 is 72
> bytes, so an addition of 20 bytes looks slightly excessive. I am not sure
> about the typical distribution of the namecache nc_name length, so it is
> not obvious whether the change affects memory usage significantly.
> A flag could be added to nc_flags to indicate the presence of a timestamp.
> The timestamps would be conditionally placed after nc_nlen; we could
> probably use a union to ease the access. Then, the direct dereferences of
> nc_name would need to be converted to some inline function.
> I can do this after your patch is committed, if you consider the memory
> usage saving worth it.
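
For reference, a rough userland sketch of the flag-based layout suggested
above. Nothing here is taken from the patch: NCF_TS, struct nc_hdr,
struct nc_plain, struct nc_ts and the nc_*() helpers are hypothetical names,
the header is a two-field stand-in for the real struct namecache, and it uses
two structs sharing a common header rather than a literal union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NCF_TS 0x01 /* hypothetical flag: entry carries a timestamp */

/* Stand-in for the invariant part of struct namecache (assumption). */
struct nc_hdr {
    unsigned char nc_flags;
    unsigned char nc_nlen;
};

/* Layout without timestamps: the name follows the header directly. */
struct nc_plain {
    struct nc_hdr hdr;
    char nc_name[];
};

/* Layout with timestamps: the extra fields sit between header and name. */
struct nc_ts {
    struct nc_hdr hdr;
    struct timespec nc_time;
    int nc_ticks;
    char nc_name[];
};

/* Bytes needed for an entry holding a name of namelen characters. */
static size_t nc_size(size_t namelen, int want_ts)
{
    size_t base = want_ts ? offsetof(struct nc_ts, nc_name)
                          : offsetof(struct nc_plain, nc_name);
    return base + namelen + 1;
}

/* Replaces direct dereferences of nc_name: picks the layout by flag. */
static inline char *nc_get_name(struct nc_hdr *h)
{
    if (h->nc_flags & NCF_TS)
        return ((struct nc_ts *)h)->nc_name;
    return ((struct nc_plain *)h)->nc_name;
}

/* Only valid for entries that were allocated with a timestamp. */
static inline struct timespec *nc_get_time(struct nc_hdr *h)
{
    assert(h->nc_flags & NCF_TS);
    return &((struct nc_ts *)h)->nc_time;
}

/* Allocate an entry; returns NULL on failure or if the name is too long. */
static struct nc_hdr *nc_alloc(const char *name, int want_ts)
{
    size_t len = strlen(name);
    struct nc_hdr *h;

    if (len > 255)
        return NULL;
    h = calloc(1, nc_size(len, want_ts));
    if (h == NULL)
        return NULL;
    h->nc_flags = want_ts ? NCF_TS : 0;
    h->nc_nlen = (unsigned char)len;
    memcpy(nc_get_name(h), name, len + 1);
    return h;
}
```

With this shape, callers go through nc_get_name() instead of touching
nc_name directly, and only entries created with want_ts != 0 pay for the
timespec and ticks fields.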

Hmm, if the memory usage really is worrying then I could move to using the
void * cookie method instead.
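
A minimal sketch of what the cookie variant might look like, again in
userland C with made-up names (struct nc_entry, nc_cookie_dtor_t,
struct nfs_nc_cookie and the helper functions are all hypothetical, not
existing kernel interfaces). The malloc() in nc_entry_set_nfs_time() is the
extra per-lookup allocation mentioned in the quoted text.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Destructor supplied by the filesystem that owns the cookie. */
typedef void (*nc_cookie_dtor_t)(void *cookie);

/* Stand-in for a name cache entry; only the cookie fields are shown. */
struct nc_entry {
    void *nc_cookie;
    nc_cookie_dtor_t nc_dtor;
};

/* What an NFS client might hang off the entry (assumption). */
struct nfs_nc_cookie {
    struct timespec mtime;
    int ticks;
};

static int nfs_cookie_frees; /* counts destructor calls, for illustration */

static void nfs_cookie_dtor(void *cookie)
{
    nfs_cookie_frees++;
    free(cookie);
}

/* Drop any cookie attached to the entry by calling its destructor. */
static void nc_entry_purge(struct nc_entry *e)
{
    if (e->nc_cookie != NULL && e->nc_dtor != NULL)
        e->nc_dtor(e->nc_cookie);
    e->nc_cookie = NULL;
    e->nc_dtor = NULL;
}

/* Attach a timestamp cookie; returns 0 on success, -1 on failure. */
static int nc_entry_set_nfs_time(struct nc_entry *e,
                                 const struct timespec *ts, int ticks)
{
    struct nfs_nc_cookie *c = malloc(sizeof(*c)); /* extra allocation */

    if (c == NULL)
        return -1;
    c->mtime = *ts;
    c->ticks = ticks;
    nc_entry_purge(e); /* release a previous cookie, if any */
    e->nc_cookie = c;
    e->nc_dtor = nfs_cookie_dtor;
    return 0;
}
```

The entry itself grows by only two pointers, but every NFS entry then needs
a second allocation and a destructor call when it is purged.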

John Baldwin
