Change default VFS timestamp precision?

Wed Dec 17 14:58:54 UTC 2014

On Wednesday, December 17, 2014 12:38:44 AM Jilles Tjoelker wrote:
> On Tue, Dec 16, 2014 at 01:48:41PM -0500, John Baldwin wrote:
> > We still ship with vfs.timestamp_precision=0 by default meaning that
> > VFS timestamps have a granularity of one second.  It is not unusual on
> > modern systems for multiple updates to a file or directory to occur
> > within a single second (and thus share the same effective timestamp).
> > This can break things that depend on timestamps to know when something
> > has changed or is stale (such as make(1) or NFS clients).  On hardware
> > that has a cheap timecounter, I think we should use the most-precise
> > timestamps (vfs.timestamp_precision=3).  However, I'm less sure of
> > what to do for other cases such as i386/amd64 when not using TSC, or
> > on other platforms.  OTOH, perhaps you aren't doing lots of heavy I/O
> > access on a system with a slow timecounter (or if you are doing heavy
> > I/O, slow timecounter access won't be your bottleneck)?
> > 
> > I can think of a few options:
> >  1) Change vfs.timestamp_precision default to 3 for all systems.
> >  
> >  2) Only change vfs.timestamp_precision default to 3 for amd64/i386 using
> >     an #ifdef.
> >  
> >  3) Something else?
> 
> Although some breakage may be caused, increasing precision sounds fine
> to me, but only to the level of microseconds, since there is no way to
> set a timestamp to the nanosecond (this would be futimens/utimensat). It
> is easy to be surprised when cp -p creates a file that appears older
> than the original.
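
To illustrate the cp -p surprise described in the quoted text, here is a rough
userland sketch (not code from cp or the kernel).  It assumes the copy's
timestamp is set through a microsecond-resolution interface such as utimes(2);
the helper names through_timeval() and timespec_cmp() are made up for this
example.

```c
#include <sys/time.h>
#include <time.h>

/*
 * Pass a nanosecond timestamp through a struct timeval, as a
 * microsecond-resolution API would force us to do.  The low three
 * decimal digits of tv_nsec are lost.
 */
static struct timespec
through_timeval(struct timespec ts)
{
	struct timeval tv = {
		.tv_sec = ts.tv_sec,
		.tv_usec = ts.tv_nsec / 1000
	};
	struct timespec out = {
		.tv_sec = tv.tv_sec,
		.tv_nsec = tv.tv_usec * 1000L
	};

	return (out);
}

/* Returns < 0 if a is older than b, 0 if equal, > 0 if newer. */
static int
timespec_cmp(struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return (a.tv_sec < b.tv_sec ? -1 : 1);
	if (a.tv_nsec != b.tv_nsec)
		return (a.tv_nsec < b.tv_nsec ? -1 : 1);
	return (0);
}
```

With an original mtime of 100.123456789 the copy ends up at 100.123456000, so
timespec_cmp(copy, original) is negative and the copy looks older.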

Note that vfs_timestamp() always returns a timespec, but a setting of 2 would
give microsecond precision.  The important difference for settings >= 2 is that
vfs_timestamp() queries the timecounter on each call rather than using a global
value that is only updated either once a second or once a millisecond or so.
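
As a rough model of what each setting can store, here is a userland sketch.  It
is not the kernel's vfs_timestamp(); the enum names, model_timestamp(), and the
tick_ns parameter are invented for illustration, and it only models the
resulting granularity, not the cached-versus-queried timecounter distinction.

```c
#include <time.h>

/* Hypothetical names for the four vfs.timestamp_precision settings. */
enum ts_precision {
	PREC_SEC = 0,	/* whole seconds */
	PREC_TICK = 1,	/* cached value, updated roughly once per tick */
	PREC_USEC = 2,	/* timecounter read, microsecond precision */
	PREC_NSEC = 3	/* timecounter read, nanosecond precision */
};

/*
 * Round "now" down to the granularity a given setting could record.
 * tick_ns is the assumed length of one timer tick in nanoseconds
 * (for example 1000000 for a 1 ms tick) and must be positive.
 */
static struct timespec
model_timestamp(struct timespec now, enum ts_precision prec, long tick_ns)
{
	struct timespec ts = now;

	switch (prec) {
	case PREC_SEC:
		ts.tv_nsec = 0;
		break;
	case PREC_TICK:
		ts.tv_nsec -= ts.tv_nsec % tick_ns;
		break;
	case PREC_USEC:
		ts.tv_nsec -= ts.tv_nsec % 1000;
		break;
	case PREC_NSEC:
		break;
	}
	return (ts);
}
```

For an instant of 100.123456789 and a 1 ms tick, the four settings would record
100.000000000, 100.123000000, 100.123456000 and 100.123456789 respectively.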

> To avoid cross-arch surprises with applications that use
> second-resolution APIs, either all or no architectures should generate
> timestamps more accurate than seconds.

Actually, finer timestamps will improve our interoperability with other OSes
that already use sub-second timestamps, for example when sharing filesystems
over NFS.

> There is no benefit for the particular case of make(1), since it only
> uses timestamps in seconds.

My bad for not checking that further and for assuming make would be impacted.
The use case I _am_ familiar with is NFS servers and NFS v3 clients that 
depend on the mtime of a directory to know when the lookup cache for a 
directory can be invalidated.  Our NFS client now defaults to only trusting
cached lookups for 60 seconds to work around races due to seconds-granularity
timestamps from some NFS servers, at the cost of making the lookup cache a fair
amount less effective.  Note that Isilon already defaults vfs.timestamp_precision
to 3 on their appliances, and I recently convinced the folks at TrueNAS to do 
the same.  However, it would also make stock FreeBSD NFS servers more reliable 
for NFS v3 if we changed our default.
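
To make the race concrete, here is a simplified sketch of the kind of check a
client makes.  It is not the actual NFS client code; dir_cache_valid() and
second_granularity() are hypothetical helpers, and the model assumes the client
trusts its cached lookups for a directory only while the directory's mtime is
unchanged.

```c
#include <stdbool.h>
#include <time.h>

/* Cached lookups are trusted only if the directory mtime is unchanged. */
static bool
dir_cache_valid(struct timespec cached_mtime, struct timespec server_mtime)
{
	return (cached_mtime.tv_sec == server_mtime.tv_sec &&
	    cached_mtime.tv_nsec == server_mtime.tv_nsec);
}

/* The mtime as a server with one-second granularity would report it. */
static struct timespec
second_granularity(struct timespec t)
{
	t.tv_nsec = 0;
	return (t);
}
```

If the directory changes at 1000.2 (after which the client fills its cache) and
again at 1000.7, a server with one-second timestamps reports 1000 both times,
dir_cache_valid() returns true, and the stale entries survive.  With sub-second
timestamps the two mtimes differ and the cache is invalidated.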

-- 
John Baldwin