File remove problem

Sun Dec 2 21:17:51 PST 2007

On  3 Dec, Bruce Evans wrote:
> On Sun, 2 Dec 2007, Don Lewis wrote:
> 
>> On  2 Dec, Bruce Evans wrote:
>>
>>> So it should be safe to remove all the r/o checks in ufs_inactive() after
>>> fixing the other bugs.  ffs_truncate alread panics if fs_ronly, but only
>>> in some cases.  In particular, it doesn't panic for truncations that don't
>>> change the file size.  Such truncations aren't quite null, since standards
>>> require [f]truncate(2) to mark the ctime and mtime for update.
>>> ffs_truncate() sets the marks, which is correct for null truncations from
>>> userland but not ones from syncer internals.  Setting of the marks when
>>> fs_ronly is set should cause panics later (my patch has a vprint() for it).
>>
>> I think the MNT_RDONLY check in ufs_itimes_locked() should be also be
>> changed to look at fs_ronly and panic if any marks are set.  This will
>> require some changes to add some early MNT_RDONLY checks.
> 
> Yes, already done (except vprint() instead of panic).
> 
>> In particular, ffs_read() and ffs_extread() need to check MNT_RDONLY in
>> addition to MNT_NOATIME (as is already done in vfs_mark_atime()).  This
>> also looks like it should be a reasonable optimization for read-only
>> file systems that should eliminate unnecessary work at the lower levels
>> of the code.
> 
> But I let these happen and discard IN_ATIME marks if fs_ronly.  I
> thought that the optimization went the other way -- unconditionally
> setting the marks was very efficient, and discarding them in ufs_itimes()
> was efficient too.  I think this is still true now with larger locking
> overheads, and the marks should be discarded later in the MNT_NOATIME
> case too.  It is expected that the marks are set much more often than
> they are looked at by ufs_itimes(), since most calls to ufs_itimes()
> are in close() and read() is much more common than close().

ffs_read() and ffs_extread() already check MNT_NOATIME, so also checking
MNT_RDONLY there as well is free.  Setting and clearing the mark will
consume a few instruction cycles, dirty a cache line, and increase main
memory write-back traffic, though the expense is likely to be small.

Preventing user reads from setting IN_ATIME as soon as MNT_RDONLY is set
on a downgrade to read-only seems to be the right thing to do.

> ufs_itimes()
> is also called in stat() but I think that is less common than close()
> (except for some tree walks).  WIth non-delayed marking, ufs_itimes()
> would still have to check fs_ronly, and the only gain would be that
> it could then skip checking the marks except as an invariants check.
> But it can gain like that even with delayed setting -- just ignore any
> old marks while fs_ronly (except as an invariants check), but clear them
> at mount or unmount time so that there shouldn't be any.

I think that setting the marks when the file system is read-only causes
the syncer to do extra work.  I think that ffs_sync() still gets called
if the file system is read-only, and if it encounters any inodes with
marks set, it calls ffs_syncvnode() on them.

>> The early IN_ACCESS flag setting in ufs_setattr(), before the MNT_RDONLY
>> check, appears to be protected by the MNT_RDONLY check in
>> vfs_mark_atime().
> 
> Thanks, I had forgotten about that.  In vfs_mark_atime(), there is much
> more efficiency to be gained by not setting marks that will be discarded,
> since it takes a VOP to set them and many file systems don't support
> this setting.  However, it is hard for vfs_mark_atime() to know when the
> mark will be discarded without calling the fs:
> 
> - it already doesn't know which fs's support it
> - it should be checking fs_ronly for ffs

I think that MNT_RDONLY is correct here.  We want to stop new atime
updates as soon as the downgrade starts, just like we stop new
user-initiated writes.

> - it seems to be missing locking for MNT_NOATIME and MNT_RDONLY
> 
> fs-level locking for MNT_NOATIME and MNT_RDONLY and fs_ronly is dubious
> too.  Upper layers set the MNT flags before giving VOP_MOUNT() a chance
> to adjust the marks.  This is automatically safe in one direction only
> (e.g., setting MNT_NOATIME or MNT_RDONLY is fail-safe since it stops
> changes), and always bad for strict invariants.

Maybe a reasonable way to handle this would be to set the
flags before calling VOP_MOUNT() when they are being changed from 0 to
1, and clear them after calling VOP_MOUNT() when changing them from 1 to
0. Adding explicit locking sounds painful ...

> I now use the following fixes:

> % Index: ufs_vnops.c
> % ===================================================================
> % RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
> % retrieving revision 1.293
> % diff -u -2 -r1.293 ufs_vnops.c
> % --- ufs_vnops.c	8 Nov 2007 17:21:51 -0000	1.293
> % +++ ufs_vnops.c	2 Dec 2007 04:56:58 -0000
> % @@ -89,4 +89,5 @@
> %  #endif
> % 
> % +#include <ufs/ffs/fs.h>
> %  #include <ufs/ffs/ffs_extern.h>
> % 
> % @@ -137,8 +138,38 @@
> % 
> %  	ip = VTOI(vp);
> % +	/*
> % +	 * MNT_RDONLY can barely be trusted here.  Full r/o mode is indicated
> % +	 * by fs_ronly, and the MNT_RDONLY setting [should] differ from the
> % +	 * fs_ronly setting only during transition from r/w mode to r/o mode.
> % +	 * We set IN_ACCESS even in full r/o mode, so we must discard it
> % +	 * unconditionally here.  During the transition, we must convert
> % +	 * IN_CHANGE | IN_UPDATE to IN_MODIFIED, since some callers neglect
> % +	 * to set IN_MODIFIED.  We also set the timestamps indicated by
> % +	 * IN_CHANGE | IN_UPDATE normally during the transition, since the
> % +	 * update marks may have been set correctly before the transition and
> % +	 * not yet converted into timestamps.  Callers that set IN_CHANGE |
> % +	 * IN_UPDATE during the transition are buggy, since userland writes
> % +	 * are supposed to be denied (by MNT_RDONLY checks) during the
> % +	 * transition, while kernel writes should should only be for syncs
> % +	 * and syncs should not touch timestamps except to convert old
> % +	 * update marks to timestamps.  Callers that set any update mark or
> % +	 * modification flag except IN_ACCESS while in full r/o mode are
> % +	 * broken; we will panic for them later.
> % +	 */
> %  	if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
> % -		goto out;
> % +		ip->i_flag &= ~IN_ACCESS;

IN_ACCESS might have been set before the downgrade request.  As written,
this change will toss out the timestamp update.  I think it would be
better to use fs_ronly here, but it would be more efficient to check
MNT_RDONLY in ffs_*read() and eliminate these two lines of code.