File remove problem
David Cecil
david.cecil at nokia.com
Thu Nov 29 21:26:32 PST 2007
ext Bruce Evans wrote:
> On Fri, 30 Nov 2007, David Cecil wrote:
>
>> Thanks Bruce.
>>
>> Actually, I had found the same problem, and I came up with the first
>> line of your patch (adding IN_MODIFIED) myself, but I still saw the
>> problem. I
>
> Yes, it's not that. Testing reminded me that there is normally a
> VOP_INACTIVE() after unlink so the IN_CHANGE mark doesn't live very long
> for unlink (it can only live long for open files).
>
> Testing shows that the problem is easy to reproduce and often partially
> detected before it becomes fatal. I saw something like the following:
>
> after touch a; ln a b; rm a; unmount -- no problem with 1 link remaining
> after touch a; rm a; unmount -- no problem with unmount
> after touch a; ln a b; rm a; mount -u -o ro -- no problem with 1 link...
> after touch a; rm a; mount -u -o ro -- worked once without soft
>     updates but seemed to be responsible for a soft update panic later
> after touch a; rm a; mount -u -o ro -- usually fails with soft
>     updates; the error is detected in various ways:
>     under ~5.2, mount -u prints "/f: update error: blocks 0 files 1"
>         but succeeds
>     under -current, mount -u fails and a subroutine prints
>         "softdep_waitidle: Failed to flush worklist for 0xc3e1a29c"
>         However, mount -u apparently cannot afford to fail at this
>         point since it has committed to succeeding -- further
>         mount -u's and unmounts fail and it takes a reboot to reach
>         an fsck that can fix the problem.
>
> mount -u seems to do some things right, at least under -current:
> - it calls ffs_sync() and thus ffs_update() with waitfor != 0.
Do you know that it calls it for this vnode? I'm going to try to verify that.
> - IN_MODIFIED is usually already set in ffs_update().
> - softdep_update_inode_inodeblock() in ffs_update() seems to
>   make null changes. That doesn't seem right -- shouldn't it
>   update the link count and finish removing the file?... I
>   just noticed that ufs_inactive() handles some of this.
> - it calls softdep_flushfiles() after doing the sync. This
>   doesn't seem to touch the inode.
> - apparently, softdep_flushfiles() fails in -current, while in
>   ~5.2 it bogusly succeeds and then code just after it is called
>   detects a problem but doesn't handle it.
>
>> One more point to address Julian's question, the partition is not
>> mounted with soft updates.
>
> Interesting. I saw no sign of the problem without soft updates except a
> panic later after enabling soft updates. I was running fsck a lot but
> may have forgotten one since no error was detected. The problem should
> be easier to understand if it affects non-soft-updates.
It is not especially easy to reproduce. The only reliable mechanism I
have involves mounting rw, removing a file, and remounting ro during the
boot cycle. I can only guess that it's timing-related and that doing this
during boot increases the chance of reproducing the problem.
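
A minimal sketch of that rw, remove, ro sequence as a shell function, for
anyone who wants to try it. Everything named here is an assumption for
illustration: the mount point /f is borrowed from the error message quoted
above, the file name "a" from the test commands, and DRY_RUN, run() and
repro() are hypothetical helpers, not part of any existing tool. It does
not recreate the boot-cycle timing, so it may well not trigger the problem.

```sh
#!/bin/sh
# Sketch only: remount rw, create and remove a file, remount ro.
# MNT is an assumed mount point; DRY_RUN=1 (the default) prints the
# commands instead of executing them.
MNT="${MNT:-/f}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The sequence described above: mount rw, remove a file, remount ro.
repro() {
    run mount -u -o rw "$MNT"
    run touch "$MNT/a"
    run rm "$MNT/a"
    run mount -u -o ro "$MNT"
}
```

With DRY_RUN left at 1, calling repro only prints the four commands;
setting DRY_RUN=0 and running it as root on a scratch filesystem would
execute them.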