umount -f implementation
Rick Macklem
rmacklem at uoguelph.ca
Mon Jun 29 14:36:29 UTC 2009
On Mon, 29 Jun 2009, Attilio Rao wrote:
> 2009/6/29 Rick Macklem <rmacklem at uoguelph.ca>:
>> I just noticed that when I do the following:
>> - start a large write to an NFS mounted fs
>> - network partition the server (unplug a net cable)
>> - do a "umount -f <mntpoint>" on the machine
>>
>> that it gets stuck trying to write dirty blocks to the server.
>>
>> I had, in the past, assumed that a "umount -f" of an NFS mount would be
>> used to get rid of an NFS mount on an unresponsive server and that loss
>> of "writes in progress" would be expected to happen.
>>
>> Does that sound correct? (In other words, am I seeing a bug or a feature?)
>
> While that should be true in principle (immediate shutdown of the fs
> operations and unmounting of the partition), it is impossible to make
> it completely non-sleeping, so "umount -f" can still sleep or be
> delayed for some time (for example, in vflush()).
> Currently, umount -f is one of the most complicated things to handle in
> our VFS because it requires that vnodes can be reclaimed at any moment,
> which adds complexity and the possibility of races.
>
Yes, agreed. And I like to leave that stuff to more clever chaps than I :-)
> What's the fix for your problem?
>
Well, when I tested it I found that it got stuck in two places, both of
which end up in VFS_SYNC(). The first was the
    sync();
call right at the beginning of umount.c.
- All I did for that one was move it to after the code that handles
  option processing and change it to
      if ((fflag & MNT_FORCE) == 0)
          sync();
  so that it isn't done for the "-f" case. (I believe the sync() call
  at the beginning of umount is only a performance optimization, so I
  don't think skipping it for "-f" should break anything.)
- The second happened just before the VFS_UNMOUNT() call in the
  umount(2) system call. The code looks like:
      if (((mp->mnt_flag & MNT_RDONLY) ||
          (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0)
- Although it was tempting to reverse the order of the VFS_SYNC() call and
  the test for MNT_FORCE, I thought that might have a negative impact on
  other file systems, since it would skip the VFS_SYNC() for them on every
  forced unmount, so...
- Instead, I just put a check for MNTK_UNMOUNTF at the beginning of
  nfs_sync(), so that it returns EBUSY in this case instead of getting
  stuck trying to flush().
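To make the two hang points and the proposed guards concrete, here is a small user-space model in C. It is a sketch under stated assumptions, not the actual FreeBSD code: the flag values, the struct layout, the WOULD_HANG sentinel and the function names (umount_calls_sync(), nfs_sync_model(), reaches_vfs_unmount()) are all hypothetical. It also assumes the kernel has already set MNTK_UNMOUNTF on the mount by the time VFS_SYNC() is called for a forced unmount, which the nfs_sync() check relies on.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in flag values for this sketch only; the real FreeBSD
 * definitions in <sys/mount.h> use different values. */
#define MNT_RDONLY    0x01
#define MNT_FORCE     0x02
#define MNTK_UNMOUNTF 0x04

/* Sketch-only sentinel: the real VFS_SYNC() would simply never return. */
#define WOULD_HANG (-1)

/* Hypothetical, minimal stand-in for struct mount. */
struct mount {
    int mnt_flag;      /* user-visible mount flags */
    int mnt_kern_flag; /* kernel-private flags */
};

/* umount.c change: only call sync(2) when -f was not given.
 * Returns true if the userland sync() would be issued. */
static bool umount_calls_sync(int fflag)
{
    return (fflag & MNT_FORCE) == 0;
}

/* nfs_sync() model: with the patch, a forced dismount in progress
 * (MNTK_UNMOUNTF set) returns EBUSY instead of trying to flush dirty
 * buffers; otherwise an unreachable server makes the flush hang. */
static int nfs_sync_model(const struct mount *mp, bool server_up, bool patched)
{
    if (patched && (mp->mnt_kern_flag & MNTK_UNMOUNTF) != 0)
        return EBUSY;
    return server_up ? 0 : WOULD_HANG;
}

/* Models the test in umount(2) just before VFS_UNMOUNT(). The C
 * short-circuit order matters: MNT_FORCE is only examined after
 * VFS_SYNC() has returned. Returns true if VFS_UNMOUNT() is reached. */
static bool reaches_vfs_unmount(struct mount *mp, int flags,
                                bool server_up, bool patched)
{
    int error;

    /* Assumption: a forced dismount is marked before syncing. */
    if ((flags & MNT_FORCE) != 0)
        mp->mnt_kern_flag |= MNTK_UNMOUNTF;

    if ((mp->mnt_flag & MNT_RDONLY) != 0)
        return true;
    error = nfs_sync_model(mp, server_up, patched);
    if (error == WOULD_HANG)
        return false; /* stuck in VFS_SYNC(); MNT_FORCE never tested */
    return error == 0 || (flags & MNT_FORCE) != 0;
}
```

With the server down and no patch, reaches_vfs_unmount(&mp, MNT_FORCE, false, false) returns false because the model stops in the sync step; with the nfs_sync() check enabled it returns true, since the EBUSY lets the condition fall through to the MNT_FORCE test. Keeping the original evaluation order means other file systems still get their VFS_SYNC() on a forced unmount.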
Assuming that I'm right w.r.t. the "sync();" at the beginning of umount.c,
these changes simply ensure that the umount command's thread gets as far as
VFS_UNMOUNT()->nfs_unmount(), so that the forced dismount proceeds.
nfs_unmount() kills RPCs in progress before doing the vflush() and, since
no new RPCs can be started once MNTK_UNMOUNTF is set (it is checked at the
beginning of each request), the vflush() won't actually flush anything to
the server.
As such, "umount -f" is pretty well guaranteed to throw away the dirty
buffers. I believe this is correct behaviour, but it would mean that a
user/sysadmin that uses "umount -f" for cases where the server is still
functioning, but slow, will lose data when they probably don't expect to.
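The ordering in nfs_unmount() described above can be modelled in a few lines of C. This is a hypothetical, heavily simplified sketch: the struct, its counters, the function names and the EIO returned by a refused request are assumptions for illustration, not the real NFS client code. It shows why, once MNTK_UNMOUNTF is set and the in-flight RPCs are gone, vflush() ends up discarding dirty buffers rather than writing them to the server.

```c
#include <errno.h>

/* Stand-in flag value for this sketch only. */
#define MNTK_UNMOUNTF 0x04

/* Hypothetical, heavily simplified model of an NFS mount. */
struct nfs_mount_model {
    int kern_flag;     /* kernel-private flags, e.g. MNTK_UNMOUNTF */
    int inflight_rpcs; /* RPCs currently waiting on the server */
    int dirty_buffers; /* buffers not yet written to the server */
    int written;       /* buffers that actually reached the server */
    int discarded;     /* buffers thrown away by the forced dismount */
};

/* Every request checks MNTK_UNMOUNTF before it starts; the EIO
 * return value is an assumption of this sketch. */
static int rpc_write_model(struct nfs_mount_model *nmp)
{
    if ((nmp->kern_flag & MNTK_UNMOUNTF) != 0)
        return EIO;
    nmp->written++;
    return 0;
}

/* vflush() model: try to push each dirty buffer; if the write RPC
 * is refused, the buffer is dropped instead of being retried. */
static void vflush_model(struct nfs_mount_model *nmp)
{
    while (nmp->dirty_buffers > 0) {
        nmp->dirty_buffers--;
        if (rpc_write_model(nmp) != 0)
            nmp->discarded++;
    }
}

/* Forced nfs_unmount() ordering: the dismount is marked, RPCs in
 * progress are killed, and only then is vflush() called. */
static void nfs_unmount_forced_model(struct nfs_mount_model *nmp)
{
    nmp->kern_flag |= MNTK_UNMOUNTF;
    nmp->inflight_rpcs = 0; /* stands in for cancelling pending RPCs */
    vflush_model(nmp);
}
```

For a mount with five dirty buffers, nfs_unmount_forced_model() leaves written at 0 and discarded at 5, which is the data loss a sysadmin might not expect if the server was merely slow. Without MNTK_UNMOUNTF set, vflush_model() writes every buffer instead.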
Does this help?
rick
ps: During simple testing, it has worked ok. It waits about 1 minute for
the RPC threads to shut down, but the "umount -f" does complete after
that. If the consensus is that patching this is a good idea, I'll get
some more testing done.
More information about the freebsd-current mailing list