NFSv4 stuck

Rick Macklem rmacklem at uoguelph.ca
Fri Jan 13 23:02:25 UTC 2017


Slawa Olhovchenkov wrote:
[stuff snipped]
>> >
>> >What data? In my case there is no data.
You have a file system with no files in it. (It is file data I am referring to.)
Admittedly, a read-only file system won't get corrupted, but you will still have trouble
reading files, since NFSv4 requires that they be Opened before they are read.
>> Certain NFSv4 operations (such as open and byte range locking) are strictly ordered using a
>> seqid#. If an RPC in progress is failed (via a soft timeout, or via a signal on an "intr" mount),
>> this seqid gets out of sync between client and server and your mount is badly broken.
>
>Can the mount be dropped? An automatic forced unmount?
>Or can the application be killed manually, so the mount can be unmounted manually?
>That would be perfect for me. It would be better than the current behavior.
Well, since recently written data could be lost, I can't see this ever being automatic.
The manual "umount -f <mount-path>" should work, but only if a "umount <mount-path>" has
not already been done. (The latter gets stuck in the kernel, usually after locking the mounted-on
vnode, and that blocks the subsequent "umount -f <mount-path>".)

Someday, I plan on adding a new option to "umount" that goes directly to NFS (via the nfssvc(2)
syscall) to force a dismount, but I haven't gotten around to doing it.

Until then, it's "umount -f" or reboot. Also, please don't use the "soft,intr" options; they usually
won't help, and sooner or later they will break the mount for opening files.
>
>> I do not believe this caused your hang though, since processes were sleeping on rpccon, which
>> means they were trying to do a new TCP connection to the server unsuccessfully.
>> - Which normally indicates a problem with your underlying network fabric.
>
>The network can fail at any time.
>This should not cause the whole system to hang.
Would you expect a local file system to keep working when the JBOD interface to its drive is broken?
For NFS, a broken network means "can't talk to the file system", just as a broken JBOD connection
to a file system's drive would for a local file system.

For NFS to work well, you want the most reliable network fabric possible.
Once the network is fixed, it should again be possible for the mount to work.
(The processes in "rpccon" are trying to create a new TCP connection, and when they succeed,
 the mount point should start working again.)

rick


More information about the freebsd-net mailing list