Gnome and Firefox, lockd and NFS

Rick Macklem rmacklem at uoguelph.ca
Sat Sep 15 14:28:11 UTC 2018


Daniel Feenberg <feenberg at nber.org> wrote:
> We are using the NFS server on FreeBSD 10.3-RELEASE-p7 to serve home
> directories to Linux SL7 (same as Centos7 and RH7) clients via NFSv3.
> While this worked fine with SL6, we find that starting Firefox, Gnome or
> Mate causes the client to hang with the message:
>
>     nfs server XXX not responding, still trying
>
> and (for example)
>
>     ~/.mozilla/firefox/XXX/.nfs00000000000XXXXXXXXX file stuck
>
> If one tries to remove the lock, the client responds:
>
>     Cannot remove, Device or resource busy
>
> Furthermore, the server quickly stops serving *all* other clients also. If
> we kill the client process, then the server recovers after a few minutes.
Since the server recovers, I suspect it had not crashed, but was being hit
with so many RPCs that it appeared "dead".

NFSv3 itself has no file locking. Locking is done by two separate protocols,
the NLM and NSM (implemented by the rpc.lockd and rpc.statd daemons).
The NLM protocol is a fundamentally flawed design (and was never published
as a specification by Sun). As such, I refused to ever implement it. Others
eventually did implement it, but I still recommend against using it, due to flaws
in the protocol design (which can't be mitigated by the implementation).
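
For context, here is a minimal sketch of the kind of request involved. It is
written in Python as an assumption, since the report names no language. On a
Linux client, a POSIX lock taken through fcntl() on an NFSv3 mount without
"nolock" is the sort of call that gets forwarded to the server via the NLM.
The function name try_lock is hypothetical.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking POSIX lock on path.

    Returns True if the lock was granted, False if it is held elsewhere.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            # LOCK_NB only avoids waiting on a conflicting lock holder.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except (BlockingIOError, PermissionError):
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

Note that "non-blocking" here refers to lock contention only. If the lock
request itself gets no reply from the server, the call may still stall.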

There are two ways to do file locking without using the NLM and NSM.
1 - If the locks do not need to be visible to other clients (this will be the
      case unless it is a true distributed application accessing the same files
      on multiple clients concurrently), just add the "nolockd" mount option
      to the client mounts (I think it is called "nolock" on the Linux clients).
OR
2 - Switch to using NFSv4 mounts, since NFSv4 does have integrated file
      locking that works much better. (I'd suggest NFSv4.1 over NFSv4.0,
      since NFSv4.1 fixed a lot of things. Some Linux distros don't have NFSv4.1
      enabled in their kernel by default and you have to rebuild the kernel
      from sources to enable it, unfortunately. Relatively recent Linux kernels
      do NFSv4.1 fine, from my limited experience with them.)
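
To see which of the two options a Linux client is currently using, the mount
options can be inspected. The following is a minimal sketch, assuming the
/proc/mounts layout and assuming that the kernel reports the options as
"vers=", "nolock" and "local_lock=all"; the exact spelling can differ between
kernel versions. The function name nfs_mounts is hypothetical.

```python
def nfs_mounts(mounts_text):
    """Yield (mount_point, version, local_only) for each NFS mount.

    mounts_text uses the /proc/mounts layout:
    device mount_point fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = fields[3].split(",")
        version = None
        for opt in options:
            if opt.startswith(("vers=", "nfsvers=")):
                version = opt.split("=", 1)[1]
        # Assumption: either option means locks are kept on the client only.
        local_only = "nolock" in options or "local_lock=all" in options
        yield fields[1], version, local_only
```

On a client, the input would be the contents of /proc/mounts. A mount that
reports version 3 with local_only False is one that relies on the NLM.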

> Our workaround has been to move the directories .local, .config and .dbus
> in the home directory to an NFS partition that is mounted without
> locking, but this seems inadequate as a permanent solution.
>
> Since the FreeBSD server stops responding to other clients, it seems it
> must be a FreeBSD problem.
I think you have a "livelock" type problem, where the client is flooding the
server with RPCs.
To check, you could capture packets when it happens and look at them with
Wireshark, which understands NFS RPCs well.
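
Once a capture exists, a flood shows up as a very high packet rate from one
client. As a rough sketch, assuming the capture has been reduced to
(time in seconds, source address) pairs by some export step not shown here,
the busiest seconds can be counted as below. The function name and the input
layout are hypothetical.

```python
from collections import Counter


def busiest_seconds(packets, top=3):
    """Count packets per (source, whole second); return the largest counts.

    packets is an iterable of (time_in_seconds, source_address) pairs.
    """
    counts = Counter((src, int(t)) for t, src in packets)
    return counts.most_common(top)
```

A single client dominating the result with thousands of packets per second
would be consistent with the livelock guess, though it does not prove it.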

> Even if the Linux client (systemd?) is making
> an improper request, it is inappropriate for FreeBSD to hang in response.
> We also see this same result with Truenas and FreeNAS fileservers (which
> are based on FreeBSD) but see https://redmine.ixsystems.com/issues/927 for
> another report related to earlier clients. A Linux NFS server does not
> display this problem.
Part of the problem with the NLM is that no two implementations will be 100%
compatible, since there was never any published spec (like an RFC) for the
protocol. Each implementation tries various tricks to make it work better.
I suspect some change between the implementations in SL6 and SL7 triggered
this. (Since the protocol is unpublished and fundamentally flawed, I don't try
to fix it and just suggest the above when anyone has problems with it.)

rick

>
> Daniel Feenberg
> National Bureau of Economic Research

