NFS delegations don't expire after unmounting client
Alan Somers
asomers at freebsd.org
Thu Feb 11 21:32:15 UTC 2021
On Thu, Feb 11, 2021 at 2:07 PM Rick Macklem <rmacklem at uoguelph.ca> wrote:
> Alan Somers wrote:
> >I have several Linux 5.9.15 clients mounting NFS 4.1 served from a FreeBSD
> >12.2-RELEASE server. Today, most of those clients' mounts hung, and their
> >dmesg displayed "nfs: server XXX not responding, still trying". But one
> >client kept running fine. nfsdumpstate on the server showed that that
> >client, and that one only, had 2 delegations. It also had 1 OpenOwner, 1
> >Open, and the CB flags set. It was the only client that had CB set. On
> >the theory that its delegation callbacks weren't working, I tried
> >unmounting all of its NFS shares. That worked, but to my surprise
> >nfsdumpstate showed no change! I could see that the lease time recorded in
> >/var/run/nfs-stablerestart was 120s, and I must've waited about 30m in all
> >before disabling delegations, unmounting everything, and returning to NFS
> >v3. So my questions are, what can cause a delegation to linger around long
> >after it should've expired, and what else can I do to debug this problem if
> >it recurs?
> The FreeBSD NFSv4 server implements "courtesy locks" (my idea, but someone
> else coined the term for it), where a lock is not thrown away until both the
> lease has expired and a conflicting lock request is received from another
> client.
> --> In this case, that would be an Open of the file from another client.
> The idea is to avoid loss of lock state when there is a network partition
> that exceeds the lease duration.
>
Ahh, so maybe the stale delegation was a red herring! That would make
sense. Especially because the client with the stale delegation was
mounting a different share than at least one of the hung clients.
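
For illustration, the courtesy-lock rule Rick describes above can be modeled roughly as follows. This is only a toy sketch of the stated rule, not the FreeBSD nfsd code; the Delegation class and the function names are hypothetical, and the 120s value is just the lease time mentioned earlier in the thread.

```python
from dataclasses import dataclass

LEASE_SECONDS = 120.0  # lease time reported in the thread


@dataclass
class Delegation:
    """Hypothetical stand-in for one piece of per-client state on the server."""
    client: str
    last_renewal: float  # time (seconds) the client last renewed its lease


def lease_expired(deleg: Delegation, now: float) -> bool:
    """True once more than one lease period has passed since the last renewal."""
    return now - deleg.last_renewal > LEASE_SECONDS


def may_discard(deleg: Delegation, now: float, conflicting_request: bool) -> bool:
    """Courtesy rule: state is thrown away only if BOTH conditions hold.

    Lease expiry alone is not enough, so state from an unmounted or
    unreachable client can linger indefinitely. What the server does
    while the lease is still valid is not modeled here.
    """
    return lease_expired(deleg, now) and conflicting_request


if __name__ == "__main__":
    d = Delegation(client="client-a", last_renewal=0.0)
    # 30 minutes after the last renewal, no other client wants the file.
    print(may_discard(d, now=1800.0, conflicting_request=False))
    # Same moment, but another client opens the file.
    print(may_discard(d, now=1800.0, conflicting_request=True))
```

Under this model, a delegation still listed by nfsdumpstate long after the lease time is expected behavior as long as no other client has asked for a conflicting Open.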
>
> When a client dismounts, it should tell the server it is done with the
> open/lock state by doing a DestroyClientID operation.
> (SetClientID/SetClientIDConfirm for 4.0)
> --> If the Linux client did this, then it sounds like something is broken
> in the server, but my hunch is that the Linux client did not do this.
> If you can capture packets during a dismount, you should be able to look
> at them in wireshark and see if the DestroyClientID happened.
>
> There is also the nfsrevoke command, which is supposed to be able to
> get rid of client lock state, but I'll admit I haven't tested it in like a
> decade;-)
>
Well, it looks like it works. When I tried it, the delegation disappeared
from nfsdumpstate's output. That did not resolve the hang, however. So
the delegation was probably a red herring then.
I guess I'll have to roll up my sleeves and start tcpdumping then. Sigh.
Thanks for the tips.
-Alan