[Bug 260011] Unresponsive NFS mount on AWS EFS
- In reply to: bugzilla-noreply_a_freebsd.org: "[Bug 260011] Unresponsive NFS mount on AWS EFS"
Date: Fri, 27 May 2022 00:04:23 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260011

Rick Macklem <rmacklem@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|bugs@FreeBSD.org            |rmacklem@FreeBSD.org

--- Comment #17 from Rick Macklem <rmacklem@FreeBSD.org> ---
Created attachment 234241
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=234241&action=edit
handle bogus slot# replies for the Sequence op

cpercival@ emailed with some diagnostics (that I did not realize were
not in 13.0) which indicates that the Amazon EFS server is pretty badly
broken. It sometimes (I don't know how frequently) returns the wrong
slotid for a session. (It is required by the RFC to be the same as the
request.)

Once this happens, there is no way to know which slot# the server
actually used. This patch (which is rather large and, unfortunately,
will not apply to 13.0, but should apply to stable/13 and 13.1, I
think?) marks both of the slots (the one in the request and the one in
the reply) bad, so they will no longer be used.

When all slots get marked "bad", it does a DestroySession operation,
which should make subsequent uses of the session fail with
NFSERR_BADSESSION. An NFSERR_BADSESSION reply should, in turn, start a
recovery cycle which should create a new session that can be used.

This patch has been tested against a hacked FreeBSD nfsd that replies
with a bogus slot# once every 100 RPCs and seems to work ok. I have no
idea if the Amazon EFS server will behave the same way, but I am hoping
cpercival@ will be able to test it.

I believe this serious bug in the Amazon EFS server would explain your
hangs.
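
For readers following along, the slot handling described above can be
sketched roughly as follows. This is only an illustration of the idea,
not the attached patch (which is C code in the kernel NFS client). The
class and method names (SessionSlots, handle_sequence_reply, usable) are
hypothetical, and the "destroyed" flag merely stands in for issuing a
DestroySession operation. Ignoring a reply slot# that is out of range is
an assumption of this sketch, not something stated in the comment.

```python
class SessionSlots:
    """Toy model of per-session slot bookkeeping (hypothetical names)."""

    def __init__(self, nslots):
        self.nslots = nslots
        self.bad = set()        # slots that must never be reused
        self.destroyed = False  # stand-in for "DestroySession was sent"

    def usable(self):
        """Return the slots that are still allowed for new requests."""
        return [s for s in range(self.nslots) if s not in self.bad]

    def handle_sequence_reply(self, req_slot, reply_slot):
        """Check the slot# in a Sequence reply against the request.

        Returns True if the reply carries the slot# that was sent.
        On a mismatch, both slots are marked bad, since there is no way
        to tell which one the server actually used.
        """
        if reply_slot == req_slot:
            return True
        self.bad.add(req_slot)
        # Assumption: only slot numbers that exist in this session are tracked.
        if 0 <= reply_slot < self.nslots:
            self.bad.add(reply_slot)
        if not self.usable():
            # Every slot is bad: give up on the session so that later use
            # fails and a new session gets created by recovery.
            self.destroyed = True
        return False
```

With four slots, a request on slot 1 answered with slot 2 leaves only
slots 0 and 3 usable; a second mismatch involving those two leaves no
usable slot, at which point the sketch flags the session as destroyed.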