[Bug 261291] ESX NFS4.1 client hangs, server never responds to EXCHANGE_ID/CREATE_SESSION

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 18 Jan 2022 15:26:49 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261291

--- Comment #4 from Rick Macklem <rmacklem@FreeBSD.org> ---
Hmm. Took a look and it looks like a variant of the TCP bug.
You'll notice that the last NFS reply for the NFSv4 connection
(port #805 on the client) shows up at packet #653.
After that, all there is from the FreeBSD end are ACKs, as you
said.

The NFSv3 RPCs near the end of the trace are done on other TCP
connections (ports #800 and #804) and they work.

Can you roll back to before rscheff@'s fix and then revert
r367492? (See PR#256280.) In other words, get that code back
to its pre-r367492 state.
--> The pre-r367492 code has worked ok, literally for decades.
We believe that rscheff@'s fix is ok because otis@ did not observe
a hang during two weeks of testing, but that doesn't guarantee it
fixed the problem.

Another possibility is that the nfsd threads are getting hung
trying to do some RPC around packet #653, but I would have expected
that to result in all nfsd threads hung eventually (and the server
obviously is not in that state, since RPCs on other connections
are still working).

If it happens again, do these commands on the FreeBSD server:
# ps axHl  <-- to look for "hung" nfsd threads
# netstat -a <-- to look at the TCP connection for the broken client.
  (If it is in ESTABLISHED state with a non-0 Recv-Q, the rscheff@
   patch has not fixed the problem.)
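
A rough sketch of how the netstat check could be filtered, in case the
output is long. The helper name stuck_conns is made up for illustration,
and the awk field positions assume the usual FreeBSD netstat column order
(Proto, Recv-Q, Send-Q, Local Address, Foreign Address, state), so verify
against your own output before relying on it:

```shell
#!/bin/sh
# stuck_conns (hypothetical helper, not part of FreeBSD):
# reads netstat-style lines on stdin and prints the Recv-Q, local address
# and foreign address of every TCP connection that is ESTABLISHED with a
# non-zero Recv-Q.
# Assumed column order: Proto Recv-Q Send-Q Local Foreign State
stuck_conns() {
    awk '$1 ~ /^tcp/ && $6 == "ESTABLISHED" && $2 + 0 > 0 { print $2, $4, $5 }'
}

# Example use on the server (numeric addresses, TCP only):
#   netstat -an -p tcp | stuck_conns
```

Any line it prints for the broken client's connection would match the
"ESTABLISHED with a non-0 Recv-Q" case described above.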
