[Bug 261291] ESX NFS4.1 client hangs, server never responds to EXCHANGE_ID/CREATE_SESSION

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 08 Feb 2022 16:24:56 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261291

--- Comment #12 from Alan Somers <asomers@FreeBSD.org> ---
I reproduced the issue again, deliberately this time, by downing one leg of
the LAGG during high traffic.  This time I tailored the packet capture more
narrowly, so it didn't drop any packets (the original pcap file contained
omissions just because tcpdump couldn't write to disk fast enough).  Crucially,
it shows that the client sent a DESTROY_SESSION RPC, which didn't show up in the
original pcap file.  The sequence looks like this:

1) The client's (172.30.156.243) last regular NFS call is packet 84
2) After that come a ton of TCP segment reassemblies, probably related to the
LAGG interruption
3) In packet 472, the client sends DESTROY_SESSION
4) The server (172.30.99.32) replies NFS4_OK in packet 474
5) The client sends EXCHANGE_ID in packet 475
6) The server responds with NFS4_OK and clientid 0xd9e0ee6135000000 in packet
477
7) The client sends CREATE_SESSION in packet 478 with clientid
0xd9e0ee6135000000
8) The server replies NFS4ERR_STALE_CLIENTID in packet 480
9) Go back to step 5 and loop
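
For anyone who wants to check another capture for the same pattern, here is a
rough Python sketch of the loop in steps 5 through 8.  It assumes the NFS
operations have already been extracted from the pcap into simple records; the
Event type and count_stale_session_loops() are hypothetical names made up for
illustration, not part of any existing tool.  The sample data is transcribed
from steps 3 through 8 above.

```python
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    """One decoded NFSv4.1 call or reply (hypothetical record layout)."""
    packet: int                     # packet number in the capture
    sender: str                     # "client" or "server"
    op: str                         # NFSv4.1 operation name
    status: Optional[str] = None    # reply status; None for calls
    clientid: Optional[int] = None  # clientid carried by the message, if any


def count_stale_session_loops(events: List[Event]) -> int:
    """Count cycles where the server grants a clientid via EXCHANGE_ID and
    then rejects CREATE_SESSION for that same clientid as stale."""
    loops = 0
    granted = None  # clientid from the latest successful EXCHANGE_ID reply
    pending = None  # clientid in the outstanding CREATE_SESSION call
    for ev in events:
        if ev.sender == "server" and ev.op == "EXCHANGE_ID":
            granted = ev.clientid if ev.status == "NFS4_OK" else None
        elif ev.sender == "client" and ev.op == "CREATE_SESSION":
            pending = ev.clientid
        elif ev.sender == "server" and ev.op == "CREATE_SESSION":
            if (ev.status == "NFS4ERR_STALE_CLIENTID"
                    and pending is not None and pending == granted):
                loops += 1
            pending = None
    return loops


# Steps 3 through 8 of the sequence above, transcribed by hand.
CLIENTID = 0xd9e0ee6135000000
trace = [
    Event(472, "client", "DESTROY_SESSION"),
    Event(474, "server", "DESTROY_SESSION", "NFS4_OK"),
    Event(475, "client", "EXCHANGE_ID"),
    Event(477, "server", "EXCHANGE_ID", "NFS4_OK", CLIENTID),
    Event(478, "client", "CREATE_SESSION", None, CLIENTID),
    Event(480, "server", "CREATE_SESSION", "NFS4ERR_STALE_CLIENTID"),
]

if __name__ == "__main__":
    print(count_stale_session_loops(trace))
```

A nonzero count only says the capture contains the reject-after-grant cycle;
it says nothing about why the server considers the clientid stale.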

Could there be a problem in how we handle the DESTROY_SESSION RPC?  If you want
to look, I uploaded the new packet trace to my home directory on freefall,
named "slc-rb-nesx4-7-feb.create-session.pcap".
