Re: NFS Mount Hangs

Mon Apr 12 07:49:48 UTC 2021

I tried some simple tests yesterday, but I don't know whether they are representative:

I used an old Debian 3.16.3 Linux box as the NFS client and simulated the disconnect with an ipfw rule, while introducing some packet drops using dummynet (I really should add a simple Markov-chain state machine for burst losses), in order to exercise some of the socket upcalls in the tcp_input code flow. But it got too late before I arrived at any relevant results...
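
For reference, a minimal user-space sketch of the kind of two-state Markov-chain (Gilbert-Elliott style) burst-loss model mentioned above. This is not dummynet code; the class name, parameter names, and the probabilities in the usage note are illustrative assumptions only. It just shows how a "good" and a "bad" state produce correlated drops, as opposed to an independent per-packet loss probability.

```python
import random


class GilbertElliott:
    """Two-state Markov chain loss model (hypothetical helper, not part of dummynet).

    In the "good" state packets are dropped with probability loss_good,
    in the "bad" (burst) state with probability loss_bad.
    """

    def __init__(self, p_good_to_bad, p_bad_to_good,
                 loss_good=0.0, loss_bad=1.0, seed=None):
        self.p_gb = p_good_to_bad      # chance per packet of entering a burst
        self.p_bg = p_bad_to_good      # chance per packet of leaving a burst
        self.loss_good = loss_good
        self.loss_bad = loss_bad
        self.bad = False               # start in the good state
        self.rng = random.Random(seed)

    def drop(self):
        """Advance the chain by one packet; return True if the packet is dropped."""
        if self.bad:
            if self.rng.random() < self.p_bg:
                self.bad = False
        else:
            if self.rng.random() < self.p_gb:
                self.bad = True
        loss = self.loss_bad if self.bad else self.loss_good
        return self.rng.random() < loss


def stationary_loss_rate(p_gb, p_bg, loss_good=0.0, loss_bad=1.0):
    """Long-run average loss rate implied by the chain's stationary distribution."""
    pi_bad = p_gb / (p_gb + p_bg)
    return pi_bad * loss_bad + (1.0 - pi_bad) * loss_good


if __name__ == "__main__":
    model = GilbertElliott(p_good_to_bad=0.01, p_bad_to_good=0.25, seed=42)
    pattern = "".join("x" if model.drop() else "." for _ in range(200))
    print(pattern)  # runs of "x" are loss bursts
```

With loss_bad=1.0 the mean burst length is roughly 1/p_bad_to_good packets, so the two transition probabilities can be chosen from a target average loss rate and a target burst length.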

Richard Scheffenegger
Consulting Solution Architect
NAS & Networking

NetApp

-----Original Message-----
From: Rick Macklem <rmacklem at uoguelph.ca>
Sent: Monday, April 12, 2021 00:50
To: Scheffenegger, Richard <Richard.Scheffenegger at netapp.com>; tuexen at freebsd.org
Cc: Youssef GHORBAL <youssef.ghorbal at pasteur.fr>; freebsd-net at freebsd.org
Subject: Re: NFS Mount Hangs

I should be able to test D29690 in about a week.
Note that I will not be able to tell if it fixes otis@'s hung Linux client problem.

rick

________________________________________
From: Scheffenegger, Richard <Richard.Scheffenegger at netapp.com>
Sent: Sunday, April 11, 2021 12:54 PM
To: tuexen at freebsd.org; Rick Macklem
Cc: Youssef GHORBAL; freebsd-net at freebsd.org
Subject: Re: NFS Mount Hangs

From what I understand of Rick's statements about the socket state changing before the upcall, I can only speculate that the RST fight is over the new sessions the client attempts with the same 5-tuple, while on the server side the original session persists, as the NFS server never closes or shuts down the session.

But a debug-logged version of the socket upcall used by the NFS server should reveal any differences in socket state at the time of the upcall.

I would very much like to know whether D29690 addresses that problem (if it was due to releasing the lock before the upcall), or whether there are still differences between the behavior prior to my central upcall change, after that change, and with D29690...

________________________________
From: tuexen at freebsd.org <tuexen at freebsd.org>
Sent: Sunday, April 11, 2021 2:30:09 PM
To: Rick Macklem <rmacklem at uoguelph.ca>
Cc: Scheffenegger, Richard <Richard.Scheffenegger at netapp.com>; Youssef GHORBAL <youssef.ghorbal at pasteur.fr>; freebsd-net at freebsd.org <freebsd-net at freebsd.org>
Subject: Re: NFS Mount Hangs

> On 10. Apr 2021, at 23:59, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>
> tuexen at freebsd.org wrote:
>> Rick wrote:
> [stuff snipped]
>>>> With r367492 you don't get the upcall with the same error state? Or you don't get an error on a write() call, when there should be one?
>> If Send-Q is 0 when the network is partitioned, after healing, the krpc sees no activity on the socket (until it acquires/processes an RPC it will not do a sosend()).
>> Without the 6minute timeout, the RST battle goes on "forever" (I've never actually waited more than 30minutes, which is close enough to "forever" for me).
>> --> With the 6minute timeout, the "battle" stops after 6minutes, when the timeout
>>     causes a soshutdown(..SHUT_WR) on the socket.
>>     (Since the soshutdown() patch is not yet in "main" (I got comments, but no "reviewed"
>>      on it), the 6minute timer won't help if enabled in main. The soclose() won't happen
>>      for TCP connections with the back channel enabled, such as Linux NFSv4.1/4.2 ones.)
>>
>> I'm confused. So you are saying that if the Send-Q is empty when you partition the network, and the peer starts to send SYNs after the healing, FreeBSD responds with a challenge ACK which triggers the sending of a RST by Linux. This RST is ignored multiple times.
>> Is that true? Even with my patch for the bug I introduced?
> Yes and yes.
> Go take another look at linuxtofreenfs.pcap ("fetch 
> https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap" if you don't  
> already have it.) Look at packet #1949->2069. I use wireshark, but 
> you'll have your favourite.
> You'll see the "RST battle" that ends after 6minutes at packet#2069. 
> If there is no 6minute timeout enabled in the server side krpc, then 
> the battle just continues (I once let it run for about 30minutes 
> before giving up). The 6minute timeout is not currently enabled in 
> main, etc.
Hmm. I don't understand why r367492 can impact the processing of the RST, which basically destroys the TCP connection.

Richard: Can you explain that?
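
For context, the exchange Rick describes (SYN on an existing connection, challenge ACK, RST from the client) can be sketched with the RFC 5961 rules as I understand them. This is a simplified model, not FreeBSD's tcp_input code; the function and enum names are made up for illustration, and it ignores timestamps, rate limiting of challenge ACKs, and everything else a real stack checks.

```python
from enum import Enum

SEQ_MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


class Action(Enum):
    DROP = "drop"                    # silently discard the segment
    CHALLENGE_ACK = "challenge_ack"  # reply with an ACK carrying the current rcv_nxt
    RESET = "reset"                  # tear the connection down


def in_window(seq, rcv_nxt, rcv_wnd):
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), with wraparound."""
    return ((seq - rcv_nxt) % SEQ_MOD) < rcv_wnd


def on_syn_in_established():
    """A SYN on a synchronized connection is answered with a challenge ACK."""
    return Action.CHALLENGE_ACK


def on_rst(seq, rcv_nxt, rcv_wnd):
    """RST handling: only an exact sequence match resets the connection."""
    if seq == rcv_nxt:
        return Action.RESET
    if in_window(seq, rcv_nxt, rcv_wnd):
        return Action.CHALLENGE_ACK
    return Action.DROP


def rst_seq_for_unacceptable_ack(seg_ack):
    """A peer in SYN-SENT that receives an unacceptable ACK answers with a RST
    whose sequence number is the ACK number it was sent."""
    return seg_ack


if __name__ == "__main__":
    server_rcv_nxt, server_rcv_wnd = 1000, 65535   # illustrative values
    print(on_syn_in_established())                 # server answers the new SYN
    rst_seq = rst_seq_for_unacceptable_ack(server_rcv_nxt)
    print(on_rst(rst_seq, server_rcv_nxt, server_rcv_wnd))
```

Under this model the client's RST carries exactly the server's rcv_nxt, so a single challenge ACK / RST round should be enough to tear down the stale connection, which is what makes the repeated rounds in the trace surprising.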

Best regards
Michael
>
>> What version of the kernel are you using?
> "main" dated Dec. 23, 2020 + your bugfix + assorted NFS patches that 
> are not relevant + 2 small krpc related patches.
> --> The two small krpc related patches enable the 6minute timeout and
>       add a soshutdown(..SHUT_WR) call when the 6minute timeout is
>       triggered. These have no effect until the 6minutes is up and, without
>       them the "RST battle" goes on forever.
>
> Add to the above a revert of r367492 and the RST battle goes away and 
> things behave as expected. The recovery happens quickly after the 
> network is unpartitioned, with either 0 or 1 RSTs.
>
> rick
> ps: Once the irrelevant NFS patches make it into "main", I will upgrade to
>     main bits-du-jour for testing.
>
> Best regards
> Michael
>>
>> If Send-Q is non-empty when the network is partitioned, the battle will not happen.
>>
>>>
>>> My understanding is that he needs this error indication when calling shutdown().
>> There are several ways the krpc notices that a TCP connection is no longer functional.
>> - An error return like EPIPE from either sosend() or soreceive().
>> - A return of 0 from soreceive() with no data (normal EOF from other end).
>> - A 6minute timeout on the server end, when no activity has occurred on the connection.
>>   This timer is currently disabled for NFSv4.1/4.2 mounts in "main", but I enabled it for
>>   this testing, to stop the "RST battle goes on forever" during testing. I am thinking of
>>   enabling it on "main", but this crude bandaid shouldn't be thought of as a "fix for the
>>   RST battle".
>>
>>>>
>>>> From what you describe, this is on writes, isn't it? (I'm asking, as the original problem that was fixed with r367492 occurs in the read path: draining of the so_rcv buffer in the upcall right away, which subsequently influences the ACK sent by the stack.)
>>>>
>>>> I only added the so_snd buffer handling after some discussion about whether WAKESOR should have a symmetric equivalent in WAKESOW....
>>>>
>>>> Thus a partial backout (leaving the WAKESOR part in, but reverting the WAKESOW part) would still fix my initial problem with erroneous DSACKs (which can also lead to extremely poor performance with Linux clients), but possibly address this issue...
>>>>
>>>> Can you perhaps take MAIN and apply https://reviews.freebsd.org/D29690 for the revert only on the so_snd upcall?
>> Since the krpc only uses receive upcalls, I don't see how reverting 
>> the send side would have any effect?
>>
>>> Since the release of 13.0 is almost done, can we try to fix the issue instead of reverting the commit?
>> I think it has already shipped broken.
>> I don't know if an errata is possible, or if it will be broken until 13.1.
>>
>> --> I am much more concerned with the otis@ stuck client problem than this RST battle that only
>>      occurs after a network partitioning, especially if it is 13.0 specific.
>>      I did this testing to try to reproduce Jason's stuck client (with connection in CLOSE_WAIT)
>>      problem, which I failed to reproduce.
>>
>> rick
>>
>> Rs: Agreed, a good understanding of where the interaction between the stack, the socket, and the in-kernel TCP user breaks is needed.
>>
>>>
>>> If this doesn't help, some major surgery will be necessary to prevent NFS sessions with SACK enabled, to transmit DSACKs...
>>
>> My understanding is that the problem is related to getting a local 
>> error indication after receiving a RST segment too late or not at all.
>>
>> Rs: But the move of the upcall should not materially change that; I don't have a PC here to check whether any upcall actually happens on RST...
>>
>> Best regards
>> Michael
>>>
>>>
>>>> I know from a printf that this happened, but whether it caused the RST battle to not happen, I don't know.
>>>>
>>>> I can put r367492 back in and do more testing if you'd like, but I think it probably needs to be reverted?
>>>
>>> Please, I don't quite understand why the exact timing of the upcall would be that critical here...
>>>
>>> A comparison of the soxxx calls and errors between the "good" and the "bad" would be perfect. I don't know if this is easy to do though, as these calls appear to be scattered all around the RPC / NFS source paths.
>>>
>>>> This does not explain the original hung Linux client problem, but does shed light on the RST war I could create by doing a network partitioning.
>>>>
>>>> rick
>>>
>>> _______________________________________________
>>> freebsd-net at freebsd.org mailing list 
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"