git: 053a98849734 - main - tcp: don't ever return ECONNRESET on close(2)

From: Gleb Smirnoff <glebius_at_FreeBSD.org>
Date: Mon, 23 Dec 2024 18:36:26 UTC
The branch main has been updated by glebius:

URL: https://cgit.FreeBSD.org/src/commit/?id=053a988497342a6fd0a717cc097d09c23f83e103

commit 053a988497342a6fd0a717cc097d09c23f83e103
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2024-12-23 18:35:49 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2024-12-23 18:35:49 +0000

    tcp: don't ever return ECONNRESET on close(2)
    
    The SUS doesn't mention this error code as a possible one [1]. The FreeBSD
    manual page specifies a possible ECONNRESET for close(2):
    
    [ECONNRESET]    The underlying object was a stream socket that was
                    shut down by the peer before all pending data was
                    delivered.
    
    In the past it had been EINVAL (see 21367f630d72), and this EINVAL was
    added as a safety measure in 623dce13c64ef.  After conversion to
    ECONNRESET it had been documented in the manual page in 78e3a7fdd51e6, but
    I bet wasn't ever tested to actually be ever returned, cause the
    tcp-testsuite[2] didn't exist back then.  So documentation is incorrect
    since 2006, if my bet wins.  Anyway, in the modern FreeBSD the condition
    described above doesn't end up with ECONNRESET error code from close(2).
    The error condition is reported via SO_ERROR socket option, though.  This
    can be checked using the tcp-testsuite, temporarily disabling the
    getsockopt(SO_ERROR) lines using sed command [3].  Most of these
    getsockopt(2)s are followed by '+0.00 close(3) = 0', which will confirm
    that close(2) doesn't return ECONNRESET even on a socket that has the
    error stored, neither it is returned in the case described in the manual
    page.  The latter case is covered by multiple tests residing in tcp-
    testsuite/state-event-engine/rcv-rst-*.
    
    However, the deleted block of code could be entered in a race condition
    between close(2) and processing of incoming packet, when connection had
    already been half-closed with shutdown(SHUT_WR) and sits in TCPS_LAST_ACK.
    This was reported in the bug 146845.  With the block deleted, we will
    continue into tcp_disconnect() which has proper handling of INP_DROPPED.
    
    The race explanation follows.  The connection is in TCPS_LAST_ACK.  The
    network input thread acquires the tcpcb lock first, sets INP_DROPPED,
    acquires the socket lock in soisdisconnected() and clears SS_ISCONNECTED.
    Meanwhile, the syscall thread goes through sodisconnect() which checks for
    SS_ISCONNECTED locklessly(!).  The check passes and the thread blocks on
    the tcpcb lock in tcp_usr_disconnect().  Once input thread releases the
    lock, the syscall thread observes INP_DROPPED and returns ECONNRESET.
    
    - Thread 1: tcp_do_segment()->tcp_close()->in_pcbdrop(),soisdisconnected()
    - Thread 2: sys_close()...->soclose()->sodisconnect()->tcp_usr_disconnect()
    
    Note that the lockless operation in sodisconnect() isn't correct, but
    enforcing the socket lock there will not fix the problem.
    
    [1] https://pubs.opengroup.org/onlinepubs/9799919799/
    [2] https://github.com/freebsd-net/tcp-testsuite
    [3] sed -i "" -Ee '/\+0\.00 getsockopt\(3, SOL_SOCKET, SO_ERROR, \[ECONNRESET\]/d' $(grep -lr ECONNRESET tcp-testsuite)
    
    PR:                     146845
    Reviewed by:            tuexen, rrs, imp
    Differential Revision:  https://reviews.freebsd.org/D48148
---
 lib/libsys/close.2       | 5 +----
 sys/netinet/tcp_usrreq.c | 5 -----
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/lib/libsys/close.2 b/lib/libsys/close.2
index 83f8604a4a22..91a7a902d70d 100644
--- a/lib/libsys/close.2
+++ b/lib/libsys/close.2
@@ -25,7 +25,7 @@
 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 .\" SUCH DAMAGE.
 .\"
-.Dd December 1, 2017
+.Dd December 18, 2024
 .Dt CLOSE 2
 .Os
 .Sh NAME
@@ -111,9 +111,6 @@ is not an active descriptor.
 An interrupt was received.
 .It Bq Er ENOSPC
 The underlying object did not fit, cached data was lost.
-.It Bq Er ECONNRESET
-The underlying object was a stream socket that was shut down by the peer
-before all pending data was delivered.
 .El
 .Pp
 In case of any error except
diff --git a/sys/netinet/tcp_usrreq.c b/sys/netinet/tcp_usrreq.c
index 35578b348c9f..acc3e2ea2942 100644
--- a/sys/netinet/tcp_usrreq.c
+++ b/sys/netinet/tcp_usrreq.c
@@ -690,11 +690,6 @@ tcp_usr_disconnect(struct socket *so)
 	inp = sotoinpcb(so);
 	KASSERT(inp != NULL, ("tcp_usr_disconnect: inp == NULL"));
 	INP_WLOCK(inp);
-	if (inp->inp_flags & INP_DROPPED) {
-		INP_WUNLOCK(inp);
-		NET_EPOCH_EXIT(et);
-		return (ECONNRESET);
-	}
 	tp = intotcpcb(inp);
 
 	if (tp->t_state == TCPS_TIME_WAIT)