TCP stack lock contention with short-lived connections
Julien Charbon
jcharbon at verisign.com
Thu Nov 7 14:10:51 UTC 2013
Hi list,
On Mon, 04 Nov 2013 22:21:04 +0100, Julien Charbon <jcharbon at verisign.com>
wrote:
> just a follow-up of vBSDCon discussions about FreeBSD TCP performance
> with short-lived connections. In summary: <snip>
>
> I have put technical and how-to-repeat details in the PR below:
>
> kern/183659: TCP stack lock contention with short-lived connections
> http://www.freebsd.org/cgi/query-pr.cgi?pr=183659
>
> We are currently working on this performance improvement effort; it
> will impact only the TCP locking strategy, not the TCP stack logic
> itself. We will share the patches we made on freebsd-net for review
> and improvement proposals; in any case, this change will need
> enough eyeballs to avoid introducing tricky race conditions in the TCP
> stack.
Just a follow-up: we are currently removing the TCP INP_INFO lock from
places where it is not actually required, in order to mitigate the lock
contention. It seems to be a good first step in this effort: small
changes, easy to review, low risk (and, admittedly, small gain).
Below is a first patch that removes the INP_INFO lock from tcp_usr_accept().
This change simply follows the advice given in the corresponding code
comment: "A better fix would prevent the socket from being placed in the
listen queue until all fields are fully initialized." Concretely,
syncache_socket() now calls sonewconn(lso, 0) and only calls
soisconnected(so) once the inpcb has been set up and unlocked. For more
technical details, see the commit log of the related change below:
http://svnweb.freebsd.org/base?view=revision&revision=175612
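To illustrate the "initialize first, publish last" ordering outside the kernel, here is a minimal userland sketch. It is not the FreeBSD code: struct conn, struct listen_slot and the three functions are hypothetical names, the listen queue is reduced to a single slot, and C11 release/acquire atomics stand in for the kernel's locking.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the inpcb address/port fields. */
struct conn {
    uint32_t peer_addr;
    uint16_t peer_port;
};

/* Hypothetical one-entry "listen queue": NULL means nothing to accept. */
struct listen_slot {
    _Atomic(struct conn *) ready;
};

void slot_init(struct listen_slot *s)
{
    atomic_init(&s->ready, (struct conn *)NULL);
}

/*
 * Fill in every field first, then make the connection visible.
 * The release store orders the field writes before the publication.
 */
void conn_setup_and_publish(struct listen_slot *s, struct conn *c,
                            uint32_t addr, uint16_t port)
{
    c->peer_addr = addr;
    c->peer_port = port;
    atomic_store_explicit(&s->ready, c, memory_order_release);
}

/*
 * Take the published connection, if any. The acquire exchange pairs with
 * the release store, so a non-NULL result is always fully initialized and
 * the accepting side needs no extra global lock to read its fields.
 */
struct conn *slot_accept(struct listen_slot *s)
{
    return atomic_exchange_explicit(&s->ready, (struct conn *)NULL,
                                    memory_order_acquire);
}
```

With this ordering, an accepting thread either sees nothing or sees a connection whose address and port are already set, which is the property that makes the extra lock in the accept path unnecessary.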
With this patch applied we see no regressions and a performance
improvement of ~5%, i.e. with a vanilla 9.2 kernel: 52k TCP Queries Per
Second (QPS); with 9.2 + the patch below: 55k TCP QPS. Not huge indeed, but
still an improvement.
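For reference, the relative gain behind the "~5%" figure can be computed as follows. This is only a small sketch of the arithmetic; relative_gain_percent() is a hypothetical helper, not part of any benchmark tool.

```c
/* Relative throughput gain, in percent, of `after` over `before`. */
double relative_gain_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Calling relative_gain_percent(52000.0, 55000.0) gives (55000 - 52000) / 52000 * 100, i.e. a bit under 6%.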
P.S.: Funnily enough, it seems that the same change was already
proposed in the past:
http://lists.freebsd.org/pipermail/freebsd-net/2013-January/034261.html
--
Julien
From: Julien Charbon <jcharbon at verisign.com>
Subject: [PATCH] Add new socket in listen queue only when fully initialized
---
sys/netinet/tcp_syncache.c | 4 +++-
sys/netinet/tcp_usrreq.c | 9 ---------
2 files changed, 3 insertions(+), 10 deletions(-)
diff --git a/sys/netinet/tcp_syncache.c b/sys/netinet/tcp_syncache.c
index af1651a..eb73356 100644
--- a/sys/netinet/tcp_syncache.c
+++ b/sys/netinet/tcp_syncache.c
@@ -660,7 +660,7 @@ syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m)
* connection when the SYN arrived. If we can't create
* the connection, abort it.
*/
- so = sonewconn(lso, SS_ISCONNECTED);
+ so = sonewconn(lso, 0);
if (so == NULL) {
/*
* Drop the connection; we will either send a RST or
@@ -890,6 +890,8 @@ syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m)
INP_WUNLOCK(inp);
+ soisconnected(so);
+
TCPSTAT_INC(tcps_accepts);
return (so);
diff --git a/sys/netinet/tcp_usrreq.c b/sys/netinet/tcp_usrreq.c
index b83f34a..566cc34 100644
--- a/sys/netinet/tcp_usrreq.c
+++ b/sys/netinet/tcp_usrreq.c
@@ -609,13 +609,6 @@ out:
/*
* Accept a connection. Essentially all the work is done at higher levels;
* just return the address of the peer, storing through addr.
- *
- * The rationale for acquiring the tcbinfo lock here is somewhat complicated,
- * and is described in detail in the commit log entry for r175612. Acquiring
- * it delays an accept(2) racing with sonewconn(), which inserts the socket
- * before the inpcb address/port fields are initialized. A better fix would
- * prevent the socket from being placed in the listen queue until all fields
- * are fully initialized.
*/
static int
tcp_usr_accept(struct socket *so, struct sockaddr **nam)
@@ -632,7 +625,6 @@ tcp_usr_accept(struct socket *so, struct sockaddr **nam)
inp = sotoinpcb(so);
KASSERT(inp != NULL, ("tcp_usr_accept: inp == NULL"));
- INP_INFO_RLOCK(&V_tcbinfo);
INP_WLOCK(inp);
if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
error = ECONNABORTED;
@@ -652,7 +644,6 @@ tcp_usr_accept(struct socket *so, struct sockaddr **nam)
out:
TCPDEBUG2(PRU_ACCEPT);
INP_WUNLOCK(inp);
- INP_INFO_RUNLOCK(&V_tcbinfo);
if (error == 0)
*nam = in_sockaddr(port, &addr);
return error;