[PATCH] Add a new TCP_IGNOREIDLE socket option
Alfred Perlstein
bright at mu.org
Tue Jan 22 20:35:44 UTC 2013
On 1/22/13 12:11 PM, John Baldwin wrote:
> As I mentioned in an earlier thread, I recently had to debug an issue we were
> seeing across a link with a high bandwidth-delay product (both high bandwidth
> and high RTT). Our specific use case was to use a TCP connection to reliably
> forward a latency-sensitive datagram stream across a WAN connection. We would
> often see spikes in the latency of individual datagrams. I eventually tracked
> this down to the connection entering slow start when it would transmit data
> after being idle. The data stream was quite bursty and would often attempt to
> transmit a burst of data after being idle for far longer than a retransmit
> timeout.
>
> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
> the slow start window size up via a sysctl. On 8.x this no longer worked.
> The solution I came up with was to add a new socket option to disable idle
> handling completely. That is, when an idle connection restarts with this new
> option enabled, it keeps its current congestion window and doesn't enter slow
> start.
>
> There are only a few cases where such an option is useful, but if anyone else
> thinks this might be useful I'd be happy to add the option to FreeBSD.
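For readers following along, here is a minimal sketch of how an application might turn the proposed option on for a single socket. It assumes a tree with the patch applied, so that `<netinet/tcp.h>` defines `TCP_IGNOREIDLE`. The helper name `set_ignore_idle` is hypothetical and is not part of the patch. Without the patch the helper reports `ENOPROTOOPT` rather than guessing at an option number.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not part of the patch): ask the kernel to keep the
 * congestion window across idle periods on one TCP socket.
 * TCP_IGNOREIDLE is only defined by headers from a patched tree; on any
 * other system this fails with ENOPROTOOPT instead of hardcoding a value.
 */
static int
set_ignore_idle(int fd, int on)
{
#ifdef TCP_IGNOREIDLE
	/* Boolean option: nonzero enables, zero disables. */
	return (setsockopt(fd, IPPROTO_TCP, TCP_IGNOREIDLE, &on, sizeof(on)));
#else
	(void)fd;
	(void)on;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

A sender with the bursty traffic pattern described above would call `set_ignore_idle(fd, 1)` once after `connect()` or `accept()` and check the return value.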
This looks good, but it almost sounds like a bug for TCP to be doing
this at all.

Why would one want this behavior?

Wouldn't it make sense to keep the window large until there is a
problem, rather than unconditionally chopping it down? It's almost as if
TCP is afraid that you might swap out a 10gig interface for a modem.
I'm just not getting it (probably a simple oversight on my part).

What do you think about also adding a sysctl to set the global on/off
default?

-Alfred
>
> Index: share/man/man4/tcp.4
> ===================================================================
> --- share/man/man4/tcp.4 (revision 245742)
> +++ share/man/man4/tcp.4 (working copy)
> @@ -205,6 +205,18 @@
> in the
> .Sx MIB Variables
> section further down.
> +.It Dv TCP_IGNOREIDLE
> +If a TCP connection is idle for more than one retransmit timeout,
> +it enters slow start when new data is available to transmit.
> +This avoids flooding the network with a full window of traffic at line rate.
> +It also allows the connection to adjust to changes to network conditions
> +that occurred while the connection was idle. A connection that sends
> +bursts of data separated by large idle periods can be permanently stuck in
> +slow start as a result.
> +The boolean option
> +.Dv TCP_IGNOREIDLE
> +disables the idle connection handling, allowing connections to maintain the
> +existing congestion window when restarting after an idle period.
> .It Dv TCP_NODELAY
> Under most circumstances,
> .Tn TCP
> Index: sys/netinet/tcp_var.h
> ===================================================================
> --- sys/netinet/tcp_var.h (revision 245742)
> +++ sys/netinet/tcp_var.h (working copy)
> @@ -230,6 +230,7 @@
> #define TF_NEEDFIN 0x000800 /* send FIN (implicit state) */
> #define TF_NOPUSH 0x001000 /* don't push */
> #define TF_PREVVALID 0x002000 /* saved values for bad rxmit valid */
> +#define TF_IGNOREIDLE 0x004000 /* connection is never idle */
> #define TF_MORETOCOME 0x010000 /* More data to be appended to sock */
> #define TF_LQ_OVERFLOW 0x020000 /* listen queue overflow */
> #define TF_LASTIDLE 0x040000 /* connection was previously idle */
> Index: sys/netinet/tcp_output.c
> ===================================================================
> --- sys/netinet/tcp_output.c (revision 245742)
> +++ sys/netinet/tcp_output.c (working copy)
> @@ -206,7 +206,8 @@
> * to send, then transmit; otherwise, investigate further.
> */
> idle = (tp->t_flags & TF_LASTIDLE) || (tp->snd_max == tp->snd_una);
> - if (idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
> + if (!(tp->t_flags & TF_IGNOREIDLE) &&
> + idle && ticks - tp->t_rcvtime >= tp->t_rxtcur)
> cc_after_idle(tp);
> tp->t_flags &= ~TF_LASTIDLE;
> if (idle) {
> Index: sys/netinet/tcp.h
> ===================================================================
> --- sys/netinet/tcp.h (revision 245823)
> +++ sys/netinet/tcp.h (working copy)
> @@ -156,6 +156,7 @@
> #define TCP_NODELAY 1 /* don't delay send to coalesce packets */
> #if __BSD_VISIBLE
> #define TCP_MAXSEG 2 /* set maximum segment size */
> +#define TCP_IGNOREIDLE 3 /* disable idle connection handling */
> #define TCP_NOPUSH 4 /* don't push last block of write */
> #define TCP_NOOPT 8 /* don't use TCP options */
> #define TCP_MD5SIG 16 /* use MD5 digests (RFC2385) */
> Index: sys/netinet/tcp_usrreq.c
> ===================================================================
> --- sys/netinet/tcp_usrreq.c (revision 245742)
> +++ sys/netinet/tcp_usrreq.c (working copy)
> @@ -1354,6 +1354,7 @@
>
> case TCP_NODELAY:
> case TCP_NOOPT:
> + case TCP_IGNOREIDLE:
> INP_WUNLOCK(inp);
> error = sooptcopyin(sopt, &optval, sizeof optval,
> sizeof optval);
> @@ -1368,6 +1369,9 @@
> case TCP_NOOPT:
> opt = TF_NOOPT;
> break;
> + case TCP_IGNOREIDLE:
> + opt = TF_IGNOREIDLE;
> + break;
> default:
> opt = 0; /* dead code to fool gcc */
> break;
> @@ -1578,6 +1582,11 @@
> INP_WUNLOCK(inp);
> error = sooptcopyout(sopt, buf, TCP_CA_NAME_MAX);
> break;
> + case TCP_IGNOREIDLE:
> + optval = tp->t_flags & TF_IGNOREIDLE;
> + INP_WUNLOCK(inp);
> + error = sooptcopyout(sopt, &optval, sizeof optval);
> + break;
> default:
> INP_WUNLOCK(inp);
> error = ENOPROTOOPT;
>
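The condition changed in the tcp_output.c hunk above can be exercised in isolation with a small userland model. This is only a sketch: `struct tcp_model` and `restarts_after_idle` are hypothetical stand-ins for `struct tcpcb` and the in-kernel check, the flag values are copied from the tcp_var.h hunk, and the tick arithmetic is simplified to plain `int`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the tcp_var.h hunk in the patch. */
#define TF_IGNOREIDLE	0x004000	/* connection is never idle */
#define TF_LASTIDLE	0x040000	/* connection was previously idle */

/* Hypothetical stand-in for the struct tcpcb fields the check reads. */
struct tcp_model {
	uint32_t t_flags;
	uint32_t snd_max;	/* highest sequence number sent */
	uint32_t snd_una;	/* oldest unacknowledged sequence number */
	int	 t_rcvtime;	/* tick at which a segment was last received */
	int	 t_rxtcur;	/* current retransmit timeout, in ticks */
};

/*
 * Returns true when the patched condition would call cc_after_idle(),
 * i.e. when the connection would restart from slow start after idling.
 */
static bool
restarts_after_idle(const struct tcp_model *tp, int ticks)
{
	/* Idle: flagged idle earlier, or nothing is currently in flight. */
	bool idle = (tp->t_flags & TF_LASTIDLE) != 0 ||
	    tp->snd_max == tp->snd_una;

	/* TF_IGNOREIDLE short-circuits the whole idle restart. */
	return ((tp->t_flags & TF_IGNOREIDLE) == 0 && idle &&
	    ticks - tp->t_rcvtime >= tp->t_rxtcur);
}
```

With nothing in flight and more than one retransmit timeout elapsed since the last received segment, the model reports a restart unless `TF_IGNOREIDLE` is set, which matches the behavior described in the tcp.4 text.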