[PATCH] Add a new TCP_IGNOREIDLE socket option

Thu Jan 24 19:52:36 UTC 2013

On Thursday, January 24, 2013 3:03:31 am Andre Oppermann wrote:
> On 24.01.2013 03:31, Sepherosa Ziehau wrote:
> > On Thu, Jan 24, 2013 at 12:15 AM, John Baldwin <jhb at freebsd.org> wrote:
> >> On Wednesday, January 23, 2013 1:33:27 am Sepherosa Ziehau wrote:
> >>> On Wed, Jan 23, 2013 at 4:11 AM, John Baldwin <jhb at freebsd.org> wrote:
> >>>> As I mentioned in an earlier thread, I recently had to debug an issue we were
> >>>> seeing across a link with a high bandwidth-delay product (both high bandwidth
> >>>> and high RTT).  Our specific use case was to use a TCP connection to reliably
> >>>> forward a latency-sensitive datagram stream across a WAN connection.  We would
> >>>> often see spikes in the latency of individual datagrams.  I eventually tracked
> >>>> this down to the connection entering slow start when it would transmit data
> >>>> after being idle.  The data stream was quite bursty and would often attempt to
> >>>> transmit a burst of data after being idle for far longer than a retransmit
> >>>> timeout.
> >>>>
> >>>> In 7.x we had worked around this in the past by disabling RFC 3390 and jacking
> >>>> the slow start window size up via a sysctl.  On 8.x this no longer worked.
> >>>> The solution I came up with was to add a new socket option to disable idle
> >>>> handling completely.  That is, when an idle connection restarts with this new
> >>>> option enabled, it keeps its current congestion window and doesn't enter slow
> >>>> start.
> >>>>
> >>>> There are only a few cases where such an option is useful, but if anyone else
> >>>> thinks this might be useful I'd be happy to add the option to FreeBSD.
> >>>
> >>> I think what you need is RFC 2861; however, you should probably
> >>> ignore the "application-limited period" part of RFC 2861.
> >>
> >> Hummm.  It appears btw, that Linux uses RFC 2861, but has a global knob to
> >> disable it due to applications having problems.  When it is disabled,
> >> it doesn't decay the congestion window at all during idle handling.  That is,
> >> it appears to act the same as if TCP_IGNOREIDLE were enabled.
> >>
> >> From http://www.kernel.org/doc/man-pages/online/pages/man7/tcp.7.html:
> >>
> >>         tcp_slow_start_after_idle (Boolean; default: enabled; since Linux 2.6.18)
> >>                If enabled, provide RFC 2861 behavior and time out the congestion
> >>                window after an idle period.  An idle period is defined as the current
> >>                RTO (retransmission timeout).  If disabled, the congestion window will
> >>                not be timed out after an idle period.
> >>
> >> Also, in this thread on tcpm it appears no one on that list realizes that
> >> there are any implementations which follow the "SHOULD" in RFC 2581 for idle
> >> handling (which is what we do currently):
> >
> > Nah, I don't think the idle detection in FreeBSD follows
> > RFC 2581/RFC 5681 section 4.1 (the paragraph before the "SHOULD").  IMHO,
> > that's probably why the author of the following email asked about the
> > implementation of the "SHOULD" in RFC 2581/RFC 5681.
> >
> >>
> >> http://www.ietf.org/mail-archive/web/tcpm/current/msg02864.html
> >>
> >> So if we were to implement RFC 2861, the new socket option would be equivalent
> >> to setting Linux's 'tcp_slow_start_after_idle' to false, but on a per-socket
> >> basis rather than globally.
> >
> > Agreed, a per-socket option could be more useful than global sysctls in
> > certain situations.  However, in addition to the per-socket option,
> > could global sysctl nodes to disable idle_restart/idle_cwv help too?
> 
> No.  This is far too dangerous once it makes it into some tuning guide.
> The threat of congestion breakdown is real.  The Internet, or any packet
> network, can only survive in the long term if almost all follow the rules
> and self-constrain to remain fair to the others.  What would happen if
> nobody would respect the traffic lights anymore?

The problem with this argument is that Linux has had this as a tunable
option for years and the Internet hasn't melted as a result.
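
For reference, a minimal sketch of how an application might request the
per-socket behavior.  It assumes the TCP_IGNOREIDLE option proposed in this
thread takes an int boolean at the IPPROTO_TCP level like other TCP options;
the option is not in stock headers, and the helper name tcp_set_ignoreidle()
is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <errno.h>

/*
 * Ask the stack to keep the congestion window across idle periods on
 * socket 's'.  TCP_IGNOREIDLE is the option proposed in this thread; on a
 * system whose headers do not define it, fail with ENOPROTOOPT instead.
 */
static int
tcp_set_ignoreidle(int s, int on)
{
#ifdef TCP_IGNOREIDLE
	return (setsockopt(s, IPPROTO_TCP, TCP_IGNOREIDLE, &on, sizeof(on)));
#else
	(void)s;
	(void)on;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

Unlike the Linux sysctl, this affects only the one connection the
application has opted in, which is the point of making it a socket option.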

> Besides that bursting into unknown network conditions is very likely to
> result in burst losses as well.  TCP isn't good at recovering from it.
> In the end you most likely come out ahead if you decay the restartCWND.
> 
> We have two cases primarily: a) long distance, medium to high RTT, and
> wildly varying bandwidth (a.k.a. the Internet); b) short distance, low
> RTT and mostly plenty of bandwidth (a.k.a. Datacenter).  The former
> absolutely, definitely requires a decayed restartCWND.  The latter less
> so, but even there bursting at 10Gig TSO-assisted wire speed isn't going
> to end well more often than not.

You forgot my case: c) dedicated long distance links with high bandwidth.

> Since this seems to be a burning issue I'll come up with a patch in the
> next few days to add a decaying restartCWND that'll be fair and allow a
> very quick ramp up if no loss occurs.

I think this could be useful.  OTOH, I still think the TCP_IGNOREIDLE option
is useful both with and without a decaying restartCWND.
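
To make the three behaviors under discussion concrete, here is a rough
sketch of the congestion window a sender would restart with under each one.
It is a simplified model, not the FreeBSD or Linux code: the names
(idle_policy, cwnd_after_idle) are hypothetical, the decay case only loosely
follows RFC 2861 (it halves cwnd once per elapsed RTO, floors it at the
restart window, and omits the ssthresh adjustment), and "idle" is taken to
mean at least one RTO without sending.

```c
#include <stdint.h>

enum idle_policy {
	IDLE_RESTART,	/* RFC 5681 section 4.1: fall back to the restart window */
	IDLE_DECAY,	/* RFC 2861 style: halve cwnd per RTO of idle time */
	IDLE_IGNORE	/* proposed TCP_IGNOREIDLE: keep cwnd unchanged */
};

/*
 * Return the congestion window (bytes) to use when sending resumes after
 * 'idle_ms' of idle time.  'iw' is the initial window in bytes.
 */
static uint32_t
cwnd_after_idle(enum idle_policy p, uint32_t cwnd, uint32_t iw,
    uint32_t idle_ms, uint32_t rto_ms)
{
	uint32_t i, n;

	/* Not idle long enough, or the policy ignores idle time. */
	if (p == IDLE_IGNORE || rto_ms == 0 || idle_ms < rto_ms)
		return (cwnd);
	/* Restart window is min(iw, cwnd); nothing to shrink below iw. */
	if (cwnd <= iw)
		return (cwnd);
	if (p == IDLE_RESTART)
		return (iw);

	/* IDLE_DECAY: one halving per full RTO spent idle. */
	n = idle_ms / rto_ms;
	for (i = 0; i < n && cwnd > iw; i++)
		cwnd /= 2;
	return (cwnd < iw ? iw : cwnd);
}
```

With a 100000-byte cwnd, a 4380-byte initial window, and a 1 second RTO,
three seconds of idle time leaves 12500 bytes under the decay model, 4380
bytes under the restart model, and the full 100000 bytes with idle handling
ignored.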

-- 
John Baldwin