net.inet.tcp.slowstart_flightsize in 8-STABLE

Maxim Dounin mdounin at mdounin.ru
Mon Sep 13 17:40:50 UTC 2010


Hello!

On Fri, Aug 06, 2010 at 10:56:40AM +0200, Andre Oppermann wrote:

> On 13.07.2010 16:01, Maxim Dounin wrote:
> >On Wed, May 12, 2010 at 04:47:02PM +0400, Igor Sysoev wrote:
> >
> >>It seems that net.inet.tcp.slowstart_flightsize does not work in 8-STABLE.
> >>For a long time I used slowstart_flightsize=2 on FreeBSD 4, 6, and 7 hosts.
> >>However, FreeBSD-8 always starts with the single packet.
> >>I saw this on different versions of 8-STABLE since 8 Oct 2009 till
> >>04 Apr 2010.
> >
> >Finally I had some time to look into it (sorry for long delay).
> >
> >1. Slow start isn't used on recent FreeBSD versions for initial snd_cwnd
> >calculations as long as you have rfc3390 support switched on (default since
> >Jan 06 23:29:46 2004, at least in 7.*).  It effectively sets initial
> >snd_cwnd to 3*MSS on common networks and shouldn't cause any problems.
> >Slowstart_flightsize only affects connection restarts.
> >
> >2. Due to bug in syncache code (patch below) all accepted connections has
> >their snd_cwnd reset to 1*MSS (since r171639, 7.0+ AFAIR).
> >
> >3. Support for rfc3465 introduced in r187289 uncovered (2) as
> >ACK to SYN/ACK no longer causes snd_cwnd increase by MSS (actually, this
> >increase shouldn't happen as it's explicitly forbidden by rfc 3390, but
> >it's another issue).  Snd_cwnd remains really small (1*MSS + 1) and this
> >causes really bad interaction with delayed acks on other side.
> >
> >As a workaround to delayed acks interaction problems you may disable
> >rfc3465 by setting net.inet.tcp.rfc3465 to 0.  Correct fix would be to apply
> >the patch below.
> >
> >To Andre Oppermann: could you please take a look at the patch and
> >commit it if found appropriate?
> 
> I've committed your fix with svn r210666. In a few days I will MFC it back
> to the stable branches.  Thanks for reporting the bug and a patch for it.

Andre, could you please take a look at one more patch as well?

Igor reported that it still sees 100ms delays with rfc3465 turned 
on, and it turns out to be similar issue (setting cwnd to 1*MSS) 
for hosts found in hostcache.

The problem with setting cwnd from hostcache was already reported 
here:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/92690
http://lists.freebsd.org/pipermail/freebsd-net/2007-July/014780.html

As using larger cwnd from hostcache may cause problems on 
congested links (see second thread) I changed code to only use 
cached cwnd as an upper bound for cwnd (instead of fixing current 
code).  This is also in-line with what we do on connection 
restarts.

We may later consider re-adding usage of larger cwnd from 
hostcache.  But I believe it should be done carefully and 
probably behind sysctl, off by default.

# HG changeset patch
# User Maxim Dounin <mdounin at mdounin.ru>
# Date 1284352618 -14400
# Node ID bbb9fea7978b26b95e96d463238a3acd8bfb5575
# Parent  6aec795c568cf6b9d2fabf8b8b9e25ad75b053d0
Use cwnd from hostcache only as upper bound.

Setting initial congestion window from hostcache wasn't working for
accepted connection since introduction due to tp->snd_wnd being 0.  As
a result it was instead limiting cwnd on such connections to 1*MSS.  With
net.inet.tcp.rfc3465 enabled this results bad interaction with delayed
acks and 100ms delays for hosts found in hostcache.

Additionally, it's considered unsafe to use initial congestion window
larger than thouse specified in RFC3390 as this may cause problems on
congested links.  RFC5681 says equation from RFC3390 MUST be used as
upper bound.

Links:

http://lists.freebsd.org/pipermail/freebsd-net/2007-July/014780.html
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/92690

diff --git a/netinet/tcp_input.c b/netinet/tcp_input.c
--- a/netinet/tcp_input.c
+++ b/netinet/tcp_input.c
@@ -3332,29 +3332,13 @@ tcp_mss(struct tcpcb *tp, int offer)
 		tp->snd_bandwidth = metrics.rmx_bandwidth;
 
 	/*
-	 * Set the slow-start flight size depending on whether this
-	 * is a local network or not.
+	 * Set initial congestion window per RFC3390.  Alternatively, set
+	 * flight size depending on whether this is a local network or not.
 	 *
-	 * Extend this so we cache the cwnd too and retrieve it here.
-	 * Make cwnd even bigger than RFC3390 suggests but only if we
-	 * have previous experience with the remote host. Be careful
-	 * not make cwnd bigger than remote receive window or our own
-	 * send socket buffer. Maybe put some additional upper bound
-	 * on the retrieved cwnd. Should do incremental updates to
-	 * hostcache when cwnd collapses so next connection doesn't
-	 * overloads the path again.
-	 *
-	 * RFC3390 says only do this if SYN or SYN/ACK didn't got lost.
-	 * We currently check only in syncache_socket for that.
+	 * RFC3390 says we MUST limit initial window to one segment if SYN
+	 * or SYN/ACK is lost.  We currently check only in syncache_socket()
+	 * for that.
 	 */
-#define TCP_METRICS_CWND
-#ifdef TCP_METRICS_CWND
-	if (metrics.rmx_cwnd)
-		tp->snd_cwnd = max(mss,
-				min(metrics.rmx_cwnd / 2,
-				 min(tp->snd_wnd, so->so_snd.sb_hiwat)));
-	else
-#endif
 	if (V_tcp_do_rfc3390)
 		tp->snd_cwnd = min(4 * mss, max(2 * mss, 4380));
 #ifdef INET6
@@ -3367,6 +3351,10 @@ tcp_mss(struct tcpcb *tp, int offer)
 	else
 		tp->snd_cwnd = mss * V_ss_fltsz;
 
+	/* If we have cached cwnd - use it as an upper bound. */
+	if (metrics.rmx_cwnd)
+		tp->snd_cwnd = max(mss, min(tp->snd_cwnd, metrics.rmx_cwnd));
+
 	/* Check the interface for TSO capabilities. */
 	if (mtuflags & CSUM_TSO)
 		tp->t_flags |= TF_TSO;


Maxim Dounin


More information about the freebsd-net mailing list