FreeBSD Window updates
Andre Oppermann
andre at freebsd.org
Sun Nov 30 15:18:16 PST 2008
Andre Oppermann wrote:
> David Malone wrote:
>> I've got an example extract tcpdump of this at the end of the mail
>> - here 6 ACKs are sent, 5 of which are pure window updates and
>> several are 2us apart!
>>
>> I think the easy option is to delete the code that generates explicit
>> window updates if the window moves by 2*MSS. We then should be doing
>> something similar to Linux. The other easy alternative would be to
>> add a sysctl that lets us generate a window update every N*MSS and
>> by default set it to something big, like 10 or 100. That should
>> effectively eliminate the updates during bulk data transfer, but
>> may still generate some window updates after a loss.
>
> The main problem of the pure window update test in tcp_output() is
> its complete ignorance of delayed ACKs. Second is the strict 4.4BSD
> adherence to sending an update for every window increase of >= 2*MSS.
> The third issue of sending a slew of window updates after having
> received a FIN (telling us the other end won't ever send more data)
> I have already fixed some moons ago.
>
> In my new-tcp work I've come across the window update logic some time
> ago and backchecked with relevant RFCs and other implementations.
> Attached is a compiling but otherwise untested backport of the new logic.
Slightly improved version attached.
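
For anyone who wants to compare the two tests outside the kernel, here is a
minimal user-space sketch of the old and new decisions from the attached
diff. struct winstate and the function names are hypothetical stand-ins for
the tcpcb and socket fields that tcp_output() reads; this is an illustration
under those assumptions, not the kernel code itself.

```c
#include <stdbool.h>

#define TCP_MAXWIN 65535L	/* largest unscaled window */

/* Hypothetical snapshot of the state the window update test looks at. */
struct winstate {
	long recwin;		/* free space in the receive socket buffer */
	long hiwat;		/* so->so_rcv.sb_hiwat */
	long maxseg;		/* tp->t_maxseg */
	long advertised;	/* tp->rcv_adv - tp->rcv_nxt */
	int  rcv_scale;		/* tp->rcv_scale */
	bool delack;		/* TF_DELACK is set */
	bool needsyn;		/* TF_NEEDSYN is set */
	bool rcvdfin;		/* TCPS_HAVERCVDFIN() is true */
};

/* "adv": how far the advertised window could be moved forward. */
static long
window_advance(const struct winstate *w)
{
	long limit = TCP_MAXWIN << w->rcv_scale;
	long avail = w->recwin < limit ? w->recwin : limit;

	return (avail - w->advertised);
}

/* Old test: update on >= 2*MSS or >= 1/2 of the socket buffer. */
static bool
old_wants_update(const struct winstate *w)
{
	if (w->recwin <= 0 || w->needsyn || w->rcvdfin)
		return (false);
	long adv = window_advance(w);
	return (adv >= 2 * w->maxseg || 2 * adv >= w->hiwat);
}

/* New test: defer to a pending delayed ACK and raise the thresholds. */
static bool
new_wants_update(const struct winstate *w)
{
	if (w->recwin <= 0 || w->delack || w->needsyn || w->rcvdfin)
		return (false);
	long adv = window_advance(w);
	if (adv < (1L << w->rcv_scale))	/* below one scaled unit */
		return (false);
	if (w->recwin <= w->hiwat / 4 && adv >= 2 * w->maxseg)
		return (true);		/* buffer nearly full */
	if (adv >= w->hiwat / 8 && adv >= w->maxseg)
		return (true);		/* window opened by 1/8 of buffer */
	return (2 * adv >= w->hiwat);	/* or by 1/2 of buffer */
}
```

With a 64 KB receive buffer, an MSS of 1448 and a window that has opened by
exactly 2*MSS, old_wants_update() says yes while new_wants_update() says no,
which matches the bulk transfer case from David's trace.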
--
Andre
-------------- next part --------------
Index: tcp_output.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_output.c,v
retrieving revision 1.158
diff -u -p -r1.158 tcp_output.c
--- tcp_output.c 27 Nov 2008 13:19:42 -0000 1.158
+++ tcp_output.c 30 Nov 2008 23:16:30 -0000
@@ -539,29 +539,56 @@ after_sack_rexmit:
}
/*
- * Compare available window to amount of window
- * known to peer (as advertised window less
- * next expected input). If the difference is at least two
- * max size segments, or at least 50% of the maximum possible
- * window, then want to send a window update to peer.
+ * Compare available window to amount of window known to peer
+ * (as advertised window less next expected input) and decide
+ * if we have to send a pure window update segment.
+ *
+ * When a delayed ACK is scheduled, do nothing. It will update
+ * the window anyway in a few milliseconds.
+ *
+ * If the receive socket buffer has less than 1/4 of space
+ * available and if the difference is at least two max size
+ * segments, send an immediate window update to peer.
+ *
+ * Otherwise if the difference is 1/8 (or more) of the receive
+ * socket buffer, or at least 1/2 of the maximum possible window,
+ * then we send a window update too.
+ *
* Skip this if the connection is in T/TCP half-open state.
* Don't send pure window updates when the peer has closed
* the connection and won't ever send more data.
+ *
+ * See RFC793, Section 3.7, page 43, Window Management Suggestions
+ * See RFC1122: Section 4.2.3.3, When to Send a Window Update
+ *
+ * Note: We are less aggressive with sending window updates than
+ * recommended in RFC1122. This is fine with today's large socket
+ * buffers and will not stall the peer. In addition we piggyback
+ * window updates on regular ACKs and sends.
*/
- if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
- !TCPS_HAVERCVDFIN(tp->t_state)) {
+ if (recwin > 0 && !(tp->t_flags & TF_DELACK) &&
+ !(tp->t_flags & TF_NEEDSYN) && !TCPS_HAVERCVDFIN(tp->t_state)) {
/*
* "adv" is the amount we can increase the window,
* taking into account that we are limited by
* TCP_MAXWIN << tp->rcv_scale.
+ *
+ * NB: adv must be equal or larger than the smallest
+ * unscaled window increment.
*/
long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) -
(tp->rcv_adv - tp->rcv_nxt);
- if (adv >= (long) (2 * tp->t_maxseg))
- goto send;
- if (2 * adv >= (long) so->so_rcv.sb_hiwat)
- goto send;
+ if (adv >= (long)0x1 << tp->rcv_scale) {
+ if (recwin <= (long)(so->so_rcv.sb_hiwat / 4) &&
+ adv >= (long)(2 * tp->t_maxseg))
+ goto send;
+ if (adv >= (long)(so->so_rcv.sb_hiwat / 8) &&
+ adv >= (long)tp->t_maxseg)
+ goto send;
+ if (2 * adv >= (long)so->so_rcv.sb_hiwat)
+ goto send;
+ }
}
/*