Re: Cwnd grows slowly during slow-start due to LRO of the receiver side.
- In reply to: Hans Petter Selasky : "Re: Cwnd grows slowly during slow-start due to LRO of the receiver side."
Date: Wed, 03 May 2023 06:49:45 UTC
Hi Hans,

Thanks for the reply and the suggestions.

> Have you tested using FreeBSD main / 14 ?

I tested 14.0-CURRENT built on 2023-04-27, and it is indeed much improved.
The TCP sender now reaches 100Mbps in 4 seconds on a link with 100ms delay.

% uname -a
FreeBSD 14.0-CURRENT #0 main-n262599-60167184abd5: Thu Apr 27 08:09:50 UTC 2023

schen@freebsd14:~/recipes/tpc % bin/tcpperf -c 192.168.0.1 -t 6
Connected 192.168.0.100:59302 -> 192.168.0.1:2009, congestion control: cubic
Time (s)  Throughput  Bitrate    Cwnd    Rwnd    sndbuf  ssthresh  rtt/var
0.000s    0.00kB/s    0.00kbps   14.1Ki  63.6Ki  32.8Ki  1024Mi    97.8ms/2500
1.014s    776kB/s     6205kbps   166Ki   992Ki   313Ki   1024Mi    100.0ms/1875
2.021s    3643kB/s    29.1Mbps   495Ki   1491Ki  1017Ki  1024Mi    100.0ms/1875
3.029s    7544kB/s    60.3Mbps   932Ki   2096Ki  1817Ki  1024Mi    100.0ms/1875
4.036s    12.9MB/s    103Mbps    1729Ki  3064Ki  1817Ki  1024Mi    100.0ms/1875
5.046s    18.2MB/s    145Mbps    2606Ki  3056Ki  1817Ki  1024Mi    96.9ms/6875
6.090s    17.8MB/s    143Mbps    3074Ki  2974Ki  1817Ki  1024Mi    113.4ms/11250
Sender transferred 62.0MBytes in 6.090s, throughput: 10.2MBytes/s, 81.4Mbits/s
Receiver transferred 62.0MBytes in 6.191s, throughput: 10.0MBytes/s, 80.1Mbits/s

Cwnd increased much faster than on 13.2-RELEASE. From the 5th second on, the
throughput is limited by sndbuf: 1817Ki / 100ms = 18.2MB/s.

Interestingly, the improvement is not due to lro_nsegs, but a side effect of
https://reviews.freebsd.org/D32693. Namely, this one-line change fixes (or
vastly improves) slow-start on 13.x:

--- a/usr/src/sys/conf/files	2023-04-06 17:34:41.000000000 -0700
+++ b/usr/src/sys/conf/files	2023-05-02 23:00:38.000000000 -0700
@@ -4412,6 +4412,7 @@
 netinet/raw_ip.c         optional inet | inet6
 netinet/cc/cc.c          optional inet | inet6
 netinet/cc/cc_newreno.c  optional inet | inet6
+netinet/khelp/h_ertt.c   optional inet | inet6
 netinet/sctp_asconf.c    optional inet sctp | inet6 sctp
 netinet/sctp_auth.c      optional inet sctp | inet6 sctp
 netinet/sctp_bsd_addr.c  optional inet sctp | inet6 sctp

Here's the tcpdump after compiling netinet/khelp/h_ertt.c into the 13.x
kernel by default:

0.000 IP src > sink: Flags [S], seq 392582262, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 840935345 ecr 0], length 0
0.100 IP sink > src: Flags [S.], seq 3065702766, ack 392582263, win 65160, options [mss 1460,sackOK,TS val 408756323 ecr 840935345,nop,wscale 7], length 0
0.100 IP src > sink: Flags [.], ack 1, win 1027, options [nop,nop,TS val 840935450 ecr 408756323], length 0

// First round-trip: cwnd = 10 * MSS
0.101 IP src > sink: [.], seq 1:14481, ack 1, win 1027, length 14480
0.201 IP sink > src: [.], ack 14481, win 445, length 0

// cwnd += 2 * MSS, but sent two segments, for better RTT calculation
0.201 IP src > sink: [.], seq 14481:15929, ack 1, win 1027, length 1448
0.202 IP src > sink: [.], seq 15929:31857, ack 1, win 1027, length 15928
// cwnd == 12 here

// Got ACK for the 1448 segment, cwnd += 1 * MSS, sent two more segments.
0.302 IP sink > src: [.], ack 15929, win 501, length 0
0.302 IP src > sink: [.], seq 31857:33305, ack 1, win 1027, length 1448
0.302 IP src > sink: [.], seq 33305:34753, ack 1, win 1027, length 1448
// cwnd == 13 here

// Got ACK for the 15928 segment, cwnd += 2 * MSS, sent a 13-MSS segment
0.302 IP sink > src: [.], ack 31857, win 440, length 0
0.302 IP src > sink: [.], seq 34753:53577, ack 1, win 1027, length 18824
// cwnd == 15 here, bytes in flight = 15 * MSS

// ACK of 1448 bytes, sent two more segments, typical slow-start
0.403 IP sink > src: [.], ack 33305, win 501, length 0
0.403 IP src > sink: [.], seq 53577:55025, ack 1, win 1027, length 1448
0.403 IP src > sink: [.], seq 55025:56473, ack 1, win 1027, length 1448

// ACK of 1448 bytes, sent a 2-MSS segment, typical slow-start with TSO
0.403 IP sink > src: [.], ack 34753, win 496, length 0
0.403 IP src > sink: [.], seq 56473:59369, ack 1, win 1027, length 2896
// cwnd == 17 here

// ACK of 18824 bytes, cwnd += 2 * MSS, sent a 15-MSS segment
0.403 IP sink > src: [.], ack 53577, win 795, length 0
0.403 IP src > sink: [.], seq 59369:81089, ack 1, win 1027, length 21720
// cwnd == 19 here, bytes in flight = 19 * MSS

marked_packet_rtt() in h_ertt.c sometimes turns off TSO to get a better RTT
measurement, which results in more segments being sent and more ACKs being
received, so cwnd can increase faster. It really sounds like a butterfly
effect to me.

Regards,
Shuo

On Tue, May 2, 2023 at 3:04 AM Hans Petter Selasky <hps@selasky.org> wrote:
>
> On 5/2/23 11:14, Hans Petter Selasky wrote:
> > Hi Chen!
> >
> > The FreeBSD mbufs carry the number of ACKs that have been joined
> > together into the following field:
> >
> >     m->m_pkthdr.lro_nsegs
> >
> > Can this value be of any use to cc_newreno ?
> >
> > --HPS
>
> Hi Chen,
>
> Have you tested using FreeBSD main / 14 ?
>
> The "nsegs" are passed along like this:
>
>     nsegs = max(1, m->m_pkthdr.lro_nsegs);
>     ...
>     cc_ack_received(tp, th, nsegs, CC_ACK);
>     ...
>
> (NewReno - FreeBSD-14)
>
>     incr = min(ccv->bytes_this_ack,
>                ccv->nsegs * abc_val * CCV(ccv, t_maxseg));
>
> And in FreeBSD-10, the version mentioned in your article:
>
> (NewReno - FreeBSD-10)
>
>     incr = min(ccv->bytes_this_ack,
>                V_tcp_abc_l_var * CCV(ccv, t_maxseg));
>
> There is no nsegs factor there.
>
> This issue may already have been fixed!
>
> --HPS
>
> > On 5/2/23 09:46, Chen Shuo wrote:
> >> As per newreno_ack_received() in sys/netinet/cc/cc_newreno.c, the
> >> FreeBSD TCP sender strictly follows RFC 5681 with the RFC 3465
> >> extension. That is, during slow-start, when receiving an ACK of
> >> 'bytes_acked':
> >>
> >>     cwnd += min(bytes_acked, abc_l_var * SMSS);  // abc_l_var = 2 by default
> >>
> >> As discussed in sec. 3.2 of RFC 3465, L = 2*SMSS bytes exactly balances
> >> the negative impact of the delayed ACK algorithm. RFC 5681 also
> >> requires that a receiver SHOULD generate an ACK for at least every
> >> second full-sized segment, so bytes_acked per ACK is at most 2 * SMSS.
> >> If both sender and receiver follow this, cwnd should grow exponentially
> >> during slow-start:
> >>
> >>     cwnd *= 2 (per RTT)
> >>
> >> However, LRO and TSO are widely used today, so the receiver may
> >> generate far fewer ACKs than it used to. As I observed, both FreeBSD
> >> and Linux generate at most one ACK per segment assembled by LRO/GRO.
> >> The worst case is one ACK per 45 MSS, as 45 * 1448 = 65160 < 65535.
> >>
> >> Sending 1MB over a link of 100ms delay from FreeBSD 13.2:
> >>
> >> 0.000 IP sender > sink: Flags [S], seq 205083268, win 65535, options
> >> [mss 1460,nop,wscale 10,sackOK,TS val 495212525 ecr 0], length 0
> >> 0.100 IP sink > sender: Flags [S.], seq 708257395, ack 205083269, win
> >> 65160, options [mss 1460,sackOK,TS val 563185696 ecr
> >> 495212525,nop,wscale 7], length 0
> >> 0.100 IP sender > sink: Flags [.], ack 1, win 65, options [nop,nop,TS
> >> val 495212626 ecr 563185696], length 0
> >> // TSopt omitted below for brevity.
> >>
> >> // cwnd = 10 * MSS, sent 10 * MSS
> >> 0.101 IP sender > sink: Flags [.], seq 1:14481, ack 1, win 65,
> >> length 14480
> >>
> >> // got one ACK for 10 * MSS, cwnd += 2 * MSS, sent 12 * MSS
> >> 0.201 IP sink > sender: Flags [.], ack 14481, win 427, length 0
> >> 0.201 IP sender > sink: Flags [.], seq 14481:31857, ack 1, win 65,
> >> length 17376
> >>
> >> // got ACK of 12*MSS above, cwnd += 2 * MSS, sent 14 * MSS
> >> 0.301 IP sink > sender: Flags [.], ack 31857, win 411, length 0
> >> 0.301 IP sender > sink: Flags [.], seq 31857:52129, ack 1, win 65,
> >> length 20272
> >>
> >> // got ACK of 14*MSS above, cwnd += 2 * MSS, sent 16 * MSS
> >> 0.402 IP sink > sender: Flags [.], ack 52129, win 395, length 0
> >> 0.402 IP sender > sink: Flags [P.], seq 52129:73629, ack 1, win 65,
> >> length 21500
> >> 0.402 IP sender > sink: Flags [.], seq 73629:75077, ack 1, win 65,
> >> length 1448
> >>
> >> As a consequence, instead of growing exponentially, cwnd grows
> >> more-or-less quadratically during slow-start, unless abc_l_var is
> >> set to a sufficiently large value.
> >>
> >> NewReno took more than 20 seconds to ramp up throughput to 100Mbps
> >> over an emulated 100ms-delay link, while Linux took ~2 seconds.
> >> I can provide the pcap file if anyone is interested.
> >>
> >> Switching to CUBIC won't help, because it uses the NewReno
> >> ack_received() logic for slow start.
> >>
> >> Is this a well-known issue, and is abc_l_var the only cure for it?
> >> https://calomel.org/freebsd_network_tuning.html
> >>
> >> Thank you!
> >>
> >> Best,
> >> Shuo Chen
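
To make the difference between the two increments quoted above concrete,
here is a minimal stand-alone sketch (plain C, not FreeBSD kernel code).
It simulates per-RTT cwnd growth during slow-start under the assumptions
taken from this thread: the receiver sends one stretched ACK per LRO
aggregate of at most 45 MSS, the initial window is 10 MSS, and abc_l_var
is 2. The "13.x-style" column uses incr = min(bytes_acked, abc_l_var * MSS);
the "14-style" column scales the cap by nsegs as in the quoted FreeBSD-14
NewReno code.

/*
 * Hypothetical stand-alone sketch, not kernel code: compare slow-start
 * cwnd growth with and without nsegs-aware Appropriate Byte Counting
 * when the receiver ACKs once per LRO aggregate.
 */
#include <stdio.h>

#define MSS        1448u
#define ABC_L_VAR  2u       /* net.inet.tcp.abc_l_var default */
#define LRO_MAX    45u      /* at most 45 * 1448 = 65160 bytes per aggregate */

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/*
 * One RTT of slow-start: every LRO aggregate of the previous flight
 * produces a single stretched ACK covering up to LRO_MAX segments.
 */
static unsigned grow_one_rtt(unsigned cwnd, int scale_by_nsegs)
{
    unsigned in_flight = cwnd;

    while (in_flight > 0) {
        unsigned nsegs = min_u(in_flight / MSS, LRO_MAX);
        if (nsegs == 0)
            nsegs = 1;
        unsigned acked = nsegs * MSS;
        unsigned limit = (scale_by_nsegs ? nsegs : 1u) * ABC_L_VAR * MSS;

        cwnd += min_u(acked, limit);          /* ABC increment per ACK */
        in_flight -= min_u(acked, in_flight);
    }
    return cwnd;
}

int main(void)
{
    unsigned old_cwnd = 10 * MSS, new_cwnd = 10 * MSS;

    printf("RTT  cwnd(13.x-style)  cwnd(14-style, nsegs-aware)\n");
    for (int rtt = 1; rtt <= 10; rtt++) {
        old_cwnd = grow_one_rtt(old_cwnd, 0);
        new_cwnd = grow_one_rtt(new_cwnd, 1);
        printf("%3d  %16u  %27u\n", rtt, old_cwnd, new_cwnd);
    }
    return 0;
}

With the 13.x-style increment, each stretched ACK adds at most 2 * MSS, so
the sketch reproduces the slow ramp-up seen in the 13.2 trace; with the
nsegs-scaled cap, the increment is never clipped below bytes_acked, so cwnd
doubles every RTT as RFC 3465 intends.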