List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net
From: Chen Shuo
Date: Tue, 2 May 2023 23:49:45 -0700
Subject: Re: Cwnd grows slowly during slow-start due to LRO of the receiver side.
To: Hans Petter Selasky
Cc: freebsd-net@freebsd.org

Hi Hans,

Thanks for the reply and the suggestions.

> Have you tested using FreeBSD main / 14 ?

I tested 14.0-CURRENT built on 2023-04-27, and it is indeed much improved.
Now the TCP sender reaches 100Mbps in 4 seconds on a link with 100ms delay.

% uname -a
FreeBSD 14.0-CURRENT #0 main-n262599-60167184abd5: Thu Apr 27 08:09:50 UTC 2023

schen@freebsd14:~/recipes/tpc % bin/tcpperf -c 192.168.0.1 -t 6
Connected 192.168.0.100:59302 -> 192.168.0.1:2009, congestion control: cubic
Time (s)  Throughput   Bitrate    Cwnd    Rwnd  sndbuf  ssthresh  rtt/var
  0.000s   0.00kB/s   0.00kbps  14.1Ki  63.6Ki  32.8Ki    1024Mi  97.8ms/2500
  1.014s    776kB/s   6205kbps   166Ki   992Ki   313Ki    1024Mi  100.0ms/1875
  2.021s   3643kB/s   29.1Mbps   495Ki  1491Ki  1017Ki    1024Mi  100.0ms/1875
  3.029s   7544kB/s   60.3Mbps   932Ki  2096Ki  1817Ki    1024Mi  100.0ms/1875
  4.036s   12.9MB/s    103Mbps  1729Ki  3064Ki  1817Ki    1024Mi  100.0ms/1875
  5.046s   18.2MB/s    145Mbps  2606Ki  3056Ki  1817Ki    1024Mi  96.9ms/6875
  6.090s   17.8MB/s    143Mbps  3074Ki  2974Ki  1817Ki    1024Mi  113.4ms/11250
Sender   transferred 62.0MBytes in 6.090s, throughput: 10.2MBytes/s, 81.4Mbits/s
Receiver transferred 62.0MBytes in 6.191s, throughput: 10.0MBytes/s, 80.1Mbits/s

Cwnd increases much faster than on 13.2-RELEASE. From the 5th second on,
throughput is limited by sndbuf: 1817Ki / 100ms is roughly 18.2MB/s.

Interestingly, the improvement is not due to lro_nsegs, but is a side effect of
https://reviews.freebsd.org/D32693.

Namely, this one-line change fixed (or vastly improved) slow start in 13.x:

--- a/usr/src/sys/conf/files 2023-04-06 17:34:41.000000000 -0700
+++ b/usr/src/sys/conf/files 2023-05-02 23:00:38.000000000 -0700
@@ -4412,6 +4412,7 @@
 netinet/raw_ip.c optional inet | inet6
 netinet/cc/cc.c optional inet | inet6
 netinet/cc/cc_newreno.c optional inet | inet6
+netinet/khelp/h_ertt.c optional inet | inet6
 netinet/sctp_asconf.c optional inet sctp | inet6 sctp
 netinet/sctp_auth.c optional inet sctp | inet6 sctp
 netinet/sctp_bsd_addr.c optional inet sctp | inet6 sctp

Here's the tcpdump after compiling netinet/khelp/h_ertt.c into the 13.x
kernel by default:

0.000 IP src > sink: Flags [S], seq 392582262, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 840935345 ecr 0], length 0
0.100 IP sink > src: Flags [S.], seq 3065702766, ack 392582263, win 65160, options [mss 1460,sackOK,TS val 408756323 ecr 840935345,nop,wscale 7], length 0
0.100 IP src > sink: Flags [.], ack 1, win 1027, options [nop,nop,TS val 840935450 ecr 408756323], length 0

// First round-trip: cwnd = 10 * MSS
0.101 IP src > sink: [.], seq 1:14481, ack 1, win 1027, length 14480

0.201 IP sink > src: [.], ack 14481, win 445, length 0
// cwnd += 2 * MSS, but sent two segments, for better RTT calculation
0.201 IP src > sink: [.], seq 14481:15929, ack 1, win 1027, length 1448
0.202 IP src > sink: [.], seq 15929:31857, ack 1, win 1027, length 15928
// cwnd == 12 here

// Got ACK for the 1448 segment, cwnd += 1 * MSS, sent two more segs.
0.302 IP sink > src: [.], ack 15929, win 501, length 0
0.302 IP src > sink: [.], seq 31857:33305, ack 1, win 1027, length 1448
0.302 IP src > sink: [.], seq 33305:34753, ack 1, win 1027, length 1448
// cwnd == 13 here

// Got ACK for the 15928 segment, cwnd += 2 * MSS, sent 13-MSS segment
0.302 IP sink > src: [.], ack 31857, win 440, length 0
0.302 IP src > sink: [.], seq 34753:53577, ack 1, win 1027, length 18824
// cwnd == 15 here, bytes in flight = 15 * MSS

// ACK of 1448 bytes, sent two more segments, typical slow-start
0.403 IP sink > src: [.], ack 33305, win 501, length 0
0.403 IP src > sink: [.], seq 53577:55025, ack 1, win 1027, length 1448
0.403 IP src > sink: [.], seq 55025:56473, ack 1, win 1027, length 1448

// ACK of 1448 bytes, sent 2-MSS segment, typical slow-start with TSO
0.403 IP sink > src: [.], ack 34753, win 496, length 0
0.403 IP src > sink: [.], seq 56473:59369, ack 1, win 1027, length 2896
// cwnd == 17 here

// ACK of 18824, cwnd += 2 * MSS, sent 15-MSS segment
0.403 IP sink > src: [.], ack 53577, win 795, length 0
0.403 IP src > sink: [.], seq 59369:81089, ack 1, win 1027, length 21720
// cwnd == 19 here, bytes in flight = 19 * MSS

marked_packet_rtt() in h_ertt.c sometimes turns off TSO to get a better RTT
measurement. As a result, more segments are sent and more ACKs are received,
so cwnd can increase faster.

It really sounds like a butterfly effect to me.

Regards,
Shuo

On Tue, May 2, 2023 at 3:04 AM Hans Petter Selasky wrote:
>
> On 5/2/23 11:14, Hans Petter Selasky wrote:
> > Hi Chen!
> >
> > The FreeBSD mbufs carry the number of ACKs that have been joined
> > together into the following field:
> >
> > m->m_pkthdr.lro_nsegs
> >
> > Can this value be of any use to cc_newreno ?
> >
> > --HPS
>
> Hi Chen,
>
> Have you tested using FreeBSD main / 14 ?
>
> The "nsegs" are passed along like this:
>
> nsegs = max(1, m->m_pkthdr.lro_nsegs);
>
> ...
>
> cc_ack_received(tp, th, nsegs, CC_ACK);
>
> ...
>
> (Newreno - FreeBSD-14)
>
> incr = min(ccv->bytes_this_ack,
>            ccv->nsegs * abc_val *
>            CCV(ccv, t_maxseg));
>
> And in FreeBSD-10 being mentioned in your article:
>
> (Newreno - FreeBSD-10)
>
> incr = min(ccv->bytes_this_ack,
>            V_tcp_abc_l_var * CCV(ccv, t_maxseg));
>
>
> There is no such thing.
>
> This issue may already have been fixed!
>
> --HPS
>
>
> > On 5/2/23 09:46, Chen Shuo wrote:
> >> As per newreno_ack_received() in sys/netinet/cc/cc_newreno.c,
> >> FreeBSD TCP sender strictly follows RFC 5681 with RFC 3465 extension.
> >> That is, during slow-start, when receiving an ACK of 'bytes_acked':
> >>
> >> cwnd += min(bytes_acked, abc_l_var * SMSS); // abc_l_var = 2 (default)
> >>
> >> As discussed in sec3.2 of RFC 3465, L=2*SMSS bytes exactly balances
> >> the negative impact of the delayed ACK algorithm. RFC 5681 also
> >> requires that a receiver SHOULD generate an ACK for at least every
> >> second full-sized segment, so bytes_acked per ACK is at most 2 * SMSS.
> >> If both sender and receiver follow it, cwnd should grow exponentially
> >> during slow-start:
> >>
> >> cwnd *= 2 (per RTT)
> >>
> >> However, LRO and TSO are widely used today, so the receiver may generate
> >> far fewer ACKs than it used to. As I observed, both FreeBSD and
> >> Linux generate at most one ACK per segment assembled by LRO/GRO.
> >> The worst case is one ACK per 45 MSS, as 45 * 1448 = 65160 < 65535.
> >>
> >> Sending 1MB over a link of 100ms delay from FreeBSD 13.2:
> >>
> >> 0.000 IP sender > sink: Flags [S], seq 205083268, win 65535, options
> >> [mss 1460,nop,wscale 10,sackOK,TS val 495212525 ecr 0], length 0
> >> 0.100 IP sink > sender: Flags [S.], seq 708257395, ack 205083269, win
> >> 65160, options [mss 1460,sackOK,TS val 563185696 ecr
> >> 495212525,nop,wscale 7], length 0
> >> 0.100 IP sender > sink: Flags [.], ack 1, win 65, options [nop,nop,TS
> >> val 495212626 ecr 563185696], length 0
> >> // TSopt omitted below for brevity.
> >>
> >> // cwnd = 10 * MSS, sent 10 * MSS
> >> 0.101 IP sender > sink: Flags [.], seq 1:14481, ack 1, win 65,
> >> length 14480
> >>
> >> // got one ACK for 10 * MSS, cwnd += 2 * MSS, sent 12 * MSS
> >> 0.201 IP sink > sender: Flags [.], ack 14481, win 427, length 0
> >> 0.201 IP sender > sink: Flags [.], seq 14481:31857, ack 1, win 65,
> >> length 17376
> >>
> >> // got ACK of 12*MSS above, cwnd += 2 * MSS, sent 14 * MSS
> >> 0.301 IP sink > sender: Flags [.], ack 31857, win 411, length 0
> >> 0.301 IP sender > sink: Flags [.], seq 31857:52129, ack 1, win 65,
> >> length 20272
> >>
> >> // got ACK of 14*MSS above, cwnd += 2 * MSS, sent 16 * MSS
> >> 0.402 IP sink > sender: Flags [.], ack 52129, win 395, length 0
> >> 0.402 IP sender > sink: Flags [P.], seq 52129:73629, ack 1, win 65,
> >> length 21500
> >> 0.402 IP sender > sink: Flags [.], seq 73629:75077, ack 1, win 65,
> >> length 1448
> >>
> >> As a consequence, instead of growing exponentially, cwnd grows
> >> more-or-less quadratically during slow-start, unless abc_l_var is
> >> set to a sufficiently large value.
> >>
> >> NewReno took more than 20 seconds to ramp up throughput to 100Mbps
> >> over an emulated 100ms delay link, while Linux took ~2 seconds.
> >> I can provide the pcap file if anyone is interested.
> >>
> >> Switching to CUBIC won't help, because it uses the logic in NewReno
> >> ack_received() for slow start.
> >>
> >> Is this a well-known issue and abc_l_var is the only cure for it?
> >> https://calomel.org/freebsd_network_tuning.html
> >>
> >> Thank you!
> >>
> >> Best,
> >> Shuo Chen
> >>
> >
>
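
As a rough illustration of the growth pattern described in the quoted
message, here is a minimal Python sketch of slow start under the RFC 3465
rule cwnd += min(bytes_acked, L * SMSS). It is only a model under stated
assumptions: the sender transmits exactly one full cwnd per round trip, and
the receiver emits one ACK for every segs_per_ack segments. The function
name slow_start and its parameters are made up for this illustration and do
not come from the FreeBSD sources.

```python
def slow_start(rtts, segs_per_ack, abc_l=2, init_cwnd=10):
    """Return cwnd (in units of MSS) at the start of each round trip.

    Hypothetical model: one full cwnd is sent per RTT, and the receiver
    ACKs every `segs_per_ack` segments (2 for classic delayed ACK, up to
    45 for one ACK per LRO/GRO aggregate).
    """
    cwnd = init_cwnd
    history = [cwnd]
    for _ in range(rtts):
        remaining = cwnd      # segments in flight during this round trip
        growth = 0
        while remaining > 0:
            acked = min(segs_per_ack, remaining)  # segments covered by one ACK
            growth += min(acked, abc_l)           # incr = min(bytes_acked, L * SMSS)
            remaining -= acked
        cwnd += growth
        history.append(cwnd)
    return history


# One ACK per two segments: the window doubles every round trip.
print(slow_start(3, segs_per_ack=2))    # [10, 20, 40, 80]

# One ACK per LRO aggregate of up to 45 segments: +2 MSS per round trip at first.
print(slow_start(3, segs_per_ack=45))   # [10, 12, 14, 16]
```

Under these assumptions the second case reproduces the 10, 12, 14, 16 MSS
flights seen in the 13.2 trace quoted above. Anything that makes the receiver
send more ACKs per round trip (smaller segs_per_ack) or credits more per ACK
(larger abc_l) brings the curve back toward doubling.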