Date: Sun, 9 Apr 2023 08:46:39 +1000
From: Richard Perini
To: freebsd-hackers@freebsd.org
Subject: Re: low TCP speed, wrong rtt measurement
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

On Tue, Apr 04, 2023 at 02:46:34PM -0000, Peter 'PMc' Much wrote:
> ** maybe this should rather go to the -net list, but then
> ** there are only bug messages
>
> Hi,
>   I'm trying to transfer backup data via WAN; the link bandwidth is
> only ~2 Mbit, but this can well run for days and just saturate the spare
> bandwidth.
>
> The problem is, it doesn't saturate the bandwidth.
>
> I found that the backup application opens the socket in this way:
>       if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {
>
> Apparently that doesn't work well. So I patched the application to do
> it this way:
> -     if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {
> +     if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, IPPROTO_TCP)) < 0) {
>
> The result, observed with tcpdump, was now noticeably different, but
> rather worse than better.
>
> I tried various cc algorithms; all behaved very badly with the exception
> of cc_vegas. Vegas, after tuning the alpha and beta, gave satisfying
> results with less than 1% tradeoff.
>
> But only for a time. After transferring for a couple of hours the
> throughput went bad again:
>
> # netstat -aC
> Proto Recv-Q Send-Q Local Address  Foreign Address   (state)      CC     cwin  ssthresh  MSS  ECN
> tcp6       0  57351 edge-jo.26996  pole-n.22         ESTABLISHED  vegas  22203    10392  1311  off
> tcp4       0 106305 edge-e.62275   pole-n.bacula-sd  ESTABLISHED  vegas  11943     5276  1331  off
>
> The first connection is freshly created. The second one has been running
> for a day already, and it is obviously hosed - it doesn't recover.
>
> # sysctl net.inet.tcp.cc.vegas
> net.inet.tcp.cc.vegas.beta: 14
> net.inet.tcp.cc.vegas.alpha: 8
>
> 8 (alpha) x 1331 (mss) = 10648
>
> The cwin is adjusted to precisely one tick above the alpha, and
> doesn't rise further. (Increasing the alpha further does solve the
> issue for this connection - but that is not how things are supposed to
> work.)
>
> Now I tried to look into the data that vegas would use for its
> decisions, and found this:
>
> # dtrace -n 'fbt:kernel:vegas_ack_received:entry { printf("%s %u %d %d %d %d", execname,\
> (*((struct tcpcb **)(arg0+24)))->snd_cwnd,\
> ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->minrtt,\
> ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->marked_snd_cwnd,\
> ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->bytes_tx_in_marked_rtt,\
> ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->markedpkt_rtt);\
> }'
> CPU     ID                    FUNCTION:NAME
>   6  17478  vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
>  17  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
>  17  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
>   3  17478  vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
>   5  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
>  17  17478  vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
>  11  17478  vegas_ack_received:entry ng_queue 11943 1 11943 10552 106
>  15  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
>  13  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
>  16  17478  vegas_ack_received:entry ng_queue 11943 1 11943 10552 106
>   3  17478  vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
>
> One can see that the "minrtt" value for the freshly created connection
> is 56 (which is very plausible).
> But the old and hosed connection shows minrtt = 1, which explains the
> observed cwin.
>
> The minrtt gets calculated in sys/netinet/khelp/h_ertt.c:
>       e_t->rtt = tcp_ts_getticks() - txsi->tx_ts + 1;
> There is a "+1", so the measured difference was apparently zero.
>
> But source and destination are at least 1000 km apart. So either we
> have had one of the rare occasions of hyperspace tunnelling, or
> something is going wrong in the ertt measurement code.
>
> For now this is a one-time observation, but it might also explain why
> the other cc algorithms behaved badly. These algorithms are widely in
> use and should work - the ertt measurement however is the same for all of
> them.

I can confirm I am seeing similar problems transferring files to our
various production sites around Australia, over links of various types,
sizes and bandwidths.

I can saturate the nearby links, but link utilisation decreases with
distance.

I've tried various transfer protocols (ftp, scp, rcp, http); results are
similar for all.

The ping time for the closest WAN link is 2.3 ms; for the furthest it is
60 ms. On the furthest link, we get around 15% utilisation. A transfer
between 2 Windows hosts on the furthest link yields ~80% utilisation.

FreeBSD versions involved are 12.1 and 12.2.

--
Richard Perini
Ramico Australia Pty Ltd   Sydney, Australia   rpp@ci.com.au  +61 2 9552 5500
-----------------------------------------------------------------------------
"The difference between theory and practice is that in theory there is no
 difference, but in practice there is"
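
Addendum on the quoted dtrace figures: the stuck window can be reproduced on paper if one assumes the textbook Vegas rule (grow when the estimated backlog is below alpha, shrink when it is above beta, otherwise hold). The sketch below is an illustration under that assumption, not code taken from the FreeBSD cc_vegas source; the function names are hypothetical and integer division is used only to mimic kernel-style arithmetic. Inputs are the numbers from the netstat and dtrace output quoted above.

```python
def vegas_ndiff(marked_snd_cwnd, bytes_tx_in_marked_rtt, minrtt, markedpkt_rtt, mss):
    """Estimated backlog in segments, per the textbook Vegas formula (assumed)."""
    expected = marked_snd_cwnd // minrtt              # rate if every RTT equalled minrtt
    actual = bytes_tx_in_marked_rtt // markedpkt_rtt  # rate actually achieved
    return (expected - actual) * minrtt // mss


def vegas_action(ndiff, alpha, beta):
    """Window decision for one RTT: grow, shrink or hold (assumed rule)."""
    if ndiff < alpha:
        return "grow"
    if ndiff > beta:
        return "shrink"
    return "hold"


ALPHA, BETA = 8, 14  # net.inet.tcp.cc.vegas.alpha / .beta from the quoted sysctl output

# Old connection: minrtt = 1, MSS 1331.
old = vegas_ndiff(11943, 10552, 1, 131, 1331)
print("old connection:", old, vegas_action(old, ALPHA, BETA))

# Fresh connection: minrtt = 56, MSS 1311.
new = vegas_ndiff(22203, 20784, 56, 261, 1311)
print("fresh connection:", new, vegas_action(new, ALPHA, BETA))

# Hypothetical: the old connection with an invented, more plausible minrtt of 100.
fixed = vegas_ndiff(11943, 10552, 100, 131, 1331)
print("old connection, minrtt=100:", fixed, vegas_action(fixed, ALPHA, BETA))
```

With minrtt = 1 the expected rate is the whole window per tick, so the backlog estimate is essentially cwnd / MSS. Under this rule the window therefore stops growing as soon as it reaches alpha segments, which matches the observation that cwin sits just above 8 x 1331. The minrtt of 100 in the last call is made up purely to show that the same connection would be allowed to grow again with a sane minimum.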