low TCP speed, wrong rtt measurement
Date: Tue, 04 Apr 2023 14:46:34 UTC
** maybe this should rather go to the -net list, but then
** there are only bug messages

Hi,

I'm trying to transfer backup data via WAN; the link bandwidth is only ~2 Mbit, but this can well run for days and just saturate the spare bandwidth.

The problem is, it doesn't saturate the bandwidth.

I found that the backup application opens the socket in this way:

    if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {

Apparently that doesn't work well. So I patched the application to do it this way:

    -   if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, 0)) < 0) {
    +   if ((fd = socket(ipaddr->GetFamily(), SOCK_STREAM, IPPROTO_TCP)) < 0) {

The result, observed with tcpdump, was now noticeably different, but worse rather than better.

I tried various cc algorithms; all behaved very badly, with the exception of cc_vegas. Vegas, after tuning the alpha and beta, gave satisfying results with less than 1% tradeoff.

But only for a time. After transferring for a couple of hours the throughput went bad again:

    # netstat -aC
    Proto Recv-Q Send-Q Local Address  Foreign Address   (state)     CC     cwin ssthresh  MSS ECN
    tcp6       0  57351 edge-jo.26996  pole-n.22         ESTABLISHED vegas 22203    10392 1311 off
    tcp4       0 106305 edge-e.62275   pole-n.bacula-sd  ESTABLISHED vegas 11943     5276 1331 off

The first connection is freshly created. The second one has been running for a day already, and it is obviously hosed - it doesn't recover.

    # sysctl net.inet.tcp.cc.vegas
    net.inet.tcp.cc.vegas.beta: 14
    net.inet.tcp.cc.vegas.alpha: 8

    8 (alpha) x 1331 (mss) = 10648

The cwin is adjusted to precisely one tick above the alpha, and doesn't rise further. (Increasing the alpha further does solve the issue for this connection - but that is not how things are supposed to work.)
Now I tried to look into the data that vegas would use for its decisions, and found this:

    # dtrace -n 'fbt:kernel:vegas_ack_received:entry { printf("%s %u %d %d %d %d", execname,\
        (*((struct tcpcb **)(arg0+24)))->snd_cwnd,\
        ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->minrtt,\
        ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->marked_snd_cwnd,\
        ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->bytes_tx_in_marked_rtt,\
        ((struct ertt *)((*((struct tcpcb **)(arg0+24)))->osd->osd_slots[0]))->markedpkt_rtt);\
        }'
    CPU     ID                    FUNCTION:NAME
      6  17478       vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
     17  17478       vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
     17  17478       vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
      3  17478       vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
      5  17478       vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
     17  17478       vegas_ack_received:entry ng_queue 11943 1 11943 10552 131
     11  17478       vegas_ack_received:entry ng_queue 11943 1 11943 10552 106
     15  17478       vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
     13  17478       vegas_ack_received:entry ng_queue 22203 56 22203 20784 261
     16  17478       vegas_ack_received:entry ng_queue 11943 1 11943 10552 106
      3  17478       vegas_ack_received:entry ng_queue 22203 56 22203 20784 261

(The values after the execname are, in order: snd_cwnd, minrtt, marked_snd_cwnd, bytes_tx_in_marked_rtt, markedpkt_rtt.)

One can see that the "minrtt" value for the freshly created connection is 56 (which is very plausible). But the old and hosed connection shows minrtt = 1, which explains the observed cwin.

The minrtt gets calculated in sys/netinet/khelp/h_ertt.c:

    e_t->rtt = tcp_ts_getticks() - txsi->tx_ts + 1;

There is a "+1", so the measured difference was apparently zero. But source and destination are at least 1000 km apart. So either we have had one of the rare occasions of hyperspace tunnelling, or something is going wrong in the ertt measurement code.

For now this is a one-time observation, but it might also explain why the other cc algorithms behaved badly.
These algorithms are widely in use and should work - the ertt measurement however is the same for all of them.