Re: low TCP speed, wrong rtt measurement

From: Scheffenegger, Richard <rscheff_at_freebsd.org>
Date: Wed, 07 Jun 2023 20:23:09 UTC
Hi Peter,


Your annotation may match what the ertt module is capturing, but it's
obviously wrong.

Look at the TSval of the data segment, vs the TSecr of the ACK:


  > Here we sent.
  >
  > 18:44:58.090981 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271913561:1271915009, ack 0, win 1027, options [nop,nop,TS val *2751965353* ecr 3960840091], length 1448
  >
  > And here we got ACK. This is actually 7.5 ms, well, near enough to 6.
  >
  > 18:44:58.107088 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271919353, win 1450, options [nop,nop,TS val 3960840107 ecr *2751965293*], length 0

The hole does not close because of this SACK retransmission, but because
of a prior (probably the original?) transmission, which probably was simply reordered.

  > 18:44:58.033070 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271913561:1271915009, ack 0, win 1027, options [nop,nop,TS val 2751965293 ecr 3960840033], length 1448

The timestamp (assuming millisecond granularity) appears to indicate
around 107 - 33  ~= 74ms RTT (but the monotonic increasing number here
could be anything, under full control of the sender; while most stacks
currently map milliseconds, this is not a save bet to make)

The packet capture here also indicate 74,018 ms.... (.107088 - .033070)

I'ven't looked into the ertt module ever, but it seems here that no
validation with all available information is done for some reason...

Or maybe it only fails in peculiar circumstances...


-----Original Message-----



From: Peter 'PMc' Much <pmc@citylink.dinoex.sub.org>
Subject: Re: low TCP speed, wrong rtt measurement


** Original to freebsd-hackers@FreeBSD.org **


Folks,

    I can now say for certain that there is a flaw in the rtt measurements.

On 2023-04-09, Scheffenegger, Richard <rscheff@freebsd.org> wrote:
  > Finally, you could be using SIFTR to track the evolution of the minrtt
  > value over the course of the session.

  > Although I suspect ultimately a tcpdump including the tcp header (-s
  > 80) , and the sifter internal state evolution would be optimal to
  > understanding when and why the RTT values go off the rails.

  > At first glance, the ertt module may be prone to miscalculations, when
  > retransmissions are in play - no special precautions appear to be
  > present, to distinguish between the originally sent packet, and any
  > retransmission, nor any filtering of ACKs which come in as duplicates.
  > Thus there could be a scenario, where an ACK for a spurious
  > retransmission, e.g. due to reordering, could lead to a wrong baseline
  > RTT measurement, which is physically impossible on such a long
  > distance connection...

Richard, thanks for Your comments. It looks like You are right, and Your
debug advice sounds interesting - but for now I went a different way:

I put timestamping into the h_ertt code and reset the minrtt measurement
after an hour. I think this is a good idea in any case, as physics might
switch during a connection (only the way I did it, straightforward
inline comparing the timestamp at every occasion, is not really good
performance-wise), and mainly, it does already relieve my problem a bit.

On my May upload, the flaw did reproduce. Then I put in that patch.
Now on the June upload, the flaw does reproduce again, and the patch
fixes it after some time. Here is the evidence:

tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas       8712       7240  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off
tcp4       0 123299 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off
tcp4       0 120796 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103
ESTABLISHED vegas      14552       7240  1460 off

The cwin appears to be locked at 14552 and doesn't go higher.
And then the patch does it's action:

Jun  7 15:28:47 <kern.crit> edge kernel: [72317] h_ertt reset minrtt
1686087605, was 11 since 1686140926

This runs some 500 miles plus tunnels&filters, so rtt 11 is impossible
here. After resetting it, the cwin is normal again:

tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      20368      10136  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      20368      10136  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      14528      10136  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      17448      10136  1460 off
tcp4       0 123381 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      10160       8688  1460 off
tcp4       0 122117 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      14540       8688  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      20380       8688  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      20380       8688  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      15988      10136  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      18908      10136  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      20368      10136  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      20368      10136  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      21828      10136  1460 off
tcp4       0 113559 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      21828      10136  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      23288      10136  1460 off
tcp4       0 123692 192.168.97.17.24379    192.168.99.1.9103 ESTABLISHED vegas      23288      10136  1460 off

So now, with timestamps provided, it makes sense to look into the
tcpdump. The next occasion occurs:

Jun  7 19:44:59 <kern.crit> edge kernel: [87689] h_ertt reset minrtt 1686087605, was 6 since 1686156298

18:44:56.561344 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271901977, win 1518, options [nop,nop,TS val 3960838455 ecr 2751963567], length 0
18:44:56.561438 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271913561:1271915009, ack 0, win 1027, options [nop,nop,TS val 2751963822 ecr 3960838455], length 1448
18:44:56.561446 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271915009:1271916457, ack 0, win 1027, options [nop,nop,TS val 2751963822 ecr 3960838455], length 1448
18:44:56.833069 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271904873, win 1540, options [nop,nop,TS val 3960838833 ecr 2751963582,nop,nop,sack 2 {1271915009:1271916457}{1271909217:1271910665}], length 0
18:44:56.833166 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271916457:1271917905, ack 0, win 1027, options [nop,nop,TS val 2751964093 ecr 3960838833], length 1448
18:44:56.833175 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271917905:1271919353, ack 0, win 1027, options [nop,nop,TS val 2751964093 ecr 3960838833], length 1448
18:44:56.924990 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271904873, win 1540, options [nop,nop,TS val 3960838924 ecr 2751963582,nop,nop,sack 2 {1271915009:1271917905}{1271909217:1271910665}], length 0
18:44:56.926649 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271904873, win 1540, options [nop,nop,TS val 3960838926 ecr 2751963582,nop,nop,sack 2 {1271915009:1271919353}{1271909217:1271910665}], length 0
18:44:56.926731 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271904873:1271906321, ack 0, win 1027, options [nop,nop,TS val 2751964187 ecr 3960838926], length 1448
18:44:57.939383 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271904873:1271906321, ack 0, win 1027, options [nop,nop,TS val 2751965203 ecr 3960838926], length 1448
18:44:57.988205 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271906321, win 1518, options [nop,nop,TS val 3960839988 ecr 2751965203,nop,nop,sack 2 {1271915009:1271919353}{1271909217:1271910665}], length 0
18:44:57.988316 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271906321:1271907769, ack 0, win 1027, options [nop,nop,TS val 2751965248 ecr 3960839988], length 1448
18:44:57.988326 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271907769:1271909217, ack 0, win 1027, options [nop,nop,TS val 2751965248 ecr 3960839988], length 1448
18:44:58.032928 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271907769, win 1518, options [nop,nop,TS val 3960840033 ecr 2751965248,nop,nop,sack 2 {1271915009:1271919353}{1271909217:1271910665}], length 0
18:44:58.033056 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271907769:1271909217, ack 0, win 1027, options [nop,nop,TS val 2751965293 ecr 3960840033], length 1448
18:44:58.033065 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271910665:1271912113, ack 0, win 1027, options [nop,nop,TS val 2751965293 ecr 3960840033], length 1448
18:44:58.033067 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271912113:1271913561, ack 0, win 1027, options [nop,nop,TS val 2751965293 ecr 3960840033], length 1448
18:44:58.033070 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271913561:1271915009, ack 0, win 1027, options [nop,nop,TS val 2751965293 ecr 3960840033], length 1448
18:44:58.042332 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271910665, win 1495, options [nop,nop,TS val 3960840042 ecr 2751965248,nop,nop,sack 1 {1271915009:1271919353}], length 0
18:44:58.077990 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271910665, win 1540, options [nop,nop,TS val 3960840078 ecr 2751965248,nop,nop,sack 2 {1271907769:1271909217}{1271915009:1271919353}], length 0
18:44:58.090877 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271912113, win 1518, options [nop,nop,TS val 3960840091 ecr 2751965293,nop,nop,sack 1 {1271915009:1271919353}], length 0
18:44:58.090972 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271912113:1271913561, ack 0, win 1027, options [nop,nop,TS val 2751965353 ecr 3960840091], length 1448

Here we sent.

18:44:58.090981 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271913561:1271915009, ack 0, win 1027, options [nop,nop,TS val 2751965353 ecr 3960840091], length 1448
18:44:58.098474 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271913561, win 1518, options [nop,nop,TS val 3960840098 ecr 2751965293,nop,nop,sack 1 {1271915009:1271919353}], length 0

And here we got ACK. This is actually 7.5 ms, well, near enough to 6.

18:44:58.098560 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271919353:1271920801, ack 0, win 1027, options [nop,nop,TS val 2751965358 ecr 3960840098], length 1448
18:44:58.107088 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271919353, win 1450, options [nop,nop,TS val 3960840107 ecr 2751965293], length 0
18:44:58.107167 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271920801:1271922249, ack 0, win 1027, options [nop,nop,TS val 2751965368 ecr 3960840107], length 1448
18:44:58.137900 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271919353, win 1540, options [nop,nop,TS val 3960840138 ecr 2751965293,nop,nop,sack 1 {1271912113:1271913561}], length 0

And here is the DSACK.

18:44:58.147276 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271919353, win 1540, options [nop,nop,TS val 3960840147 ecr 2751965293,nop,nop,sack 1 {1271913561:1271915009}], length 0
18:44:58.235268 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271922249, win 1518, options [nop,nop,TS val 3960840225 ecr 2751965358], length 0
18:44:58.235380 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271922249:1271923697, ack 0, win 1027, options [nop,nop,TS val 2751965497 ecr 3960840225], length 1448
18:44:58.235389 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271923697:1271925145, ack 0, win 1027, options [nop,nop,TS val 2751965497 ecr 3960840225], length 1448
18:44:58.235392 IP 192.168.97.17.24379 > 192.168.99.1.9103: Flags [.], seq 1271925145:1271926593, ack 0, win 1027, options [nop,nop,TS val 2751965497 ecr 3960840225], length 1448
18:44:58.305514 IP 192.168.99.1.9103 > 192.168.97.17.24379: Flags [.], ack 1271925145, win 1518, options [nop,nop,TS val 3960840293 ecr 2751965497], length 0

So, what do we do now? This doesn't look very easy to fix. Any ideas,
anybody?

cheerio,
PMc