Long-haul problems - connections stuck in slow start
Peter Wemm
peter at wemm.org
Sat Jan 4 18:38:57 UTC 2014
We're seeing some unfortunate misbehavior with TCP over an intercontinental
link.
e.g., fetching a 30GB http file from various package mirrors to a remote host:
us-west(ISC) -> london(BME)
bd93e71c-cae4-44fd-943c-d1a88dbf6c6d.tar 0% of 29 GB 961 kBps 09h03m^C
us-east(NYI) -> london(BME)
bd93e71c-cae4-44fd-943c-d1a88dbf6c6d.tar 0% of 29 GB 1070 kBps 08h08m^C
us-west(YSV) -> london(BME)
bd93e71c-cae4-44fd-943c-d1a88dbf6c6d.tar 0% of 29 GB 14 kBps 590h22m^C
Spot the one we're concerned about...
Ping times for the three (in order):
round-trip min/avg/max/std-dev = 144.330/144.532/144.797/0.157 ms
round-trip min/avg/max/std-dev = 79.650/79.965/80.488/0.287 ms
round-trip min/avg/max/std-dev = 148.588/153.292/155.688/2.903 ms
The problem pair is worth looking at in a little more detail:
16 bytes from ..:206a::1001:10, icmp_seq=4 hlim=55 time=148.588 ms
16 bytes from ..:206a::1001:10, icmp_seq=5 hlim=55 time=155.140 ms
16 bytes from ..:206a::1001:10, icmp_seq=6 hlim=55 time=149.443 ms
16 bytes from ..:206a::1001:10, icmp_seq=7 hlim=55 time=155.688 ms
16 bytes from ..:206a::1001:10, icmp_seq=8 hlim=55 time=148.630 ms
16 bytes from ..:206a::1001:10, icmp_seq=9 hlim=55 time=155.486 ms
It appears that there are two packet paths between the endpoints, one at
~148ms and one at ~155ms. I've taken some longer samples and the two
clusters are fairly consistent.
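Out of curiosity, I fed those two clusters through the RFC 6298 estimator
by hand. This is only a back-of-envelope sketch in floating point (the
kernel keeps srtt/rttvar in scaled fixed-point), but it suggests the
oscillation by itself leaves plenty of RTO headroom:

#include <stdio.h>

/*
 * Back-of-envelope RFC 6298 srtt/rttvar estimator, fed with the two
 * observed RTT clusters.  Floating point for clarity; the kernel uses
 * scaled fixed-point, so treat the output as approximate.
 */
int main(void)
{
    const double sample[2] = { 148.6, 155.4 };  /* ms, from the pings */
    double srtt = 0.0, rttvar = 0.0;
    int i;

    for (i = 0; i < 1000; i++) {
        double r = sample[i & 1];

        if (i == 0) {                           /* first measurement */
            srtt = r;
            rttvar = r / 2.0;
        } else {
            double err = (srtt > r) ? srtt - r : r - srtt;

            rttvar = 0.75 * rttvar + 0.25 * err;
            srtt = 0.875 * srtt + 0.125 * r;
        }
    }
    /*
     * RFC 6298 clamps RTO to >= 1s; FreeBSD's floor is
     * net.inet.tcp.rexmit_min instead (30ms here, per t_rttmin below).
     */
    printf("srtt=%.1fms rttvar=%.1fms rto=%.1fms\n",
        srtt, rttvar, srtt + 4.0 * rttvar);
    return (0);
}

That settles at srtt ~152ms, rttvar ~3.6ms, i.e. an RTO of ~167ms,
comfortably above the slower path, so the bimodal RTT alone shouldn't be
triggering spurious retransmits unless the fixed-point rounding does
something this sketch doesn't capture.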
All four machines talk to each other.
Here's where it gets interesting. On the sender at us-west(YSV), I see this:
net.inet.tcp.hostcache.list:
IP address    SSTRESH    RTT  RTTVAR   CWND      HITS
us-west(ISC)    59521    5ms     1ms  16845  15055031
eu-west(BME)     7343  150ms     2ms  13501   3433775
us-east(NYI)   530489  100ms    37ms  16681  43043786
The ssthresh is very low for the problematic ysv<->bme pair.
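For anyone who hasn't been in this code: when a new connection is set up,
tcp_mss() in tcp_input.c pulls the cached metrics back out of the
hostcache and seeds the new tcpcb from them. From memory it's roughly
this shape (paraphrased; names approximate, error handling elided):

    struct hc_metrics_lite metrics;

    tcp_hc_get(&inc, &metrics);  /* look the peer up in the hostcache */

    if (metrics.rmx_ssthresh != 0) {
        /*
         * A previous connection to this peer hit congestion around
         * this window, so start the new one cautiously, but never
         * below 2*mss.  With the 7343 entry above, every fresh
         * ysv<->bme connection starts life with an ssthresh of
         * barely five segments.
         */
        tp->snd_ssthresh = max(2 * mss, metrics.rmx_ssthresh);
    }
    /* rmx_rtt/rmx_rttvar seed t_srtt/t_rttvar the same way. */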
When I do a tcpdump, I see the sender fire off 7343 bytes of data, then
stop and wait for acks. It's completely ignoring the receiver's
advertised window. It appears to be stuck in slow start.
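To spell out why a ~7KB ssthresh hurts so much: the per-ACK window growth
is roughly the RFC 5681 shape (simplified; the real logic lives in the
congestion control module and honors ABC):

    if (tp->snd_cwnd < tp->snd_ssthresh) {
        /* slow start: ~one MSS per ACKed MSS (exponential) */
        tp->snd_cwnd += mss;
    } else {
        /* congestion avoidance: ~one MSS per RTT (linear) */
        tp->snd_cwnd += mss * mss / tp->snd_cwnd;
    }

So the exponential phase is over after about five segments, the window
then creeps up by ~1 MSS per 150ms round trip, and every loss halves it
again. The instantaneous ceiling is snd_cwnd/RTT, e.g. 7343 bytes /
0.150s ~ 48kB/sec, which is the right order of magnitude for the 14kBps
we're seeing once you factor in the retransmits below.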
Some other data:
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp6 0 1047852 2001:19:2.443 2001:41c8:.24490 ESTABLISHED
(netstat -x; unwrapped and split into chunks here for readability)
Proto  Recv-Q  Send-Q   Local Address         Foreign Address
tcp6   0       1048152  2001:1900:2254:2.443  2001:41c8:112:83.24490

R-MBUF  S-MBUF  R-CLUS  S-CLUS  R-HIWA  S-HIWA   R-LOWA  S-LOWA
0       374     0       373     65688   1049580  1       2048

R-BCNT  S-BCNT   R-BMAX  S-BMAX
0       1420800  525504  8396640

rexmt  persist  keep     2msl  delack  rcvtime
0.43   0.00     7199.93  0.00  0.00    0.06
The "interesting" parts of -x:
rexmt persist keep 2msl delack rcvtime
0.43 0.00 7199.93 0.00 0.00 0.06
And netstat -T:
Proto Rexmit OOORcv 0-win Local Address Foreign Address
tcp6 54161 0 0 2001:192.443 2001:41:83.24490
Note the retransmits(!)
Some tcpcb fields that caught my eye for the connection:
snd_wnd = 1048576,
snd_cwnd = 5712,
t_srtt = 6391,
t_rttvar = 903,
t_rxtshift = 0,
t_rttmin = 30,
t_rttbest = 4903,
t_rttupdated = 220095,
max_sndwnd = 1048576,
snd_cwnd_prev = 4284,
snd_ssthresh_prev = 2856,
snd_recover_prev = 1397053524,
t_sndzerowin = 0,
t_badrxtwin = 584273259,
snd_limited = 0 '\0',
t_rttlow = 150,
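(For anyone decoding those: t_srtt and t_rttvar are fixed-point. Assuming
the stock TCP_RTT_SHIFT=5 / TCP_RTTVAR_SHIFT=4 scaling and 1ms ticks --
which I believe is right, but double-check against your tree -- they come
out as:)

#include <stdio.h>

/*
 * Decode the scaled tcpcb RTT fields.  Assumes stock TCP_RTT_SHIFT=5 /
 * TCP_RTTVAR_SHIFT=4 scaling and hz=1000 (1ms ticks); verify against
 * the kernel config before trusting the numbers.
 */
int main(void)
{
    int t_srtt = 6391, t_rttvar = 903;  /* from the tcpcb above */

    printf("srtt   ~= %d ms\n", t_srtt >> 5);   /* ~199 ms */
    printf("rttvar ~= %d ms\n", t_rttvar >> 4); /* ~56 ms */
    return (0);
}

If that scaling is right, the stack believes srtt is ~200ms with ~56ms of
variance, noticeably worse than what ping sees on the same path (and
t_rttlow = 150 matching the ping floor supports the 1ms-tick assumption).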
I've stored some dumps of the tcpcb at
http://people.freebsd.org/~peter/tcpcb.txt
Note that some of the snapshots in the tcpcb.txt file also have
snd_limited = 2 '\002',
Over the last few days I've tried things like turning off SACK, TSO, and
the various RFC knobs, etc.; none of it helped. I believe they're all
back to their defaults now.
There's a small ~15-second tcpdump sample of the sender side and the
receiver side at: http://people.freebsd.org/~peter/send.cap.gz and
http://people.freebsd.org/~peter/recv.cap.gz
Both ends were ntp synced. The dumps contain no sensitive data.
For amusement, I just tried this, with roughly 1 second in between:
peter at bme:~ % scp pkg-ysv:k.gz /tmp
k.gz 100% 25MB 5.0MB/s 00:05
peter at bme:~ % scp pkg-ysv:k.gz /tmp
k.gz 0% 960KB 20.3KB/s 41:29 ETA^C
There was no pre-existing hostcache state between those two endpoints for
the first run. At the end, this was created in the hostcache:
IP address  SSTRESH    RTT  RTTVAR  BANDWIDTH  CWND
213.138..      5952  165ms    21ms          0  8688
All connections went slow after that. Note that the ssh test was over
IPv4 - the rest above is over IPv6. However, we're seeing the same weird
behavior with http over IPv4 between the same two endpoints as well.
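Which fits a theory: when a connection closes, tcp_close() pushes its
final estimates back into the hostcache via tcp_hc_update(), so a single
loss-battered run poisons the cache for every connection that follows.
Again paraphrasing from memory (field names approximate):

    struct hc_metrics_lite metrics = { 0 };  /* zero = "no update", IIRC */

    metrics.rmx_rtt = tp->t_srtt;
    metrics.rmx_rttvar = tp->t_rttvar;
    metrics.rmx_ssthresh = tp->snd_ssthresh; /* halved on every loss event */
    metrics.rmx_cwnd = tp->snd_cwnd;
    tcp_hc_update(&inc, &metrics);  /* smoothed into the existing entry */

That would explain why run #2 starts out crippled even though run #1
averaged 5.0MB/s: whatever losses the first run took along the way left a
tiny ssthresh behind when it closed.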
It was pointed out to me that this has come up before, e.g. misc/173859.
I know we've seen this at work as well.
A few days earlier we were pushing ~45MB/sec (bytes, not bits) between
these endpoints. Out of the blue it crashed to ~10KB/sec. Why can't it
get out of slow start? Is it even stuck in slow start like I think? Is
the 148-155ms bimodal RTT the problem?
Any insight would be greatly appreciated. (please don't drop me from cc:)
--
Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com; KI6FJV
UTF-8: for when a ' just won’t do.