[Bug 217637] One TCP connection accepted TWO times
bugzilla-noreply at freebsd.org
Mon Jul 24 22:09:06 UTC 2017
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217637
Richard Russo <freebsd at ruka.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |freebsd at ruka.org
--- Comment #87 from Richard Russo <freebsd at ruka.org> ---
Created attachment 184680
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=184680&action=edit
patch to not send acks in this case
We've recently started hitting this at WhatsApp as well. I've applied the
syncookie patches from CURRENT to 10.3 manually, and it successfully prevents
this from happening as long as the syncache hasn't overflowed recently or been
disabled.
Unfortunately, if the syncache does overflow and this case does happen, the
connection is re-opened with the connection states on the two peers out of
sync. Each peer responds to a packet carrying unreasonable seq/ack data by
sending an empty ack with its current seq/ack. The other peer finds that ack
unreasonable in turn and answers the same way, and the resulting packet storms
were causing availability problems.
I've attached a patch we've been running on a few machines. With this, when
connections do get into this state, we don't contribute to the packet storm;
instead, the connection eventually closes without sending very many packets.
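
To illustrate the storm and the effect of staying silent, here is a toy model
in Python. It is only a sketch and not the attached patch: the Peer class, the
reply_to_bad switch, and the exact-match acceptability test are all
assumptions made for illustration (real TCP checks sequence numbers against a
window, not for equality).

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Toy TCP endpoint that tracks only its send/receive sequence numbers."""
    name: str
    snd_nxt: int               # next sequence number this peer will send
    rcv_nxt: int               # next sequence number this peer expects
    reply_to_bad: bool = True  # hypothetical switch standing in for the patch

    def acceptable(self, seq: int, ack: int) -> bool:
        # Simplified: real TCP uses window checks rather than equality.
        return seq == self.rcv_nxt and ack == self.snd_nxt

    def handle(self, seq: int, ack: int):
        """Return an empty ACK as (seq, ack), or None to stay silent."""
        if self.acceptable(seq, ack):
            return None  # an in-sync empty ACK needs no reply
        if self.reply_to_bad:
            return (self.snd_nxt, self.rcv_nxt)
        return None


def count_packets(a: Peer, b: Peer, limit: int = 1000) -> int:
    """a sends one empty ACK to b; replies bounce until silence or limit."""
    sent = 0
    src, dst = a, b
    packet = (src.snd_nxt, src.rcv_nxt)
    while packet is not None and sent < limit:
        sent += 1
        packet = dst.handle(*packet)
        src, dst = dst, src
    return sent


# Server reopened at the original seq/ack, client still at the final seq/ack.
server = Peer("server", snd_nxt=1000, rcv_nxt=5000)
client = Peer("client", snd_nxt=5200, rcv_nxt=1300)
storm = count_packets(server, client)      # only stops at the limit

quiet_server = Peer("server", 1000, 5000, reply_to_bad=False)
quiet = count_packets(quiet_server, client)
```

In this model the unpatched pair exchanges packets until the artificial limit
is reached, while the pair with one silent side stops after the first reply.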
I have some complete connection pcaps available (from before patching), and can
share them (after masking IPs and TCP payloads) if they'll be useful. From the
traces I've seen, we're getting many retransmits from the peer (or a
middlebox), and also the peer ends the connection soon after opening, by
sending a FIN. Our host acks the FIN and also closes with a FIN. After the
peer's ack of our FIN, we receive a new ACK that's a retransmit of the original
ACK, and reopen the connection at the connection's original SEQ/ACK, while the
peer is in TIME_WAIT at the final SEQ/ACK. In the traces I was able to capture,
the peers were mobile devices across the world and on high latency links.
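
The sequence in those traces can be sketched with a toy server that validates
a handshake ACK statelessly. This is an illustration under assumptions, not
FreeBSD's syncookie code: the cookie() function, the SECRET value, and the
Server class are hypothetical, and real syncookies also encode things such as
a time counter and MSS.

```python
import hashlib

SECRET = b"toy-secret"  # hypothetical; a real kernel rotates its secrets
MOD = 2**32


def cookie(client_isn: int) -> int:
    """Toy stand-in for a SYN cookie: server ISN derived from the client ISN."""
    digest = hashlib.sha256(SECRET + client_isn.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")


class Server:
    def __init__(self) -> None:
        self.conn = None   # (snd_nxt, rcv_nxt) of the open connection, if any
        self.accepts = 0   # how many times a connection was accepted

    def on_ack(self, seq: int, ack: int) -> str:
        """Handle a bare ACK segment carrying the given seq/ack numbers."""
        if self.conn is not None:
            return "existing connection"
        # No connection state left: fall back to stateless cookie validation.
        client_isn = (seq - 1) % MOD
        if ack == (cookie(client_isn) + 1) % MOD:
            self.conn = (ack, seq)  # opened at the original SEQ/ACK
            self.accepts += 1
            return "accepted"
        return "rejected"

    def close(self) -> None:
        # After FIN/ACK in both directions the passive closer drops its state.
        self.conn = None


client_isn = 5000
handshake_ack = ((client_isn + 1) % MOD, (cookie(client_isn) + 1) % MOD)

server = Server()
first = server.on_ack(*handshake_ack)    # normal end of the handshake
server.close()                           # peer sent FIN, we ACKed and sent FIN
second = server.on_ack(*handshake_ack)   # late retransmit of the same ACK
```

Because nothing in this toy check remembers that the cookie was already used,
the retransmitted ACK opens the connection a second time while the peer would
still be in TIME_WAIT at the final SEQ/ACK.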
--
You are receiving this mail because:
You are on the CC list for the bug.