Re: Interaction between the re-transmit and keep-alive logic.

From: Michael Tuexen <michael.tuexen_at_lurchi.franken.de>
Date: Wed, 11 Dec 2024 18:40:34 UTC
> On 10. Dec 2024, at 10:17, Pavel Vazharov <pavel@x3me.net> wrote:
> 
>> 
>>> On 9. Dec 2024, at 10:50, Pavel Vazharov <pavel@x3me.net> wrote:
>>> 
>>> Hi there,
>>> 
>>> We are using the network stack of FreeBSD 13 on top of DPDK in our application.
>>> During the last tests in the lab I stumbled upon the following situation:
>>> 1. It's a test where 5000 parallel connections are opened by Apache
>>> Bench and each one downloads 1 MB of data. It causes the client NIC to
>>> start dropping packets due to overflows, which is intentional behavior.
>>> 2. The server side is our application with the FreeBSD stack. The
>>> client side is Ubuntu 24.04 with Linux 6.8.0.
>>> 3. So, a connection is opened and the download starts on it. At some
>>> point the first drops occur and, according to the tcpdump from the
>>> client side, it takes a few seconds before the connection heals.
>>> However, these drops lead to increased values of t_srtt and t_rttvar
>>> and thus to an increased value of t_rxtcur.
>> Do you observe increased values of t_rxtcur due to exponential backoff
>> or due to extreme values of t_srtt and t_rttvar?
> 
> I think it's a cumulative effect of both the retransmits that happen
> and the ACK packet that is finally received. For example, from the
> client side pcap it can be seen that between seconds 17 and 20
> (packets 105-107) there are lost packets and the client stack resends
> the ACK packet with the same timestamp echo reply value: 3145161949.
> On the server side there are 4 retransmits from this time (packets
> 546-549) and then the ACK packet from the client is received
> (packet 550).
> In the FreeBSD stack the retransmits trigger the back-off logic and
> this line in `tcp_timer_rexmt`
>    TCPT_RANGESET(tp->t_rxtcur, rexmt,
>              tp->t_rttmin, TCPTV_REXMTMAX);
> sets values of: 32, 64, 128, 256.
> Then the ACK packet is received and `tcp_xmit_timer` sets the following values:
> rtt:326 t_rxtcur:998 TCP_REXMTVAL(tp):978 t_rttmin:3 t_srtt:10432 t_rttvar:2608
> due to the echo reply timestamp.
> The next received ACK packets with rtt:3 lead to values of t_rxtcur:
> 1118 - 1163 - 1157.
> And then the final retransmitted packet (packet 736 from the server
> pcap) leads to t_rxtcur: 2274.
> The FreeBSD stack is set up with a 100 Hz clock (per my understanding
> t_rxtcur is in ticks, i.e. 2274 ticks are roughly equal to 22 seconds,
> if I'm not mistaken).
> The keep-alives happened at the 35th and 50th seconds and then the
> client gave up at the 90th second.
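
For reference, the values printed above can be reproduced with a small
stand-alone sketch. The shift constants and the retransmit slop of hz/5
ticks are written from memory of FreeBSD's tcp_var.h and tcp_timer.h, so
treat them as assumptions; the helper names are made up for illustration
and are not kernel functions. The printed t_srtt and t_rttvar are
consistent with both being seeded from a single RTT sample of 326 ticks.

```c
#include <stdio.h>

/* Fixed-point scaling, assumed to match FreeBSD's tcp_var.h. */
#define TCP_RTT_SHIFT    5   /* t_srtt holds srtt * 32 */
#define TCP_RTTVAR_SHIFT 4   /* t_rttvar holds rttvar * 16 */
#define TCP_DELTA_SHIFT  2

/* Hypothetical helpers: seed the estimators from one RTT sample (ticks). */
static int seed_srtt(int rtt)   { return rtt << TCP_RTT_SHIFT; }
static int seed_rttvar(int rtt) { return rtt << (TCP_RTTVAR_SHIFT - 1); }

/* Same arithmetic as TCP_REXMTVAL(tp): srtt + 4 * rttvar, floored at rttmin. */
static int rexmtval(int t_srtt, int t_rttvar, int t_rttmin)
{
    int v = ((t_srtt >> (TCP_RTT_SHIFT - TCP_DELTA_SHIFT)) + t_rttvar)
            >> TCP_DELTA_SHIFT;
    return v > t_rttmin ? v : t_rttmin;
}

/* Same arithmetic as TCPT_RANGESET: add the slop, then clamp to [tvmin, tvmax]. */
static int rangeset(int value, int slop, int tvmin, int tvmax)
{
    int tv = value + slop;
    if (tv < tvmin)
        tv = tvmin;
    if (tv > tvmax)
        tv = tvmax;
    return tv;
}

/* Convert a timer value in ticks to seconds for a given clock rate. */
static double ticks_to_seconds(int ticks, int hz)
{
    return (double)ticks / (double)hz;
}
```

With rtt = 326, t_rttmin = 3 and hz = 100 (slop = 20 ticks, upper bound
assumed to be 64 * hz), rexmtval() gives 978 and rangeset() gives 998,
which matches the printout. ticks_to_seconds(2274, 100) is 22.74.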
> 
>>> 
>>> 4. The window opens again up to 100-200 KB with lots of packets
>>> in-flight and the drops start again. They cause the re-transmit timer
>>> from the FreeBSD side to be started but with an interval of something
>>> like 18-20 seconds (according to my printf debugging on this side).
>>> 5. At the same time the TCP keep-alive timer is also started for the
>>> same connection (it's enabled for all connections) with a timeout of
>>> 15 seconds.
>>> 6. Nothing happens on this connection for the next 15 seconds. I'm not
>>> sure why the Linux stack didn't send any "wake-up" ACK packets or
>>> something similar, but the tcpdump from the client side shows full
>>> silence between the 14th and 29th seconds.
>>> 7. Next the FreeBSD keep-alive logic kicks in and sends an ACK packet
>>> which is ACK-ed by the Linux stack immediately. However, this ACK
>>> packet received by the FreeBSD stack leads to a restart of the
>>> retransmit timer with an interval which is bigger than the
>>> keep-alive interval.
>>> 8. Points 6 and 7 repeat one more time before the Apache Bench client
>>> gives up on this connection and declares that it has timed out. My
>>> understanding is that the connection can "loop" in 6-7 for a very long
>>> time and a packet with data will never be retransmitted.
>> Can you provide a .pcap file?
> 
> I did a new test today to have pcap files from both sides, and my
> explanations above relate to this new test.
> I'm attaching the .pcap files from the server as well as from the
> client side.
> Note that the capture size for each packet was limited to 80 bytes.
> If for some reason the attached files are dropped from this email, the
> same pcap files can be downloaded from the following link:
> https://drive.google.com/drive/folders/1418Qdc3E3ptjo2VcPXA6jo-4b5n-fYOb?usp=sharing
> 
>> 
>> Best regards
>> Michael
> 
> Thank you for the help.
Hi Pavel,

I will look into this, but it may take until next week.

Best regards
Michael
> 
>>> 9. As far as I debugged the situation from the FreeBSD side the
>>> restart of the retransmit timer happens in the code after the
>>> `process_ACK` label, in the else branch here:
>>> ```
>>>       if (th->th_ack == tp->snd_max) {
>>>           tcp_timer_activate(tp, TT_REXMT, 0);
>>>           needoutput = 1;
>>>       } else if (!tcp_timer_active(tp, TT_PERSIST))
>>>           tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
>>> ```
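
The loop described in points 6-7 can be illustrated with a toy model.
This is only a sketch and not the FreeBSD timer code: the function name
and the one-second granularity are hypothetical, retransmit backoff is
ignored, and it assumes that every keep-alive probe is ACKed at once and
that this ACK re-arms the retransmit timer with the full interval, as in
the else branch quoted above.

```c
#include <stdio.h>

/*
 * Count how many times the retransmit timer fires within `deadline`
 * seconds when each keep-alive ACK restarts it (hypothetical model).
 */
static int count_rexmt_fires(int rexmt_interval, int keep_interval,
                             int deadline)
{
    int rexmt_at = rexmt_interval;  /* next retransmit timer expiry */
    int keep_at = keep_interval;    /* next keep-alive timer expiry */
    int fires = 0;

    for (int now = 1; now <= deadline; now++) {
        if (now == keep_at) {
            /* Probe sent and ACKed immediately; the ACK does not cover
             * snd_max, so the retransmit timer is started over. */
            rexmt_at = now + rexmt_interval;
            keep_at = now + keep_interval;
        } else if (now == rexmt_at) {
            /* The retransmit timer finally expires: data is resent. */
            fires++;
            rexmt_at = now + rexmt_interval;
        }
    }
    return fires;
}
```

In this model, a retransmit interval of 22 seconds with a keep-alive
interval of 15 seconds never retransmits within 90 seconds, while any
retransmit interval shorter than the keep-alive interval does.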
>>> 
>>> So, based on the above situation I have the following questions:
>>> 1. Would it be correct if the re-transmit timer is not restarted by
>>> keep-alive ACK packets?
>>> 2. Assuming that the above change won't break anything else, is there
>>> a way of detecting that an ACK packet acknowledges a previously sent
>>> keep-alive packet?
>>> 
>>> Regards,
>>> Pavel.
>>> 
>> 
> <client43-test.pcap><server43-test.pcap>