Interaction between the re-transmit and keep-alive logic.

From: Pavel Vazharov <pavel_at_x3me.net>
Date: Mon, 09 Dec 2024 09:50:02 UTC

Hi there,

We are using the network stack of FreeBSD 13 on top of DPDK in our application.
During the last tests in the lab I stumbled upon the following situation:
1. It's a test where 5000 parallel connections are opened by Apache
Bench and each one downloads 1 MB of data. It causes the client NIC to
start dropping packets due to overflows, which is intentional behavior.
2. The server side is our application with the FreeBSD stack. The
client side is Ubuntu 24.04 with Linux 6.8.0.
3. So, a connection is opened and the download starts on it. At some
point the first drops occur and, according to the tcpdump from the
client side, they last a few seconds before the connection recovers.
However, these drops lead to increased values of t_srtt and t_rttvar,
and thus to an increased value of t_rxtcur.
4. The window opens again up to 100-200 KB with lots of packets
in flight and the drops start again. They cause the re-transmit timer
on the FreeBSD side to be started, but with an interval of something
like 18-20 seconds (according to my printf debugging on this side).
5. At the same time the TCP keep-alive timer is also started for the
same connection (it's enabled for all connections) with a timeout of
15 seconds.
6. Nothing happens on this connection for the next 15 seconds. I'm not
sure why the Linux stack didn't send any "wake-up" ACK packets or
something similar, but the tcpdump from the client side shows complete
silence between the 14th and the 29th second.
7. Next, the FreeBSD keep-alive logic kicks in and sends an ACK packet
which is ACK-ed by the Linux stack immediately. However, this ACK
packet received by the FreeBSD stack leads to a restart of the
retransmit timer, with an interval which is bigger than the
keep-alive interval.
8. Points 6 and 7 repeat one more time before the Apache Bench client
gives up on this connection and declares that it has timed out. My
understanding is that the connection can "loop" in 6-7 for a very long
time and a packet with data will never be retransmitted.
9. As far as I could debug the situation on the FreeBSD side, the
restart of the retransmit timer happens in the code after the
`process_ACK` label, in the else branch here:
```
        if (th->th_ack == tp->snd_max) {
            tcp_timer_activate(tp, TT_REXMT, 0);
            needoutput = 1;
        } else if (!tcp_timer_active(tp, TT_PERSIST))
            tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
```
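
To make the loop in points 6-8 concrete, here is a toy model of the
two timers. It is not FreeBSD code: `simulate`, `struct sim_result`
and all parameter names are made up for illustration. It assumes a
fixed retransmit interval (no backoff), a keep-alive probe every
`keep_s` seconds of silence, and that every probe is ACK-ed at once.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the FreeBSD stack. */
struct sim_result {
    int keepalive_probes; /* keep-alive probes sent within the horizon */
    int retransmits;      /* times the retransmit timer actually fired */
};

/*
 * Step through `horizon_s` seconds, one second at a time.
 * rexmt_s:          retransmit interval (stands in for t_rxtcur)
 * keep_s:           keep-alive interval
 * ack_rearms_rexmt: if true, the ACK of a keep-alive probe restarts
 *                   the retransmit timer, as described in point 7
 */
struct sim_result
simulate(int rexmt_s, int keep_s, int horizon_s, bool ack_rearms_rexmt)
{
    struct sim_result r = {0, 0};
    int rexmt_deadline = rexmt_s;
    int keep_deadline = keep_s;

    assert(rexmt_s > 0 && keep_s > 0 && horizon_s >= 0);

    for (int now = 1; now <= horizon_s; now++) {
        if (now >= rexmt_deadline) {
            /* Retransmit timer expired: data would be resent here. */
            r.retransmits++;
            rexmt_deadline = now + rexmt_s; /* backoff is ignored */
        }
        if (now >= keep_deadline) {
            /* Keep-alive fires; the peer is assumed to ACK at once. */
            r.keepalive_probes++;
            keep_deadline = now + keep_s;
            if (ack_rearms_rexmt)
                rexmt_deadline = now + rexmt_s;
        }
    }
    return r;
}
```

With the values from this report, `simulate(18, 15, 60, true)` gives
4 probes and 0 retransmits, because each probe's ACK pushes the
retransmit deadline past the next probe. `simulate(18, 15, 60, false)`
gives 3 retransmits over the same 60 seconds.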

So, based on the above situation I have the following questions:
1. Would it be correct if the re-transmit timer is not restarted by
keep-alive ACK packets?
2. Assuming that the above change won't break anything else, is there
a way to detect that an ACK packet acknowledges a previously sent
keep-alive packet?

Regards,
Pavel.