Interaction between the re-transmit and keep-alive logic.
Date: Mon, 09 Dec 2024 09:50:02 UTC
Hi there,

We are using the network stack of FreeBSD 13 on top of DPDK in our application. During the last tests in the lab I stumbled upon the following situation:

1. It's a test where 5000 parallel connections are opened by Apache Bench and each one downloads 1 MB of data. This causes the client NIC to start dropping packets due to overflows, which is intentional behavior.
2. The server side is our application with the FreeBSD stack. The client side is Ubuntu 24.04 with Linux 6.8.0.
3. A connection is opened and the download starts on it. At some point the first drops occur and, according to the tcpdump from the client side, it takes a few seconds before the connection recovers. However, these drops lead to increased values of t_srtt and t_rttvar, and thus to an increased value of t_rxtcur.
4. The window opens again up to 100-200 KB with lots of packets in flight, and the drops start again. They cause the re-transmit timer on the FreeBSD side to be started, but with an interval of something like 18-20 seconds (according to my printf debugging on this side).
5. At the same time the TCP keep-alive timer is also started for the same connection (it's enabled for all connections) with a timeout of 15 seconds.
6. Nothing happens on this connection for the next 15 seconds. I'm not sure why the Linux stack didn't send any "wake-up" ACK packets or something similar, but the tcpdump from the client side shows complete silence between the 14th and the 29th second.
7. Next, the FreeBSD keep-alive logic kicks in and sends an ACK packet, which is ACKed by the Linux stack immediately. However, when the FreeBSD stack receives this ACK, it restarts the retransmit timer, and with an interval that is longer than the keep-alive interval.
8. Points 6 and 7 repeat one more time before the Apache Bench client gives up on this connection and declares that it has timed out. My understanding is that the connection can "loop" in 6-7 for a very long time, and a packet with data will never be retransmitted.
9. As far as I could debug the situation from the FreeBSD side, the restart of the retransmit timer happens in the code after the `process_ACK` label, in the else branch here:

   ```
   if (th->th_ack == tp->snd_max) {
           tcp_timer_activate(tp, TT_REXMT, 0);
           needoutput = 1;
   } else if (!tcp_timer_active(tp, TT_PERSIST))
           tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
   ```

So, based on the above situation, I have the following questions:

1. Would it be correct if the re-transmit timer were not restarted by keep-alive ACK packets?
2. Assuming that the above change won't break anything else, is there a way to detect that an ACK packet acknowledges a previously sent keep-alive packet?

Regards,
Pavel.
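
P.S. To make the loop in points 6-7 concrete, here is a minimal sketch of how I understand the interaction. It is a toy model and not FreeBSD code: the function `simulate` and its parameter names are made up for illustration, time advances in one-second ticks, the keep-alive ACK is assumed to arrive instantly, a single keep-alive interval is used, and the retransmit interval is assumed to stay fixed (no backoff).

```c
#include <assert.h>

/*
 * Toy model of the two timers (hypothetical, not taken from the stack).
 * Returns how many times the retransmit timer fires within `horizon`
 * seconds when unacknowledged data stays outstanding the whole time.
 */
int
simulate(int rxtcur, int keepidle, int horizon, int ack_restarts_rexmt)
{
    int rexmt_at = rxtcur;   /* retransmit timer armed at t = 0 */
    int keep_at = keepidle;  /* keep-alive timer armed at t = 0 */
    int fired = 0;

    assert(rxtcur > 0 && keepidle > 0);

    for (int t = 1; t <= horizon; t++) {
        if (t == rexmt_at) {
            /* Retransmit timer expired: data would be resent here. */
            fired++;
            rexmt_at = t + rxtcur;
        }
        if (t == keep_at) {
            /* Keep-alive probe is sent and ACKed immediately. */
            keep_at = t + keepidle;
            /* The ACK does not cover snd_max, so the else branch
             * re-arms the retransmit timer with the full interval. */
            if (ack_restarts_rexmt)
                rexmt_at = t + rxtcur;
        }
    }
    return fired;
}
```

Under these assumptions, `simulate(18, 15, 120, 1)` returns 0: with an 18-second retransmit interval and a 15-second keep-alive timeout, the retransmit timer is pushed out every time before it can expire. With the restart disabled, `simulate(18, 15, 120, 0)` returns 6.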