[Bug 275798] panic: sackhint bytes rtx >= 0
Date: Tue, 19 Dec 2023 14:51:25 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275798

Richard Scheffenegger <rscheff@freebsd.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|New                         |Closed

--- Comment #11 from Richard Scheffenegger <rscheff@freebsd.org> ---
e3b9058e5cd0f541da596624a366e14cabcf2e2a / D43085 was found early on during a code review after this issue.

Completed the inspection of the bblog traces - and confirmed that the above fix actually addressed the root cause.

The issue is indeed connected to an error during ip_output (55 - ENOBUFS), which had to be dealt with appropriately by the TCP stack.

The in-depth explanation:

When there is no response from the remote side for some time, TCP will retransmit some packets as classical retransmits after a (lengthy) timeout - RTO - outside of SACK.

When an ACK with SACK information now arrives, the SACK scoreboard is re-populated (the RTO had previously cleared it all).

At a subsequent transmission opportunity, the SACK hole is transmitted - if this transmission fails, the unrolling of the transmission could leave the hole in an unexpected state, if the entire hole was to be transmitted.

This effect got introduced by the Lost Retransmission Detection (LRD) feature, where the hole->rxmit pointer tracks when a retransmission ought to have arrived at the receiver. Once a hole is fully retransmitted, hole->rxmit is set to snd_recover.

So, in order to elicit this:

- LRD has to be enabled (now enabled by default in CURRENT)
- SACK loss recovery must have started
- RTO occurs
- Overlapping packets get retransmitted by RTO without SACK
- SACK information is recovered
- SACK retransmits the same data as RTO - and the entire hole, but fails due to an internal error

If the non-SACK transmission eventually makes it, because of the improper handling of the SACK hole retransmit pointer, we could end up with incorrect accounting - running into the panic.
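
The accounting problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from e3b9058e5cd0 / D43085: the names Hole, Scoreboard, send_hole, hole_acked and the clamp flag are hypothetical, sequence number wraparound is ignored, and the rule that removing a hole subtracts min(rxmit, end) - start from the counter is an assumption of the model.

```python
from dataclasses import dataclass


@dataclass
class Hole:
    start: int  # first missing sequence number
    end: int    # one past the last missing sequence number
    rxmit: int  # next byte to retransmit, or snd_recover once fully sent


class Scoreboard:
    """Toy model only (hypothetical names); wraparound is ignored."""

    def __init__(self, snd_recover: int):
        self.snd_recover = snd_recover
        self.bytes_rexmit = 0  # stands in for the "sackhint bytes rtx" counter

    def send_hole(self, hole: Hole, tx_ok: bool, clamp: bool) -> None:
        # Retransmit everything that is still outstanding in the hole.
        length = hole.end - hole.rxmit
        hole.rxmit += length
        self.bytes_rexmit += length
        if hole.rxmit == hole.end:
            # LRD-style marker: a fully retransmitted hole points at snd_recover.
            hole.rxmit = self.snd_recover
        if not tx_ok:
            # Unroll the failed transmission (e.g. ENOBUFS from ip_output).
            # Without the clamp, the snd_recover marker is used as the base.
            base = min(hole.end, hole.rxmit) if clamp else hole.rxmit
            hole.rxmit = base - length
            self.bytes_rexmit -= length

    def hole_acked(self, hole: Hole) -> None:
        # Assumption of this model: removing a hole subtracts the bytes
        # the hole believes it has retransmitted.
        self.bytes_rexmit -= min(hole.rxmit, hole.end) - hole.start


def run(clamp: bool) -> tuple:
    sb = Scoreboard(snd_recover=5000)
    hole = Hole(start=1000, end=2000, rxmit=1000)
    sb.send_hole(hole, tx_ok=False, clamp=clamp)
    rxmit_after_unroll = hole.rxmit
    sb.hole_acked(hole)  # the earlier RTO retransmission fills the hole
    return rxmit_after_unroll, sb.bytes_rexmit
```

In this model, run(clamp=False) returns (4000, -1000): the retransmit pointer sits exactly one hole length below snd_recover and the counter goes negative, which is the kind of state the assertion guards against. run(clamp=True) returns (1000, 0), with the pointer back at the start of the hole and the counter balanced.
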

In the bblog, we can see the ip_output error for the SACK retransmission and, preceding this, an RTO retransmission. Furthermore, the hole->rxmit pointer is exactly the size of the hole beneath snd_recover.

--
You are receiving this mail because:
You are on the CC list for the bug.