[Bug 275798] panic: sackhint bytes rtx >= 0

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 19 Dec 2023 14:51:25 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275798

Richard Scheffenegger <rscheff@freebsd.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|New                         |Closed

--- Comment #11 from Richard Scheffenegger <rscheff@freebsd.org> ---

The fix, e3b9058e5cd0f541da596624a366e14cabcf2e2a / D43085, was found early on,
during a code review following this issue.


I have completed the inspection of the bblog traces and confirmed that the
above fix actually addressed the root cause.

The issue is indeed connected to an error during ip_output (55 - ENOBUFS),
which had to be dealt with appropriately by the TCP stack.


The in-depth explanation:

When there is no response from the remote side for some time, TCP will
retransmit some packets as classical retransmits after a (lengthy)
retransmission timeout (RTO), outside of SACK.

When an ACK with SACK information then arrives, the SACK scoreboard is
re-populated (the RTO had previously cleared it entirely).

At a subsequent transmission opportunity, the SACK hole is retransmitted. If
this transmission fails and the entire hole was to be transmitted, unrolling
the transmission can leave the hole in an unexpected state.

This effect was introduced by the Lost Retransmission Detection (LRD) feature,
where the hole->rxmit pointer tracks when a retransmission ought to have
arrived at the receiver. Once a hole is fully retransmitted, hole->rxmit is
set to snd_recover.
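
To make the pointer handling concrete, here is a minimal sketch in C. It is
a simplified model and not the kernel code: the struct, the function names
and the clamped unroll are assumptions for illustration only, and sequence
number wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of one SACK hole; not the kernel struct. */
struct hole_model {
	uint32_t start;	/* first missing sequence number */
	uint32_t end;	/* one past the last missing sequence number */
	uint32_t rxmit;	/* next sequence number to retransmit from the hole */
};

/*
 * Model of sending len bytes from the hole. With LRD, a fully
 * retransmitted hole gets rxmit = snd_recover instead of rxmit = end.
 */
static void
model_send(struct hole_model *h, uint32_t len, uint32_t snd_recover)
{
	assert(h->rxmit >= h->start && h->rxmit <= h->end);
	assert(len <= h->end - h->rxmit);
	h->rxmit += len;
	if (h->rxmit == h->end)
		h->rxmit = snd_recover;
}

/* Naive unroll after a failed send: only right if rxmit was not moved. */
static void
model_unroll_naive(struct hole_model *h, uint32_t len)
{
	h->rxmit -= len;
}

/* Assumed repair: clamp rxmit back into the hole before subtracting. */
static void
model_unroll_clamped(struct hole_model *h, uint32_t len)
{
	uint32_t pos = h->rxmit < h->end ? h->rxmit : h->end;

	h->rxmit = pos - len;
}
```

With a hole covering 1000 to 2000 and snd_recover at 5000, sending the whole
hole moves rxmit to 5000. The naive unroll then leaves rxmit at 4000 rather
than 1000, while the clamped variant restores it to the start of the hole.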


So, in order to elicit this:

LRD has to be enabled (now enabled by default in CURRENT)
SACK loss recovery must have started
RTO occurs
Overlapping packets get retransmitted by RTO without SACK
SACK information is recovered
SACK retransmits the same data as the RTO - and the entire hole - but fails
due to an internal error



If the non-SACK transmission eventually makes it, the improper handling of
the SACK hole retransmit pointer can leave us with incorrect accounting -
running into the panic.
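
The way the accounting can go negative may be sketched as follows. This is a
toy model under stated assumptions, not the kernel logic: the function name
and the clamp flag are hypothetical, the rule that an acknowledged hole
reduces the counter by the part of the hole recorded as retransmitted is an
assumption, and sequence number wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the modelled "bytes retransmitted" counter after this sequence:
 * the whole hole is sent via SACK, the send fails, the transmission is
 * unrolled, and the hole is then acknowledged thanks to the RTO copy.
 * clamp != 0 selects the assumed repaired unroll.
 */
static int32_t
model_counter_after_ack(uint32_t start, uint32_t end, uint32_t snd_recover,
    int clamp)
{
	uint32_t len, rxmit, upto;
	int32_t bytes_rexmit = 0;

	assert(start <= end && end <= snd_recover);
	len = end - start;

	/* SACK sends the whole hole; LRD moves rxmit to snd_recover. */
	rxmit = snd_recover;
	bytes_rexmit += (int32_t)len;

	/* The send fails, so the transmission is unrolled. */
	if (clamp)
		rxmit = (rxmit < end ? rxmit : end) - len;
	else
		rxmit -= len;
	bytes_rexmit -= (int32_t)len;

	/* Assumption: an acked hole removes its retransmitted part. */
	upto = rxmit < end ? rxmit : end;
	if (upto > start)
		bytes_rexmit -= (int32_t)(upto - start);
	return (bytes_rexmit);
}
```

In this model, a hole from 1000 to 2000 with snd_recover at 5000 ends with a
counter of -1000 after the naive unroll, which is the kind of state the
"sackhint bytes rtx >= 0" assertion rejects. The clamped unroll ends at 0.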


In the bblog, we can see the ip_output error for the SACK retransmission
and, preceding this, an RTO retransmission.
Furthermore, the hole->rxmit pointer sits below snd_recover by exactly the
size of the hole.
