[Bug 275169] Panic: rw_rlock: wlock already held for tcpinp @ /usr/src/sys/netinet/in_pcb.c:2529

From: <bugzilla-noreply_at_freebsd.org>
Date: Sat, 18 Nov 2023 15:00:14 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275169

            Bug ID: 275169
           Summary: Panic: rw_rlock: wlock already held for tcpinp @
                    /usr/src/sys/netinet/in_pcb.c:2529
           Product: Base System
           Version: 13.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: i.dani@outlook.com

Created attachment 246389
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=246389&action=edit
Kernel config

After we upgraded some of our hosts from FreeBSD 12.4 to 13.2, some of our
physical hosts started freezing randomly, without panicking.

We were able to narrow it down to some IPFW rules. The setup is the following:
- Host A: Recently upgraded physical host with FreeBSD 13.2
- Host B: Also a physical host with FreeBSD 13.2 which runs a webserver,
serving our own pkg-repos (10.1.1.20). Webserver doesn't matter - we tried
Apache, nginx, thttpd. Filetype also doesn't matter (.pkg, .txt, whatever :)).
We also tried with multiple hosts.

Host A has the following IPFW rule:
ipfw add 1000 allow ip from me to 10.1.1.20/32 uid 0

Host B has the following IPFW rule:
ipfw add 2000 allow tcp from any to 10.1.1.20 80,443 keep-state

We can reproduce the freeze by repeatedly fetching a file on Host A from Host B
(we initially triggered the bug when running "pkg upgrade"):
[root@host-a] $ while true; do curl -v http://10.1.1.20/test.txt --output /dev/null; done

Host A loses its network connection, sometimes immediately and sometimes after
a few seconds. Sometimes we are still able to log in through a local shell.
The host then freezes completely, anywhere from a few seconds to 1-2 minutes
later. There is no kernel panic and nothing in the logs. Host B keeps running
fine and doesn't freeze.
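
A bounded variant of the reproduction loop above may be easier to leave
running unattended, since it stops at the first failed fetch instead of
spinning once the network is gone. This is only a sketch: the function name
fetch_loop, the 10-second timeout and the iteration cap are assumptions made
for illustration and are not part of the original test.

```sh
#!/bin/sh
# Sketch only: fetch_loop, the 10 s timeout and the iteration cap are
# illustrative assumptions, not part of the original report.

# fetch_loop URL MAX
# Fetch URL up to MAX times, discard the body, stop at the first failure.
fetch_loop() {
    url=$1
    max=$2
    i=0
    while [ "$i" -lt "$max" ]; do
        # --max-time keeps curl from hanging forever once the network is lost.
        if ! curl --silent --show-error --max-time 10 --output /dev/null "$url"; then
            echo "fetch $((i + 1)) failed at $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $i fetches without failure"
}

# Example invocation on Host A (left commented out so that sourcing this
# file has no side effects):
# fetch_loop http://10.1.1.20/test.txt 1000
```

The timestamp of the first failed fetch gives a rough marker for when the
network connection was lost, which can be compared against the time of the
freeze or panic.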

What's interesting:
- Freezes do NOT happen if the "uid 0" selector from Host A's rule is removed.
- Freezes do NOT happen if the "keep-state" of Host B's rule is removed.
- Freezes do NOT happen with our virtual servers - only physical hosts are
affected.

After building and installing the kernel with debug options, we were finally
able to cause a panic and get some more information. The kernel was built
with the following options:
makeoptions    DEBUG=-g
options                INVARIANTS
options                INVARIANT_SUPPORT
options                WITNESS
options                WITNESS_SKIPSPIN
options                DIAGNOSTIC

You can find the full kernel config attached (config.txt).

PANIC: rw_rlock: wlock already held for tcpinp @
/usr/src/sys/netinet/in_pcb.c:2529

Attached you will find:
- HW-Info.txt: Hardware information of one of the hosts that freezes. Other
hosts that freeze (and also Host B of the example above) are physical too and
also use the same NIC driver (ix).
- info.txt: File written @ Panic - Contains FreeBSD version info and so on.
- config.txt: Kernel config (see above).
- ddb.txt: Contains the ddb-Dump - reduced to the relevant stuff (panic,
backtrace, locks). The full ddb-Dump (containing all procs) can be provided if
needed.

Any help in further debugging or fixing this would be highly appreciated!
