[Bug 263062] tcp_inpcb leaking in VM environment

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 07 Aug 2023 22:09:06 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263062

--- Comment #6 from Max Khon <fjoe@FreeBSD.org> ---
I can confirm that switching the Hetzner VM to i440fx (rescale to an Intel plan,
then ask Hetzner support to switch to i440fx, as some Intel VMs are also
provisioned with the Q35 chipset) solves the issue (on the same 13.2-RELEASE
kernel):

--- cut here ---
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP XDOMAIN
udp_inpcb:              496, 510927,      12,    1516,   70257,   0,   0,   0
tcp_inpcb:              496, 510927,     649,    1383,  111267,   0,   0,   0
udplite_inpcb:          496, 510927,       0,       0,       0,   0,   0,   0
--- cut here ---
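
To track these counters over time, the "vmstat -z" lines can be parsed into
numbers. A minimal Python sketch, assuming the comma-separated layout shown
above; the names ZoneStats and parse_zone_line are hypothetical helpers, not
part of any FreeBSD tool:

```python
from typing import NamedTuple, Optional


class ZoneStats(NamedTuple):
    name: str
    size: int
    limit: int
    used: int
    free: int
    requests: int


def parse_zone_line(line: str) -> Optional[ZoneStats]:
    """Parse one `vmstat -z` data line; return None for headers or other text."""
    name, sep, rest = line.partition(":")
    if not sep:
        return None  # no "zone:" prefix, e.g. the ITEM header line
    fields = [f.strip() for f in rest.split(",")]
    if len(fields) < 5:
        return None
    try:
        # Column order assumed from the header: SIZE, LIMIT, USED, FREE, REQ
        size, limit, used, free, req = (int(f) for f in fields[:5])
    except ValueError:
        return None
    return ZoneStats(name.strip(), size, limit, used, free, req)


sample = "tcp_inpcb:              496, 510927,     649,    1383,  111267,   0,   0,   0"
stats = parse_zone_line(sample)
print(stats.name, stats.used)
```

A USED value that keeps climbing across samples while the connection count
stays flat is the symptom described in this bug.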

The difference between i440fx and Q35 is that the latter provides "modern"
virtio devices, while i440fx provides "legacy" virtio devices.
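
One way to check which kind of device a VM exposes is the PCI ID of the virtio
device (for example as reported by "pciconf -l"). The virtio 1.x specification
assigns vendor ID 0x1af4, device IDs 0x1000-0x103f to transitional devices
(which still offer the legacy interface) and 0x1040-0x107f to modern-only
devices. A small Python sketch of that classification; the function name is
hypothetical, and which interface the guest driver actually uses on a
transitional device is up to the driver:

```python
VIRTIO_PCI_VENDOR = 0x1AF4  # PCI vendor ID used by virtio devices


def virtio_pci_kind(vendor: int, device: int) -> str:
    """Classify a PCI vendor/device ID pair by the virtio 1.x ID ranges."""
    if vendor != VIRTIO_PCI_VENDOR:
        return "not virtio"
    if 0x1000 <= device <= 0x103F:
        return "transitional (legacy interface available)"
    if 0x1040 <= device <= 0x107F:
        return "modern only"
    return "unknown"


# 0x1000 is the transitional network device, 0x1041 the modern one.
print(virtio_pci_kind(0x1AF4, 0x1000))
print(virtio_pci_kind(0x1AF4, 0x1041))
```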

I suspect the problem is somewhere in the "modern" virtqueue or "modern" vtnet
implementation (which was added in FreeBSD 13). FreeBSD 12 does not even
boot on the Q35 chipset because it lacks "modern" support.

I would suggest not doing any MFC of "modern" virtio until this issue is fixed.

On a side note: I have reproduced this issue with Q35 chipset ("modern" virtio)
on a plain 13.2-RELEASE in a Hetzner Q35 VM (any AMD plan) with just nginx
serving static content (default nginx page) and running "ab -c 100 -n
1000000000 http://x.y.z.w/" in a loop:

--- cut here ---
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP XDOMAIN
udp_inpcb:              496, 126863,   12187,     261,   12475,   0,   0,   0
tcp_inpcb:              496, 126863,   29245,     219, 1204697,   0,   0,   0
udplite_inpcb:          496, 126863,       0,       0,       0,   0,   0,   0
--- cut here ---
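
The reproduction above can be scripted so that the tcp_inpcb USED counter is
sampled after every "ab" run. A rough Python sketch, assuming "ab" and
"vmstat" are in PATH on the machine where it runs (in practice "ab" would run
on a client and "vmstat -z" on the affected VM); all function names here are
hypothetical:

```python
import subprocess
from typing import List


def ab_command(url: str, concurrency: int = 100,
               requests: int = 1_000_000_000) -> List[str]:
    """Build the ab invocation quoted in this comment."""
    return ["ab", "-c", str(concurrency), "-n", str(requests), url]


def tcp_inpcb_used() -> int:
    """Return the USED column of the tcp_inpcb zone from `vmstat -z`."""
    out = subprocess.run(["vmstat", "-z"], capture_output=True,
                         text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("tcp_inpcb:"):
            # Fields after the zone name: SIZE, LIMIT, USED, ...
            return int(line.split(":", 1)[1].split(",")[2])
    raise RuntimeError("tcp_inpcb zone not found")


def run_load_loop(url: str, iterations: int) -> None:
    """Run ab repeatedly and print the counter after each run."""
    for i in range(iterations):
        subprocess.run(ab_command(url), check=False)
        print(f"run {i}: tcp_inpcb USED = {tcp_inpcb_used()}")
```

For example, run_load_loop("http://x.y.z.w/", 10) would perform ten runs
against the placeholder address used above.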

Also, I noticed that the nginx process becomes unkillable (even with SIGKILL),
and the "ps axl | grep nginx" output is as follows:

--- cut here ---
   0  848    1 1  20  0 20024  7624 pause    Is    -     0:00.00 nginx: master process /usr/local/sbin/nginx
  80  898  848 0  33  0 20024  8480 -        R     -     1:44.88 nginx: worker process (nginx)
--- cut here ---

Notice that the nginx worker process does not have an MWCHAN. Also, running
ktrace/truss on the nginx process, or attaching gdb to it, just hangs.

Additionally, adding a simple Django application (just the default empty Django
application, run as "manage.py runserver") behind nginx increases the
probability of an inpcb leak (the USED counters grow faster). I use simple
reverse proxying like this:

--- cut here ---
        location / {
            proxy_pass http://localhost:8000;
        }
--- cut here ---
