Re: HEADS UP: NFS changes coming into CURRENT early February

From: Gleb Smirnoff <glebius_at_freebsd.org>
Date: Wed, 29 Jan 2025 05:13:57 UTC
On Mon, Jan 27, 2025 at 06:10:42PM -0800, Rick Macklem wrote:
R> I think I've found a memory leak, but it shouldn't be a show stopper.
R> 
R> What I did on the NFS client side is:
R> # vmstat -m | fgrep -i rpc
R> # mount -t nfs -o nfsv4,tls nfsv4-server:/ /mnt
R> # ls -lR /mnt
R> --> Then I network partitioned it from the server a few times, until
R>       the TCP connection closed.
R>       (My client is in bhyve and the server is on the system the bhyve
R>        instance is running on. I just did "ifconfig bridge0 down", waited for
R>        the TCP connection to close ("netstat -a"), then "ifconfig bridge0 up".)
R> Once done, I
R> # umount /mnt
R> # vmstat -m | fgrep -i rpc
R> and saw a somewhat larger allocation count.
R> 
R> The allocation count only goes up if I do the network partitioning
R> and only on the NFS client side.
R> 
R> Since the leak is slow and only happens when the TCP connection
R> breaks, I do not think it is a show stopper and one of us can track it down
R> someday.

I reproduced the recipe and found two problems, but I don't have a final solution
for either.

First, when we create the backchannel in newnfs_connect() in
sys/fs/nfs/nfs_commonkrpc.c:

		if (nfs_numnfscbd > 0) {
			nfs_numnfscbd++;
			NFSD_UNLOCK();
			xprt = svc_vc_create_backchannel(
			    nfscbd_pool);
			CLNT_CONTROL(client, CLSET_BACKCHANNEL,
			    xprt);
			NFSD_LOCK();
			nfs_numnfscbd--;
			if (nfs_numnfscbd == 0)
				wakeup(&nfs_numnfscbd);
		}


svc_vc_create_backchannel() creates the xprt with refcount=1. Then we link it
to our client, and CLNT_CONTROL(CLSET_BACKCHANNEL) bumps the refcount to 2. Then
this function forgets the pointer, even though it still owns a reference. When
the client is eventually destroyed, it does SVC_RELEASE() on the backchannel, the
refcount goes from 2 to 1, and the xprt is leaked.
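The attached patch isn't reproduced inline here, but a minimal sketch of the
idea, assuming the creation reference can simply be dropped once CLNT_CONTROL()
has taken its own, would be an SVC_RELEASE() right after handing the xprt to
the client:

		if (nfs_numnfscbd > 0) {
			nfs_numnfscbd++;
			NFSD_UNLOCK();
			xprt = svc_vc_create_backchannel(
			    nfscbd_pool);
			CLNT_CONTROL(client, CLSET_BACKCHANNEL,
			    xprt);
			/*
			 * The client now holds its own reference from
			 * CLSET_BACKCHANNEL, so drop the one that
			 * svc_vc_create_backchannel() returned to us;
			 * otherwise the client's final SVC_RELEASE() only
			 * takes the refcount from 2 to 1 and the xprt is
			 * never freed.
			 */
			SVC_RELEASE(xprt);
			NFSD_LOCK();
			nfs_numnfscbd--;
			if (nfs_numnfscbd == 0)
				wakeup(&nfs_numnfscbd);
		}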

I made a patch against that and it does reduce the amount of M_RPC leaked after
your recipe, but once I got a panic where the backchannel xprt is actually used
after free. So it looks like there is a case that my patch doesn't cover. What
drives me crazy is that I can't reproduce it a second time and can't get any
clue from the single core. Patch attached.

Second, with the patch the M_RPC leak count for me is 2. And I found that these
two items basically are a clnt_vc that belongs to a closed connection:

fffff80029614a80 tcp4       0      0 10.6.6.9.772       10.6.6.9.2049      CLOSED     

There is no peer connection anymore, as the server got a timeout trying to send.
But rpc.tlsclntd doesn't try to send anything on the socket; it just keeps it in
its select(2) fd set and never garbage collects it.
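
Just to illustrate the kind of garbage collection that is missing (this is a
hypothetical check, not rpc.tlsclntd code): once the peer is gone the descriptor
shows up readable in select(2) and a peek returns EOF or an error, so the daemon
could notice that and close the fd rather than keep it around forever:

	#include <sys/types.h>
	#include <sys/socket.h>
	#include <errno.h>

	/*
	 * Hypothetical helper, not taken from rpc.tlsclntd: returns non-zero
	 * when the TCP peer is gone, so the caller can close the descriptor
	 * and drop the associated state instead of keeping the dead
	 * connection in its select(2) fd set.
	 */
	static int
	socket_is_dead(int fd)
	{
		char b;
		ssize_t n;

		n = recv(fd, &b, 1, MSG_PEEK | MSG_DONTWAIT);
		return (n == 0 ||
		    (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK));
	}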

So it is a bigger resource leak than just two pieces of M_RPC. I don't think
this is related to my changes.

-- 
Gleb Smirnoff