Re: What's going on with vnets and epairs w/ addresses?
Date: Sun, 18 Dec 2022 06:10:03 UTC
> On Dec 18, 2022, at 3:23 AM, Bjoern A. Zeeb <bz@freebsd.org> wrote:
>
> On Sat, 17 Dec 2022, Gleb Smirnoff wrote:
>
>> Zhenlei,
>>
>> On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
>> Z> I managed to repeat this issue on CURRENT/14 with this small snip:
>> Z>
>> Z> -------------------------------------------
>> Z> #!/bin/sh
>> Z>
>> Z> # test jail name
>> Z> n="test_ref_leak"
>> Z>
>> Z> jail -c name=$n path=/ vnet persist
>> Z> # The following line trigger jail pr_ref leak
>> Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
>> Z>
>> Z> jail -R $n
>> Z>
>> Z> # wait a moment
>> Z> sleep 1
>> Z>
>> Z> jls -j $n
>> Z>
>> Z> After DDB debugging and tracing, it seems that is triggered by a combine of [1] and [2]
>> Z>
>> Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
>> Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b

I can confirm that [2] also affects non-VNET jails. The prison pr_ref leak
causes the jail to be stuck in the dying state.

>> Z>
>> Z>
>> Z> In [1] the per-VNET uma zone is shared with the global one.
>> Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
>> Z>
>> Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by uma_zfree_smr().
>> Z>
>> Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is not called immediately,
>> Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
>> Z>
>> Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT tcp_destroy / udp_destroy / rip_destroy.
>>
>> This is known issue and I'd prefer not to call it a problem. The "leak" of a jail
>> happens only if machine is idle wrt the networking activity.
>>
>> Getting back to the problem that started this thread - the epair(4)s not immediately
>> popping back to prison0.
>> IMHO, the problem again lies in the design of if_vmove and
>> epair(4) in particular. The if_vmove shall not exist, instead we should do a full
>> if_attach() and if_detach(). The state of an ifnet when it undergoes if_vmove doesn't
>> carry any useful information. With Alexander melifaro@ we discussed better options
>> for creating or attaching interfaces to jails that if_vmove. Until they are ready
>> the most easy workaround to deal with annoying epair(4) come back problem is to
>> remove it manually before destroying a jail, like I did in 80fc25025ff.
>
> Ok, move an em0 or cxl0 into the jail; the problem will be the same I
> bet and you need the physical interface to not disappear as then you
> cannot re-create a new jail with it.

Having re-read sys/kern/kern_jail.c: if pr_ref leaks, vnet_destroy() never
gets a chance to be called, so if_vmove is not called and the epair(4)s (or
em0, cxl0) are not returned to their home vnet. This can be confirmed by
setting a breakpoint on vnet_destroy in DDB and then creating and destroying
vnet jails.

So until the prison pr_ref leak is resolved, it will mask other potential
problems, such as the one @glebius pointed out. I think the prison ref count
leak should be resolved first.

I'm also reviewing the life cycles of prison / vnet, and it seems they could
still be improved.

>
> /bz
>
> --
> Bjoern A. Zeeb r15:7
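
For illustration only, here is a toy userland model of the leak mechanism discussed in this thread: an object holds a reference on its prison, the reference is dropped only in the destructor, and a deferred free parks the object in a cache without running the destructor. This is a sketch, not kernel code. All names (toy_prison, toy_pcb, toy_cache_drain and so on) are made up, and the fixed-size cache is an assumption that only loosely stands in for the caching behind uma_zfree_smr().

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins; these are NOT the kernel's struct prison / struct inpcb. */
struct toy_prison { int pr_ref; };
struct toy_pcb { struct toy_prison *cred; };

#define TOY_CACHE_MAX 8
static struct toy_pcb *toy_cache[TOY_CACHE_MAX];
static int toy_ncached;

/* Allocating a pcb takes a reference on the owning prison. */
static struct toy_pcb *toy_pcb_alloc(struct toy_prison *pr)
{
	struct toy_pcb *p = malloc(sizeof(*p));
	if (p == NULL)
		abort();
	p->cred = pr;
	pr->pr_ref++;
	return (p);
}

/* Destructor: the only place where the prison reference is dropped. */
static void toy_pcb_dtor(struct toy_pcb *p)
{
	p->cred->pr_ref--;
	free(p);
}

/* Run the destructor on everything parked in the cache. */
static void toy_cache_drain(void)
{
	while (toy_ncached > 0)
		toy_pcb_dtor(toy_cache[--toy_ncached]);
}

/* Deferred free: park the item; the destructor does not run yet. */
static void toy_pcb_free_deferred(struct toy_pcb *p)
{
	if (toy_ncached == TOY_CACHE_MAX)
		toy_cache_drain();
	toy_cache[toy_ncached++] = p;
}

/* Returns pr_ref as seen right after the "jail" drops its own reference. */
static int toy_leak_demo(void)
{
	struct toy_prison jail = { .pr_ref = 1 };	/* jail's own reference */
	struct toy_pcb *p = toy_pcb_alloc(&jail);	/* pr_ref is now 2 */
	int seen;

	toy_pcb_free_deferred(p);	/* pcb freed, but dtor is deferred */
	jail.pr_ref--;			/* jail removed: own reference dropped */
	seen = jail.pr_ref;		/* the cached pcb still pins the jail */
	toy_cache_drain();		/* only now is the last reference gone */
	assert(jail.pr_ref == 0);
	return (seen);
}
```

Calling toy_leak_demo() from a main() shows the count staying above zero until something drains the cache, which in this model corresponds to the jail remaining in the dying state while the machine is idle with respect to networking.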