Date: Sun, 18 Dec 2022 16:20:45 +0000 (UTC)
From: "Bjoern A. Zeeb"
To: Zhenlei Huang
cc: Gleb Smirnoff, "freebsd-jail@freebsd.org"
Subject: Re: What's going on with vnets and epairs w/ addresses?
List-Id: Discussion about FreeBSD jail(8)
List-Archive: https://lists.freebsd.org/archives/freebsd-jail

On Sun, 18 Dec 2022, Zhenlei Huang wrote:

>
>> On Dec 18, 2022, at 3:23 AM, Bjoern A. Zeeb wrote:
>>
>> On Sat, 17 Dec 2022, Gleb Smirnoff wrote:
>>
>>> Zhenlei,
>>>
>>> On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
>>> Z> I managed to repeat this issue on CURRENT/14 with this small snip:
>>> Z>
>>> Z> -------------------------------------------
>>> Z> #!/bin/sh
>>> Z>
>>> Z> # test jail name
>>> Z> n="test_ref_leak"
>>> Z>
>>> Z> jail -c name=$n path=/ vnet persist
>>> Z> # The following line trigger jail pr_ref leak
>>> Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
>>> Z>
>>> Z> jail -R $n
>>> Z>
>>> Z> # wait a moment
>>> Z> sleep 1
>>> Z>
>>> Z> jls -j $n
>>> Z>
>>> Z> After DDB debugging and tracing , it seems that is triggered by a combine of [1] and [2]
>>> Z>
>>> Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
>>> Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b
>
> I can confirm [2] also affects Non-VNET jails.
> Prison pr_ref leak cause jail stuck in dying state.

Usually a TCP connection in TW would do this in the old days and things
would solve themselves after a while. This was always the case even
long before vnet or multi-IP jails.

>>> Z>
>>> Z>
>>> Z> In [1] the per-VNET uma zone is shared with the global one.
>>> Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
>>> Z>
>>> Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by uma_zfree_smr() .
>>> Z>
>>> Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is not called immediately ,
>>> Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
>>> Z>
>>> Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT tcp_destroy / udp_destroy / rip_destroy.
>>>
>>> This is known issue and I'd prefer not to call it a problem. The "leak" of a jail
>>> happens only if machine is idle wrt the networking activity.
>>>
>>> Getting back to the problem that started this thread - the epair(4)s not immediately
>>> popping back to prison0.
>>> IMHO, the problem again lies in the design of if_vmove and
>>> epair(4) in particular. The if_vmove shall not exist, instead we should do a full
>>> if_attach() and if_detach(). The state of an ifnet when it undergoes if_vmove doesn't
>>> carry any useful information. With Alexander melifaro@ we discussed better options
>>> for creating or attaching interfaces to jails that if_vmove. Until they are ready
>>> the most easy workaround to deal with annoying epair(4) come back problem is to
>>> remove it manually before destroying a jail, like I did in 80fc25025ff.
>>
>> Ok, move an em0 or cxl0 into the jail; the problem will be the same I
>> bet and you need the physical interface to not disappear as then you
>> cannot re-create a new jail with it.
>
> Re-read sys/kern/kern_jail.c, if pr_ref leaks, vnet_destroy() has no chance to be called, thus
> if_vmove is not called and epair(4)s or em0, exl0 are not returned to home vnet.
>
> That can be confirmed by setting debug point on vnet_destroy by DDB, and then create and destroy vnet jails.
>
> So before the problem prison pr_ref count leaks is resolved, it will cover other potential problems such as @glebius
> pointed out.
>
> I think the problem that prison ref count leaks should be resolved first.
>
> I'm also reviewing the life cycles of prison / vnet and it seems they could still be improved.

But that's not the problem here, as your own test case pointed out.

The point is that if you start a plain vnet jail, put an interface in,
and destroy the jail, that works instantly. The moment you put an
address on any interface (incl. loopback, as your test showed, which
will not do ARP/NDP things compared to an ethernet interface) the jail
will no longer die immediately.

Simply putting an address on an interface should not defer things.
So indeed something holds onto things there and is not cleaned up
anymore. The important bit is finding that "something" and being able
to clean it up.
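
To make that comparison easy to repeat, here is a minimal sketch built
only from the commands in the quoted test case. The function name
jail_lingers and its two arguments are hypothetical, made up for this
illustration. It assumes root on a VNET-enabled FreeBSD kernel, and it
uses "jls -d" so that jails in the dying state are listed as well.

```sh
#!/bin/sh
# Sketch only: compare a plain vnet jail with one that had an address
# configured on lo0. jail_lingers is a hypothetical helper, not part of
# any FreeBSD tool.
#
# jail_lingers NAME WITH_ADDR
#   WITH_ADDR=yes  configure 127.0.0.1/8 on lo0 inside the jail first
#   returns 0 if the jail is still listed one second after removal,
#           1 if it is gone, 2 if the jail could not be created.
jail_lingers() {
    name=$1
    with_addr=$2

    jail -c name="$name" path=/ vnet persist || return 2

    if [ "$with_addr" = yes ]; then
        # Same trigger as in the quoted test case.
        jexec "$name" ifconfig lo0 inet 127.0.0.1/8
    fi

    jail -R "$name"

    # wait a moment, as the quoted test case does
    sleep 1

    # -d also lists dying jails; jls fails if the jail is not found.
    if jls -d -j "$name" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}
```

Calling "jail_lingers plain_vnet no" and then "jail_lingers addr_vnet yes"
should show the difference described above, if the behaviour reported in
this thread is present on the system under test.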

I always say, if you have a machine in shutdown -r you don't want it
hanging for hours either (now if you toggle the power switch you can do
a lot more without panicking the rest of the system, but with jails we
cannot do that).

And we did have vnet jails shutting down properly and clearing up for
years. People had spent a lot of time on that. So it is possible and
we need to get back to that state.

/bz

-- 
Bjoern A. Zeeb                                                     r15:7