From: Zhenlei Huang
To: "Bjoern A. Zeeb"
Cc: Gleb Smirnoff, "freebsd-jail@freebsd.org"
Subject: Re: What's going on with vnets and epairs w/ addresses?
Date: Sun, 18 Dec 2022 14:10:03 +0800
Message-Id: <6B201617-68BC-4CC8-A2AE-908E96D69B67@FreeBSD.org>
In-Reply-To: <9p9919q1-n639-p581-6q1o-so48o5ns6717@serrofq.bet>
References: <5r22os7n-ro15-27q-r356-rps331o06so5@mnoonqbm.arg> <150A60D6-6757-46DD-988F-05A9FFA36821@FreeBSD.org> <9p9919q1-n639-p581-6q1o-so48o5ns6717@serrofq.bet>
List-Id: Discussion about FreeBSD jail(8)
List-Archive: https://lists.freebsd.org/archives/freebsd-jail

> On Dec 18, 2022, at 3:23 AM, Bjoern A. Zeeb wrote:
>
> On Sat, 17 Dec 2022, Gleb Smirnoff wrote:
>
>> Zhenlei,
>>
>> On Fri, Dec 16, 2022 at 06:30:57PM +0800, Zhenlei Huang wrote:
>> Z> I managed to repeat this issue on CURRENT/14 with this small snip:
>> Z>
>> Z> -------------------------------------------
>> Z> #!/bin/sh
>> Z>
>> Z> # test jail name
>> Z> n="test_ref_leak"
>> Z>
>> Z> jail -c name=$n path=/ vnet persist
>> Z> # The following line trigger jail pr_ref leak
>> Z> jexec $n ifconfig lo0 inet 127.0.0.1/8
>> Z>
>> Z> jail -R $n
>> Z>
>> Z> # wait a moment
>> Z> sleep 1
>> Z>
>> Z> jls -j $n
>> Z>
>> Z> After DDB debugging and tracing, it seems that is triggered by a combine of [1] and [2]
>> Z>
>> Z> [1] https://reviews.freebsd.org/rGfec8a8c7cbe4384c7e61d376f3aa5be5ac895915
>> Z> [2] https://reviews.freebsd.org/rGeb93b99d698674e3b1cc7139fda98e2b175b8c5b

I can confirm [2] also affects non-VNET jails.
The prison pr_ref leak causes the jail to get stuck in the dying state.

>> Z>
>> Z>
>> Z> In [1] the per-VNET uma zone is shared with the global one.
>> Z> `pcbinfo->ipi_zone = pcbstor->ips_zone;`
>> Z>
>> Z> In [2] unref `inp->inp_cred` is deferred called in inpcb_dtor() by uma_zfree_smr().
>> Z>
>> Z> Unfortunately inps freed by uma_zfree_smr() are cached and inpcb_dtor() is not called immediately,
>> Z> thus leaking `inp->inp_cred` ref and hence `prison->pr_ref`.
>> Z>
>> Z> And it is also not possible to free up the cache by per-VNET SYSUNINIT tcp_destroy / udp_destroy / rip_destroy.
>>
>> This is known issue and I'd prefer not to call it a problem.
>> The "leak" of a jail happens only if machine is idle wrt the networking activity.
>>
>> Getting back to the problem that started this thread - the epair(4)s not immediately
>> popping back to prison0. IMHO, the problem again lies in the design of if_vmove and
>> epair(4) in particular. The if_vmove shall not exist, instead we should do a full
>> if_attach() and if_detach(). The state of an ifnet when it undergoes if_vmove doesn't
>> carry any useful information. With Alexander melifaro@ we discussed better options
>> for creating or attaching interfaces to jails that if_vmove. Until they are ready
>> the most easy workaround to deal with annoying epair(4) come back problem is to
>> remove it manually before destroying a jail, like I did in 80fc25025ff.
>
> Ok, move an em0 or cxl0 into the jail; the problem will be the same I
> bet and you need the physical interface to not disappear as then you
> cannot re-create a new jail with it.

Re-read sys/kern/kern_jail.c: if pr_ref leaks, vnet_destroy() never gets a
chance to be called, so if_vmove is not called and the epair(4)s, or em0 / cxl0,
are not returned to the home vnet.

That can be confirmed by setting a breakpoint on vnet_destroy in DDB and then
creating and destroying vnet jails.

So until the prison pr_ref leak is resolved, it will mask other potential
problems such as the one @glebius pointed out.

I think the prison ref count leak should be resolved first.

I'm also reviewing the life cycles of prison / vnet, and it seems they could
still be improved.

>
> /bz
>
> --
> Bjoern A. Zeeb                                                     r15:7
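
P.S. For anyone who wants to check for the stuck state after running the
reproducer quoted above, here is a minimal sketch, not a definitive test. It
assumes that jls(8) with -d also lists dying jails and that jls exits non-zero
when the named jail is not found. The function name jail_lingers and the JLS
override variable are hypothetical, introduced only for this illustration.

```shell
#!/bin/sh
# Sketch only: poll for a jail that should already be gone.
# JLS is a hypothetical override so the helper can be dry-run without jls(8).
: "${JLS:=jls}"

# jail_lingers NAME [TRIES]
# Returns 0 if the jail is still reported after TRIES one-second polls
# (likely stuck in the dying state), 1 as soon as it is no longer reported.
jail_lingers() {
    name=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        # -d: include dying jails (assumption: non-zero exit when not found)
        if ! "$JLS" -d -j "$name" >/dev/null 2>&1; then
            return 1
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 0
}
```

After `jail -R $n` in the reproducer, something like
`jail_lingers "$n" 5 && echo "$n still dying"` would report a jail that has not
gone away within five seconds.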