From nobody Fri Feb 24 03:08:30 2023 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4PNFGF1FrJz3t76r; Fri, 24 Feb 2023 03:08:37 +0000 (UTC) (envelope-from glebius@freebsd.org) Received: from glebi.us (glebi.us [162.251.186.162]) by mx1.freebsd.org (Postfix) with ESMTP id 4PNFGD21X2z4Jvs; Fri, 24 Feb 2023 03:08:36 +0000 (UTC) (envelope-from glebius@freebsd.org) Authentication-Results: mx1.freebsd.org; dkim=none; spf=softfail (mx1.freebsd.org: 162.251.186.162 is neither permitted nor denied by domain of glebius@freebsd.org) smtp.mailfrom=glebius@freebsd.org; dmarc=none Received: by glebi.us (Postfix, from userid 1000) id 4CC1B51206; Thu, 23 Feb 2023 19:08:30 -0800 (PST) Date: Thu, 23 Feb 2023 19:08:30 -0800 From: Gleb Smirnoff To: Rick Macklem Cc: Rick Macklem , bz@freebsd.org, jamie@freebsd.org, src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org Subject: Re: git: ef6fcc5e2b07 - main - nfsd: Add VNET_SYSUNINIT() macros for vnet cleanup Message-ID: References: <202302202112.31KLCfQB080359@gitrepo.freebsd.org> List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spamd-Result: default: False [0.64 / 15.00]; VIOLATED_DIRECT_SPF(3.50)[]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_SHORT(-0.86)[-0.860]; RCVD_NO_TLS_LAST(0.10)[]; MIME_GOOD(-0.10)[text/plain]; FREEMAIL_TO(0.00)[gmail.com]; MLMMJ_DEST(0.00)[dev-commits-src-all@freebsd.org,dev-commits-src-main@freebsd.org]; MIME_TRACE(0.00)[0:+]; R_DKIM_NA(0.00)[]; ASN(0.00)[asn:27348, ipnet:162.251.186.0/24, country:US]; RCPT_COUNT_SEVEN(0.00)[7]; FROM_EQ_ENVFROM(0.00)[]; ARC_NA(0.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[]; FROM_HAS_DN(0.00)[]; FREEFALL_USER(0.00)[glebius]; RCVD_COUNT_TWO(0.00)[2]; TO_DN_SOME(0.00)[]; DMARC_NA(0.00)[freebsd.org]; TAGGED_RCPT(0.00)[]; R_SPF_SOFTFAIL(0.00)[~all:c]; MID_RHS_MATCH_FROM(0.00)[] X-Rspamd-Queue-Id: 4PNFGD21X2z4Jvs X-Spamd-Bar: / X-ThisMailContainsUnwantedMimeParts: N Rick, On Thu, Feb 23, 2023 at 05:56:06AM -0800, Rick Macklem wrote: R> > This one actually doesn't look correct to me. What happens here is that the sysctls R> > will affect only the default VNET. R> > R> Yes, but the sysctls are mostly useless anyhow. I don't know how to make them R> work in a prison. (I know how to use SYSCTL_FLAG_VNET, but that does not R> work in this case.) Doesn't they work as intended in my patch D38742? R> > I think the VNET-itezation of this file went a bit wrong. You don't need to convert R> > static structures to malloced when you VNET-ize a module. The infrastructure should R> > take care of memory management. R> > R> Not if you want to keep the vnet footprint small. This was necessary R> so that nfsd.ko R> (and friends) will load dynamically. Without the conversion to R> mallocs, it would complain R> the vnet was out of memory when nfsd.ko tried to load. R> (I'm sure I didn't need to do all of them, but it made sense to keep R> the vnet footprint R> as small as possible.) R> > The dynamic sysctl context seems to be unneeded from the very beginning. It was R> > always attached to the static softc. Suggested patch: R> > R> > https://reviews.freebsd.org/D38742 R> > R> I'll look at it, but if it stops malloc'ng softc, then I will be R> worried w.r.t. vnet footprint and R> dynamic loading. (Note that the structure has an array of hash list R> pointers in it, so it R> is rather large. Replying to you and Bjoern's email too here. Well, if we want a fully blown virtualized network stack, and this is what VIMAGE is, then, well, we need a full chunk of memory to keep a network stack data. So, if there is a limit there, (Bjoern mentioned 8k) then this limit needs to be increased as more and more subsystems are virtualized. I also don't see how we actually save any memory using malloc(9) instead of using memory provided by VIMAGE? The kmem use would be roughly the same if not worse. What exactly are we saving here? R> Another reason (along with dynamic loading of the modules) for keeping R> the foot print R> small is to try and keep the jails that do not use nfsd(8) and friends R> lightweight. R> When I originally coded it, I put it under a kernel option called R> VNET_NFSD (still in R> kern_jail.c at this time), but others felt that a new kernel option R> wasn't desirable, R> (There is a lot of discussion under D37519. The downside of discussions that R> happen in phabricator is that not as many people see them. I started an email R> thread on freebsd-current@, but it quickly migrated to D37519. Just R> the way things R> currently happen.) I briefly looked into D37519 and didn't find any discussion over memory footprint and savings. I agree that a kernel option of VNET_NFSD is undesirable. The global option VIMAGE shall control that and nothing else. R> > For example nfsstatsv1, which is now malloced for non-default vnets and static for R> > the default. Lots of modules still incorrectly update the global one. R> > R> Nope. "struct nfsstatsv1" is a structure that is shared with the NFS R> client, which is R> not vnet'd. As such, a static "struct nfsstatsv1" needs to exist for prison0. R> R> My original solution was to create a separate structure for the server R> side stuff R> (vnet it), but then the result was a messy copying exercise for the system call, R> which returns the entire structure (with client and server info). R> R> Once the client side is vnet'd (I've already had a couple of emails asking me R> to do it, so I plan on starting to work on it) then, yes, the R> structure can become R> a malloc'd vnet'd one and the IS_DEFAULT_VNET can go away. I see the problem. So, we got several things that need to be done to nfsstatsv1: virtualize, separate server and client, possibly make them use counter(9) to avoid races and performance issues. And I'm afraid there is API/ABI that needs to be preserved? Depending on order of doing changes to nfsstatsv1 the amount of work is going to be different. So, if we start with my D38743 the only problem observed is that the client code will use virtual version of the stats. So if somebody decides to run a client from a vnet jail, it will fill this jail stats instead of global. Is it a big deal if the future plan is actually to make client virtualized too? R> I am not sure why you think IS_DEFAULT_VNET() is such a big deal in R> the initialization code? It work. I can see an argument for using both R> SYSINIT() and VNET_SYSINIT() for cases where you have both non-vnet;d R> and vnet'd data to initialize. However, I don't see any problem with using R> IS_DEFAULT_VNET() when needed. My main concern is complexity and its unforseen consequences. The original design of VIMAGE is simple - just wrap every global variable into VNET() macro and you are done. Don't make non-default vnet in any sense different to the default one! It actually works pretty good if the recipe is being followed. Sometimes this leads to pretty large mechanical changes, and people try to avoid that. For example in pf(4) we had quite a long story with getting it stable with VIMAGE, and I think that was because we just didn't do it right from the beginning. So with NFS we started to create complexity and we already got problems. The memory definitely is used after free, see: https://ci.freebsd.org/job/FreeBSD-main-amd64-test/23034/console -- Gleb Smirnoff