Re: git: eb93b99d6986 - main - in_pcb: delay crfree() down into UMA dtor

From: Mark Johnston <markj_at_freebsd.org>
Date: Wed, 15 Dec 2021 02:35:42 UTC
On Tue, Dec 14, 2021 at 10:42:49PM +0100, Kristof Provost wrote:
> On 5 Dec 2021, at 19:47, Gleb Smirnoff wrote:
> > The branch main has been updated by glebius:
> >
> > URL: 
> > https://cgit.FreeBSD.org/src/commit/?id=eb93b99d698674e3b1cc7139fda98e2b175b8c5b
> >
> > commit eb93b99d698674e3b1cc7139fda98e2b175b8c5b
> > Author:     Gleb Smirnoff <glebius@FreeBSD.org>
> > AuthorDate: 2021-12-05 16:47:24 +0000
> > Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
> > CommitDate: 2021-12-05 18:46:37 +0000
> >
> >     in_pcb: delay crfree() down into UMA dtor
> >
> >     inpcb lookups, which check inp_cred, work with pcbs that 
> > potentially went
> >     through in_pcbfree().  So inp_cred should stay valid until SMR 
> > guarantees
> >     its invisibility to lookups.
> >
> >     While here, put the whole inpcb destruction sequence of 
> > in_pcbfree(),
> >     inpcb_dtor() and inpcb_fini() sequentially.
> >
> >     Submitted by:           markj
> >     Differential revision:  https://reviews.freebsd.org/D33273
> >
> 
> For some reason it looks like this commit causes jails to fail to get 
> fully cleaned up.
> I can reproduce that trivially with `cd /usr/tests/sys/net ; kyua test 
> if_bridge_test:bridge_transmit_ipv4_unicast ; jls -na`.
> 
> Note the jails in dying state.
> 
> The jails created by that test never go away. It’s as if 
> `crfree(inp->inp_cred);` doesn’t actually get called. And indeed, it 
> looks like inpcb_dtor() does not get called at all.

For SMR zones it won't get called until there's some attempt to reuse a
bucket.  There's a fairly tight bound on this in the sense that there
can't be a large number of freed items waiting for the dtor to be
invoked, so the amount of consumed memory should be small, but there's
no bound on how long one might have to wait.  I didn't realize that jail
destruction could block waiting for a ucred reference to vanish. 

When I tried this patch with poudriere on a large system there was no
problem, but the regression test suite is a different story.  I'm not
sure how best to fix this yet.

> 
> Does
> 
> 
> Kristof