Re: git: 5844bd058aed - main - jobc: rework detection of orphaned groups.
- In reply to: Kyle Evans : "Re: git: 5844bd058aed - main - jobc: rework detection of orphaned groups."
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Wed, 03 May 2023 06:46:25 UTC
On Sun, Apr 30, 2023 at 11:01:44PM -0500, Kyle Evans wrote: > On Sat, Jan 9, 2021 at 8:43 PM Konstantin Belousov <kib@freebsd.org> wrote: > > > > The branch main has been updated by kib: > > > > URL: https://cgit.FreeBSD.org/src/commit/?id=5844bd058aed6f3d0c8cbbddd6aa95993ece0189 > > > > commit 5844bd058aed6f3d0c8cbbddd6aa95993ece0189 > > Author: Konstantin Belousov <kib@FreeBSD.org> > > AuthorDate: 2020-12-29 00:41:56 +0000 > > Commit: Konstantin Belousov <kib@FreeBSD.org> > > CommitDate: 2021-01-10 02:41:20 +0000 > > > > jobc: rework detection of orphaned groups. > > > > Instead of trying to maintain pg_jobc counter on each process group > > update (and sometimes before), just calculate the counter when needed. > > Still, for the benefit of the signal delivery code, explicitly mark > > orphaned groups as such with the new process group flag. > > > > This way we prevent bugs in the corner cases where updates to the counter > > were missed due to complicated configuration of p_pptr/p_opptr/real_parent > > (debugger). > > > > Since we need to iterate over all children of the process on exit, this > > change mostly affects the process group entry and leave, where we need > > to iterate all process group members to detect orpaned status. > > > > Hi, > > I have a question about the locking here... > > > [... snip ...] > > @@ -678,43 +677,40 @@ jobc_reaper(struct proc *p) > > } > > > > static struct proc * > > -jobc_parent(struct proc *p) > > +jobc_parent(struct proc *p, struct proc *p_exiting) > > { > > struct proc *pp; > > > > - sx_assert(&proctree_lock, SX_LOCKED); > > + sx_assert(&proctree_lock, SA_LOCKED); > > > > pp = proc_realparent(p); > > - if (pp->p_pptr == NULL || > > + if (pp->p_pptr == NULL || pp == p_exiting || > > (pp->p_treeflag & P_TREE_GRPEXITED) == 0) > > return (pp); > > return (jobc_reaper(pp)); > > } > > > > -#ifdef INVARIANTS > > -static void > > -check_pgrp_jobc(struct pgrp *pgrp) > > +static int > > +pgrp_calc_jobc(struct pgrp *pgrp) > > { > > struct proc *q; > > int cnt; > > > > - sx_assert(&proctree_lock, SX_LOCKED); > > - PGRP_LOCK_ASSERT(pgrp, MA_NOTOWNED); > > +#ifdef INVARIANTS > > + if (!mtx_owned(&pgrp->pg_mtx)) > > + sx_assert(&proctree_lock, SA_LOCKED); > > +#endif > > > > cnt = 0; > > - PGRP_LOCK(pgrp); > > LIST_FOREACH(q, &pgrp->pg_members, p_pglist) { > > if ((q->p_treeflag & P_TREE_GRPEXITED) != 0 || > > q->p_pptr == NULL) > > continue; > > - if (isjobproc(jobc_parent(q), pgrp)) > > + if (isjobproc(jobc_parent(q, NULL), pgrp)) > > cnt++; > > } > > - KASSERT(pgrp->pg_jobc == cnt, ("pgrp %d %p pg_jobc %d cnt %d", > > - pgrp->pg_id, pgrp, pgrp->pg_jobc, cnt)); > > - PGRP_UNLOCK(pgrp); > > + return (cnt); > > } > > -#endif > > > > /* > > * Move p to a process group > > [... snip ...] > > So the proctree lock is sufficient for pgrp_calc_jobc() to provide a > stable answer needed for everything fixjobc_kill() uses it for... > > > @@ -934,35 +827,46 @@ fixjobc_kill(struct proc *p) > > p->p_treeflag |= P_TREE_GRPEXITED; > > > > /* > > - * Check p's parent to see whether p qualifies its own process > > - * group; if so, adjust count for p's process group. > > + * Check if exiting p orphans its own group. > > */ > > - if (isjobproc(jobc_parent(p), pgrp)) > > - pgadjustjobc(pgrp, false); > > + pgrp = p->p_pgrp; > > + if (isjobproc(jobc_parent(p, NULL), pgrp)) { > > + PGRP_LOCK(pgrp); > > + if (pgrp_calc_jobc(pgrp) == 0) > > + orphanpg(pgrp); > > + PGRP_UNLOCK(pgrp); > > + } > > > > /* > > * Check this process' children to see whether they qualify > > - * their process groups after reparenting to reaper. If so, > > - * adjust counts for children's process groups. > > + * their process groups after reparenting to reaper. > > */ > > LIST_FOREACH(q, &p->p_children, p_sibling) { > > - if ((q->p_treeflag & P_TREE_ORPHANED) != 0) > > - continue; > > - fixjobc_kill_q(p, q, true); > > + pgrp = q->p_pgrp; > > + PGRP_LOCK(pgrp); > > + if (pgrp_calc_jobc(pgrp) == 0) { > > + /* > > + * We want to handle exactly the children that > > + * has p as realparent. Then, when calculating > > + * jobc_parent for children, we should ignore > > + * P_TREE_GRPEXITED flag already set on p. > > + */ > > + if (jobc_parent(q, p) == p && isjobproc(p, pgrp)) > > + orphanpg(pgrp); > > + } else > > + pgrp->pg_flags &= ~PGRP_ORPHANED; > > + PGRP_UNLOCK(pgrp); > > } > > - LIST_FOREACH(q, &p->p_orphans, p_orphan) > > - fixjobc_kill_q(p, q, true); > > - LIST_FOREACH(q, &p->p_children, p_sibling) { > > - if ((q->p_treeflag & P_TREE_ORPHANED) != 0) > > - continue; > > - fixjobc_kill_q(p, q, false); > > + LIST_FOREACH(q, &p->p_orphans, p_orphan) { > > + pgrp = q->p_pgrp; > > + PGRP_LOCK(pgrp); > > + if (pgrp_calc_jobc(pgrp) == 0) { > > + if (isjobproc(p, pgrp)) > > + orphanpg(pgrp); > > + } else > > + pgrp->pg_flags &= ~PGRP_ORPHANED; > > + PGRP_UNLOCK(pgrp); > > } > > - LIST_FOREACH(q, &p->p_orphans, p_orphan) > > - fixjobc_kill_q(p, q, false); > > - > > -#ifdef INVARIANTS > > - check_pgrp_jobc(pgrp); > > -#endif > > } > > > > void > > ... and I would imagine the proctree lock is sufficient for > isjobproc/jobc_parent as well- is there any reason we can't/shouldn't > just use atomic(9) for operations with pgrp->pg_flags and push all of > the extra lock acquisitions into the orphanpg() cases? It seems like > we could avoid taking any pgrp lock in the common case and at least > mitigate that additional overhead from the exit() path. My reading is that you are proposing change pgrp locks into proctree_lock. Flag (PGRP_ORPHANED) needs to be consistent with the group membership, so it cannot be atomic, it would need to be under the same lock. If this works, I do not object, but I did not looked carefully. The most obvious problem could be that it is not easy to take sx proctree_lock in some places that only need pgrp mutex right now.