Re: git: 5844bd058aed - main - jobc: rework detection of orphaned groups.

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Wed, 03 May 2023 06:46:25 UTC
On Sun, Apr 30, 2023 at 11:01:44PM -0500, Kyle Evans wrote:
> On Sat, Jan 9, 2021 at 8:43 PM Konstantin Belousov <kib@freebsd.org> wrote:
> >
> > The branch main has been updated by kib:
> >
> > URL: https://cgit.FreeBSD.org/src/commit/?id=5844bd058aed6f3d0c8cbbddd6aa95993ece0189
> >
> > commit 5844bd058aed6f3d0c8cbbddd6aa95993ece0189
> > Author:     Konstantin Belousov <kib@FreeBSD.org>
> > AuthorDate: 2020-12-29 00:41:56 +0000
> > Commit:     Konstantin Belousov <kib@FreeBSD.org>
> > CommitDate: 2021-01-10 02:41:20 +0000
> >
> >     jobc: rework detection of orphaned groups.
> >
> >     Instead of trying to maintain pg_jobc counter on each process group
> >     update (and sometimes before), just calculate the counter when needed.
> >     Still, for the benefit of the signal delivery code, explicitly mark
> >     orphaned groups as such with the new process group flag.
> >
> >     This way we prevent bugs in the corner cases where updates to the counter
> >     were missed due to complicated configuration of p_pptr/p_opptr/real_parent
> >     (debugger).
> >
> >     Since we need to iterate over all children of the process on exit, this
> >     change mostly affects the process group entry and leave, where we need
> >     to iterate all process group members to detect orpaned status.
> >
> 
> Hi,
> 
> I have a question about the locking here...
> 
> > [... snip ...]
> > @@ -678,43 +677,40 @@ jobc_reaper(struct proc *p)
> >  }
> >
> >  static struct proc *
> > -jobc_parent(struct proc *p)
> > +jobc_parent(struct proc *p, struct proc *p_exiting)
> >  {
> >         struct proc *pp;
> >
> > -       sx_assert(&proctree_lock, SX_LOCKED);
> > +       sx_assert(&proctree_lock, SA_LOCKED);
> >
> >         pp = proc_realparent(p);
> > -       if (pp->p_pptr == NULL ||
> > +       if (pp->p_pptr == NULL || pp == p_exiting ||
> >             (pp->p_treeflag & P_TREE_GRPEXITED) == 0)
> >                 return (pp);
> >         return (jobc_reaper(pp));
> >  }
> >
> > -#ifdef INVARIANTS
> > -static void
> > -check_pgrp_jobc(struct pgrp *pgrp)
> > +static int
> > +pgrp_calc_jobc(struct pgrp *pgrp)
> >  {
> >         struct proc *q;
> >         int cnt;
> >
> > -       sx_assert(&proctree_lock, SX_LOCKED);
> > -       PGRP_LOCK_ASSERT(pgrp, MA_NOTOWNED);
> > +#ifdef INVARIANTS
> > +       if (!mtx_owned(&pgrp->pg_mtx))
> > +               sx_assert(&proctree_lock, SA_LOCKED);
> > +#endif
> >
> >         cnt = 0;
> > -       PGRP_LOCK(pgrp);
> >         LIST_FOREACH(q, &pgrp->pg_members, p_pglist) {
> >                 if ((q->p_treeflag & P_TREE_GRPEXITED) != 0 ||
> >                     q->p_pptr == NULL)
> >                         continue;
> > -               if (isjobproc(jobc_parent(q), pgrp))
> > +               if (isjobproc(jobc_parent(q, NULL), pgrp))
> >                         cnt++;
> >         }
> > -       KASSERT(pgrp->pg_jobc == cnt, ("pgrp %d %p pg_jobc %d cnt %d",
> > -           pgrp->pg_id, pgrp, pgrp->pg_jobc, cnt));
> > -       PGRP_UNLOCK(pgrp);
> > +       return (cnt);
> >  }
> > -#endif
> >
> >  /*
> >   * Move p to a process group
> > [... snip ...]
> 
> So the proctree lock is sufficient for pgrp_calc_jobc() to provide a
> stable answer needed for everything fixjobc_kill() uses it for...
> 
> > @@ -934,35 +827,46 @@ fixjobc_kill(struct proc *p)
> >         p->p_treeflag |= P_TREE_GRPEXITED;
> >
> >         /*
> > -        * Check p's parent to see whether p qualifies its own process
> > -        * group; if so, adjust count for p's process group.
> > +        * Check if exiting p orphans its own group.
> >          */
> > -       if (isjobproc(jobc_parent(p), pgrp))
> > -               pgadjustjobc(pgrp, false);
> > +       pgrp = p->p_pgrp;
> > +       if (isjobproc(jobc_parent(p, NULL), pgrp)) {
> > +               PGRP_LOCK(pgrp);
> > +               if (pgrp_calc_jobc(pgrp) == 0)
> > +                       orphanpg(pgrp);
> > +               PGRP_UNLOCK(pgrp);
> > +       }
> >
> >         /*
> >          * Check this process' children to see whether they qualify
> > -        * their process groups after reparenting to reaper.  If so,
> > -        * adjust counts for children's process groups.
> > +        * their process groups after reparenting to reaper.
> >          */
> >         LIST_FOREACH(q, &p->p_children, p_sibling) {
> > -               if ((q->p_treeflag & P_TREE_ORPHANED) != 0)
> > -                       continue;
> > -               fixjobc_kill_q(p, q, true);
> > +               pgrp = q->p_pgrp;
> > +               PGRP_LOCK(pgrp);
> > +               if (pgrp_calc_jobc(pgrp) == 0) {
> > +                       /*
> > +                        * We want to handle exactly the children that
> > +                        * has p as realparent.  Then, when calculating
> > +                        * jobc_parent for children, we should ignore
> > +                        * P_TREE_GRPEXITED flag already set on p.
> > +                        */
> > +                       if (jobc_parent(q, p) == p && isjobproc(p, pgrp))
> > +                               orphanpg(pgrp);
> > +               } else
> > +                       pgrp->pg_flags &= ~PGRP_ORPHANED;
> > +               PGRP_UNLOCK(pgrp);
> >         }
> > -       LIST_FOREACH(q, &p->p_orphans, p_orphan)
> > -               fixjobc_kill_q(p, q, true);
> > -       LIST_FOREACH(q, &p->p_children, p_sibling) {
> > -               if ((q->p_treeflag & P_TREE_ORPHANED) != 0)
> > -                       continue;
> > -               fixjobc_kill_q(p, q, false);
> > +       LIST_FOREACH(q, &p->p_orphans, p_orphan) {
> > +               pgrp = q->p_pgrp;
> > +               PGRP_LOCK(pgrp);
> > +               if (pgrp_calc_jobc(pgrp) == 0) {
> > +                       if (isjobproc(p, pgrp))
> > +                               orphanpg(pgrp);
> > +               } else
> > +                       pgrp->pg_flags &= ~PGRP_ORPHANED;
> > +               PGRP_UNLOCK(pgrp);
> >         }
> > -       LIST_FOREACH(q, &p->p_orphans, p_orphan)
> > -               fixjobc_kill_q(p, q, false);
> > -
> > -#ifdef INVARIANTS
> > -       check_pgrp_jobc(pgrp);
> > -#endif
> >  }
> >
> >  void
> 
> ... and I would imagine the proctree lock is sufficient for
> isjobproc/jobc_parent as well- is there any reason we can't/shouldn't
> just use atomic(9) for operations with pgrp->pg_flags and push all of
> the extra lock acquisitions into the orphanpg() cases? It seems like
> we could avoid taking any pgrp lock in the common case and at least
> mitigate that additional overhead from the exit() path.

My reading is that you are proposing change pgrp locks into
proctree_lock. Flag (PGRP_ORPHANED) needs to be consistent with the
group membership, so it cannot be atomic, it would need to be under the
same lock.

If this works, I do not object, but I did not looked carefully.  The most
obvious problem could be that it is not easy to take sx proctree_lock in
some places that only need pgrp mutex right now.