Re: Review D38047 ... and then there was one....

From: David E. Cross <david_at_crossfamilyweb.com>
Date: Sat, 12 Oct 2024 17:22:57 UTC
On 10/12/24 11:46, Mark Johnston wrote:
> And if all of the commits are pushed together, which they would be, then
> users will only see the end result anyway, so there is nothing
> inherently dangerous about splitting the change into easier-to-review
> commits.
>
> Even if the whole patch is applied atomically, it affects multiple
> binaries in the system (at least libc.so and nscd), and there's nothing
> guaranteeing that the update is somehow applied atomically across the
> whole system.  What if some applications using an older libc end up
> still running with an updated nscd?  That's an untested configuration,
> but it might still arise if the whole patch is committed as one unit.
>
> I would split it up like this:
> 1. Rename the cycle prevent symbol.  This is a cosmetic change which is
>     irrelevant to the correctness of the rest of the patch.
> 2. In nscd, fix the problem with negative lookups.  I can't easily see
>     if anything other than the query.c change is required there, which is
>     part of the reason I asked for the patch to be split up.
> 3. Use thread-safe lookups in nscd's passwd.c and group.c.  For the
>     latter, a bit of work is needed to tease out the bits that add
>     getgrouplist() handling.  I believe that consists of
>     wrap_getgrouplist() and the handling of nss_lt_pivot in
>     group_lookup_func().
> 4. Add getgrouplist() support to nscd.  At this point, libc doesn't
>     dispatch getgrouplist() to nscd, so this change has no effect.
> 5. Teach libc to marshall and unmarshall getgrouplist() requests.  This
>     could reasonably be combined with 4 since the logic is coupled.
> 6. Fix nscd to export the cycle prevention symbol.
>
> Just the exercise of reading through the patch again with an eye to
> splitting it up helped me understand it better and find a couple of
> buglets.  It can be split up mechanically using git add -p for the most
> part (aside from the fact that parts 3 and 4 touch nscd's group.c).
>
>  From my point of view, your patch hasn't gotten a lot of attention
> because
> 1. It's big and affects multiple binaries.
> 2. There are few committers familiar with our NSS implementation.
>
> The point of splitting it is to help with 1.  It requires more work from
> developers, but then reviewers are much more likely to read and actually
> understand it, which helps with 2 in the long run.
>
Ok, if this is just part of the review, and it all goes in at once my 
concerns are ameliorated.  I was afraid this was going to get piecemeal 
applied and then the bulk of the work, and the only real reason I did 
this was going to sit on a shelf; that is not terribly motivating to do 
a bunch of *additional* work

To address your points individually.

1. yes, and I see you correctly place the final step at 6 for this.

2. Of note on this fix, this is actually only required once the symbol 
is exported, it has no real effect prior to this, (you can see above at 
line 755 this is gated by the perform actual lookup check (which won't 
even work now).  So that is to say this doesn't break things any more 
than they are already broken (ie, not high priority)

3. This I do NOT believe is required.  I spent a lot of time going over 
the threading here; the various getpw and getgr entries already use 
thread local storage, so within a thread you are safe as long as you 
preserve values across calls as needed (and each thread within nscd 
always works to completion, even the lt_all ones).  I remember starting 
a bigger refactor to address that (since it was implemented 
inconsistently across different types). But I decided I wanted no part 
of that change.   Or are you referring to my code here; upon re-reading 
I am not sure if you are talking about pulling my agent code here, or if 
you think more thread hardening is required.

4/5: agreed

6: agreed as above.

If you could point me at the buglets I will work on addressing them (or 
discussing them, since I believe I am thread safe here), and I will also 
split this up.

Additionally I plan to tackle the other bug raised in -hackers about 
'getent' ALWAYS hitting ALL databses.. but I need to wrap my head around 
that more; it has been multiple years since I have been in this code and 
dusting off the cobwebs is taking time (another reason I wasn't big on 
splitting this up, I hadn't touched it in forever).


How do you recommend I post this, are these individual PRs on 
reviews.freebsd.org; do I post the diff differently?