Re: Review D38047 ... and then there was one....

From: David E. Cross <david_at_crossfamilyweb.com>
Date: Tue, 15 Oct 2024 21:12:15 UTC
Ok, I've now tested this with a full buildworld with the attached 
patch.  This will be being updated on D38047 immediately after I send 
this email; and then for the point of the reviews and getting this 
formally into base I will be doing the requested changes of breaking it 
down to smaller, but I figured this should give you a taste of the 
improvements: (this is with 1000 groups and users defined)


~ $ time getent group >/dev/null

         0.26 real         0.00 user         0.01 sys

~ $ time getent group >/dev/null

         0.02 real         0.01 user         0.00 sys


On 10/13/24 14:21, David E. Cross wrote:
>
> I have now fixed this as well.
>
> WARNING, I did this as a manual patch to my tree and then put it back 
> in git, I have not tried this EXACT code as is; I am working on some 
> other changes  as well so things are a bit in flux; I will attach the 
> diff to this, but caveat emptor.
>
> The patch given is to be applied incrementally after the last patch.  
> let met know how it works (or if it works!) for you.
>
> On 10/8/24 16:17, David E. Cross wrote:
>>
>> Ok, I looked a bit into this and for the case of 'getent *' it really 
>> is not (currently) a fair comparison to speed.
>>
>> For 'getent password' the system currently works as follows, for each 
>> datasource in the list fully iterate over EVERY datasource, and 
>> 'cache' is a datasource, but so is ldap.
>>
>> What you wind up getting is a list of EVERYTHNG in files, then a list 
>> of everything in cache, and then a list of everything in LDAP. (or 
>> whatever).   SO every time it will always go back to origin, so 
>> caching effectively doesn't matter except to duplicate the data.
>>
>> I remember this when I was doing the initial development and I looked 
>> into ways to NOT have it do it but for some reason I didn't think it 
>> was possible without a substantial rewrite, I am taking another look 
>> to see if that is still true or if there is a way around it.
>>
>> Going on my vague (it has been multiple years now), I think in the 
>> GENERAL case it is unavoidable.  The way NSCD typically operates is 
>> that looked up values are PUSHED into the cache from the client.  
>> That is the client says 'do you have X'? nscd replies 'no', then the 
>> CLIENT falls back, does the lookup, get the value and pushes it into 
>> nscd.  nscd additionally has a 'perform_lookups' flag that will have 
>> it do the lookup itself and then tell the client the result.  The 
>> interaction of this variable behavior is that there is no way to 
>> programatically shortcircuit it without libc knowing how nscd is 
>> optionally configured.  If libc knew that nscd would perform the 
>> lookups itself then it could for getent type calls just return 
>> immediately after the cache layer enumeration.  if libc knew that 
>> nscd would NOT perform lookups then it could bypass it and do the normal.
>>
>>
>> I guess I could implement it as follows:
>>
>> nscd retruns NS_SUCCESS if it performs its own lookups and then in 
>> the case of getent NS_SUCCESS is treated as a return step for the 
>> cache layer only (since otherwise getent calls are treated as 
>> continue otherwise you'd never enumerate anything after files). and 
>> NS_NOTFOUND if it doesn't.. and then the libc layer would treat that 
>> as a continue.  .. I think that may do it... I need to refamiliarize 
>> myself with that code.
>>
>>
>> In the meantime, checking basic lookups (not enumerations) is a more 
>> fair test.  Also keep in mind that without [notfound=return] that 
>> misses will always fall back to origin, which is probably what you 
>> want with nscd in the default configuration, but not with nscd doing 
>> its own lookups.
>>
>> On 10/7/24 11:33, Marek Zarychta wrote:
>>>
>>> W dniu 7.10.2024 o 07:05, David Cross pisze:
>>>
>>>> How many entries are in your ldap structure?  I can attempt a replication here
>>>
>>> Hello David,
>>>
>>> I will rather not expose it publicly. Whole LDAP directory contains 
>>> few thousand entries - and it was was used for the tests mentioned 
>>> in this thread.
>>>
>>> With the filters applied I see below 1k entries, and then lookup 
>>> with nsdc running takes: first lookup 0.16s, next lookups 0.09s, 
>>> while without nscd it varies from 0.12 to 0.08 - so nscd performs OK.
>>>
>>> I have your patch applied and I am still testing it with 
>>> net/nss-pam-ldapd from ports with patch for login classes applied 
>>> (it's present in port but not enabled by default). So far it works 
>>> without issues.
>>>
>>> -- 
>>> Marek Zarychta