Re: git: 8271d9b99a3b - main - libsys: remove usage of pthread_once and _once_stub

From: Jessica Clarke <jrtc27_at_freebsd.org>
Date: Wed, 21 Feb 2024 17:23:10 UTC
On 21 Feb 2024, at 14:17, Konstantin Belousov <kostikbel@gmail.com> wrote:
> 
> On Wed, Feb 21, 2024 at 12:51:04AM +0000, Jessica Clarke wrote:
>> On 21 Feb 2024, at 00:29, Konstantin Belousov <kib@FreeBSD.org> wrote:
>>> 
>>> The branch main has been updated by kib:
>>> 
>>> URL: https://cgit.FreeBSD.org/src/commit/?id=8271d9b99a3b98c662ee9a6257a144284b7e1728
>>> 
>>> commit 8271d9b99a3b98c662ee9a6257a144284b7e1728
>>> Author:     Konstantin Belousov <kib@FreeBSD.org>
>>> AuthorDate: 2024-02-20 14:45:29 +0000
>>> Commit:     Konstantin Belousov <kib@FreeBSD.org>
>>> CommitDate: 2024-02-21 00:26:11 +0000
>>> 
>>>   libsys: remove usage of pthread_once and _once_stub
>>> 
>>>   that existed in auxv.c, use simple bool gate instead. This leaves a
>>>   small window if two threads try to call _elf_aux_info(3) simultaneously.
>>>   The situation is safe because auxv parsing is really idempotent. The
>>>   parsed data is the same, and we store atomic types (int/long/ptr) so
>>>   double-init does not matter.
>> 
>> You still need to load acquire and store release aux_once though,
>> otherwise you can see aux_once as true yet read the pre-initialised
>> data. In practice that’s surely very hard to hit, but the code as
>> written is now wrong. Also, idempotence should probably be made
>> unnecessary by using 0/1/2 state for uninitialised/initialising/
>> initialised, as it’s still technically UB from a C AM perspective due
>> to not being data race free if two threads initialise at the same time.
>> Better to just do the correct thing rather than risk things going wrong.
> 
> There is too much to handle 'in process' state for loosing thread, I need
> the whole libthr machinery.

What do you need libthr for? In pseudo-C:

x = load_acquire(&aux_once)
if (__predict_true(x == 2))
    return;
if (x == 1 || !compare_exchange_strong_acquire(&aux_once, &x, 1)) {
    while (x != 2) {
        yield();
        x = load_acquire(&aux_once)
    }
    return;
}
/* initialise as before */
store_release(&aux_once, 2);

I believe that’s all you need. Or compare exchange 0 to 1 as the
initial operation; makes the source code shorter at the expense of a
more expensive fast path:

x = 0;
if (__predict_true(!compare_exchange_strong_acquire(&aux_once, &x, 1)) {
    while (__predict_false(x != 2)) {
        yield();
        x = load_acquire(&aux_once)
    }
    return;
}
/* initialise as before */
store_release(&aux_once, 2);

I probably have bugs in the above, but you get the gist.

> I added the fences, thanks for noting.

Thanks.

Jess

> WRT being UB from pure C, we already have much more assumptions about
> atomicity.