Is powerpc64 atomic_load_acq_##TYPE omitting isync believed correct?

From: Mark Millard via freebsd-hackers <freebsd-hackers_at_freebsd.org>
Date: Sun, 30 May 2021 06:04:39 UTC

In the code from /usr/include/machine/atomic.h for powerpc64
and powerpc there is:

#define ATOMIC_STORE_LOAD(TYPE)                                 \
static __inline u_##TYPE                                        \
atomic_load_acq_##TYPE(volatile u_##TYPE *p)                    \
{                                                               \
        u_##TYPE v;                                             \
                                                                \
        v = *p;                                                 \
        powerpc_lwsync();                                       \
        return (v);                                             \
}                                                               \
                                                                \
static __inline void                                            \
atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v)       \
{                                                               \
                                                                \
        powerpc_lwsync();                                       \
        *p = v;                                                 \
}

This code sequence does not involve isync, unlike the acquire
barrier macro that the same header defines:

#define __ATOMIC_ACQ()  __asm __volatile("isync" : : : "memory")

What justifies this? All the reference material I've
found for C++/C11 semantics agrees with:

https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

that shows (organized here to compare Relaxed vs.
Acquire and Release):

powerpc Load  Relaxed vs. Acquire: ld vs. ld;cmp;bc;isync
powerpc Fence:            Acquire: lwsync
powerpc Store Relaxed vs. Release: st vs. "Fence: Release";st
powerpc Fence:            Release: lwsync

lwsync does not order prior stores vs. later loads, isync does
(and more in some respects). That likely (partially) explains
why load-acquire does not use just an acquire-fence in such
materials.
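
For comparison, here is a minimal C11 sketch of the two shapes being
contrasted. The helper names (load_acq_direct, load_acq_fenced,
store_rel_fenced) are hypothetical and are not from FreeBSD's
atomic.h. Per the table above, the first form would correspond to
ld;cmp;bc;isync on powerpc and the second to ld followed by lwsync,
which is the shape atomic_load_acq_##TYPE appears to have. That
correspondence is the mapping table's claim, not something this
sketch verifies.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: acquire ordering attached to the load itself. */
static inline uint32_t
load_acq_direct(_Atomic uint32_t *p)
{
	return (atomic_load_explicit(p, memory_order_acquire));
}

/* Hypothetical helper: relaxed load, then a separate acquire fence. */
static inline uint32_t
load_acq_fenced(_Atomic uint32_t *p)
{
	uint32_t v;

	v = atomic_load_explicit(p, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	return (v);
}

/* Hypothetical helper: release fence, then a relaxed store. */
static inline void
store_rel_fenced(_Atomic uint32_t *p, uint32_t v)
{
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(p, v, memory_order_relaxed);
}
```

Compiling something like this for powerpc64 and reading the generated
assembly is one way to check which barrier a given compiler actually
emits for each form.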

Is this a problem for the correctness of "synchronizes with" as
described in "man atomic"? For the acquire operation reading the
value written by the release operation, it says:

QUOTE
     . . . the effects of all
     prior stores by the releasing thread must become visible to subsequent
     loads by the acquiring thread
END QUOTE

It seems that some later loads could be moved by the hardware
to be too early relative to various such prior stores (as seen
in the load-acquire thread): no constraint is placed for such
relationships by the atomic_load_acq_##TYPE as far as I can see.
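
To make the pattern in question concrete, here is a minimal C11
message-passing sketch. The names (struct mailbox, mailbox_publish,
mailbox_try_consume) are hypothetical and only illustrate the
release/acquire pairing that "man atomic" describes. In real use the
two functions would run on different threads. The sketch shows what
the pairing is supposed to guarantee, not whether the powerpc
implementation provides it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mailbox: a plain payload guarded by an atomic flag. */
struct mailbox {
	uint32_t	 payload;	/* ordinary, non-atomic data */
	_Atomic uint32_t ready;		/* 0 = empty, 1 = published */
};

/* Releasing side: write the payload, then publish with store-release. */
static inline void
mailbox_publish(struct mailbox *mb, uint32_t value)
{
	mb->payload = value;
	atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/*
 * Acquiring side: load-acquire the flag.  Only when it reads the
 * published value may the later load of payload be relied upon.
 */
static inline bool
mailbox_try_consume(struct mailbox *mb, uint32_t *out)
{
	if (atomic_load_explicit(&mb->ready, memory_order_acquire) == 0)
		return (false);
	*out = mb->payload;
	return (true);
}
```

The concern above is about the load of payload in mailbox_try_consume:
it must not be satisfied earlier than the load-acquire of ready.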


(I got into this by finding some code that uses an
atomic_store_rel_##TYPE without any matching use of
atomic_load_acq_##TYPE, atomic_thread_fence_acq, or the like,
so far as I could find. But looking around for a justification
for such code generated more questions, such as the one in
this note.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)