Re: Is powerpc64 atomic_load_acq_##TYPE omitting isync believed correct?
Date: Mon, 31 May 2021 11:23:04 UTC
On 2021-May-29, at 23:04, Mark Millard <marklmi at yahoo.com> wrote:

> In the code from /usr/include/machine/atomic.h for powerpc64
> and powerpc there is:
>
> #define ATOMIC_STORE_LOAD(TYPE)                                 \
> static __inline u_##TYPE                                        \
> atomic_load_acq_##TYPE(volatile u_##TYPE *p)                    \
> {                                                               \
>         u_##TYPE v;                                             \
>                                                                 \
>         v = *p;                                                 \
>         powerpc_lwsync();                                       \
>         return (v);                                             \
> }                                                               \
>                                                                 \
> static __inline void                                            \
> atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v)       \
> {                                                               \
>                                                                 \
>         powerpc_lwsync();                                       \
>         *p = v;                                                 \
> }
>
> This code sequence does not involve isync:
>
> #define __ATOMIC_ACQ() __asm __volatile("isync" : : : "memory")
>
> What justifies this? All the reference material I've
> found for C++/C11 semantics agrees with:
>
> https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
>
> that shows (organized here to compare Relaxed vs.
> Acquire and Release):
>
> powerpc Load Relaxed vs. Acquire:  ld vs. ld;cmp;bc;isync
> powerpc Fence: Acquire:            lwsync
> powerpc Store Relaxed vs. Release: st vs. "Fence: Release";st
> powerpc Fence: Release:            lwsync
>
> lwsync does not order prior stores vs. later loads, isync does
> (and more in some respects). That likely (partially) explains
> why load-acquire does not use just an acquire-fence in such
> materials.
>
> Is this a problem for being correct for "synchronizes with" in
> "man atomic"? For the acquire operation reading the value
> written by the release operation:
>
> QUOTE
> . . . the effects of all
> prior stores by the releasing thread must become visible to subsequent
> loads by the acquiring thread
> END QUOTE
>
> It seems that some later loads could be moved by the hardware
> to be too early relative to various such prior stores (as seen
> in the load-acquire thread): no constraint is placed for such
> relationships by the atomic_load_acq_##TYPE as far as I can see.
>
>
> (I got into this by finding some code that uses an
> atomic_store_rel_##TYPE without any matching use of
> atomic_load_acq_##TYPE or atomic_thread_fence_acq or other such,
> so far as I found. But, looking around to see if I could find a
> justification for such code, generated more questions, such as
> in this note.)

Never mind. I figured out my significant confusion in
interpretation. (Net result: lwsync is more than sufficient.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
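
For readers comparing against the C11 mappings cited above, here is a minimal sketch in portable C11 with the same shape as the quoted macros: a relaxed load followed by an acquire fence, and a release fence followed by a relaxed store. Per the quoted mapping table, both fences correspond to lwsync on powerpc. The names sketch_load_acq, sketch_store_rel, and sketch_roundtrip are hypothetical, chosen only for illustration; this is not the FreeBSD implementation, and the single-threaded round trip only checks that the value is preserved, not the cross-thread ordering.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical analogue of atomic_load_acq_##TYPE:
 * plain load first, then an acquire fence. */
static inline unsigned
sketch_load_acq(_Atomic unsigned *p)
{
	unsigned v;

	v = atomic_load_explicit(p, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	return (v);
}

/* Hypothetical analogue of atomic_store_rel_##TYPE:
 * release fence first, then a plain store. */
static inline void
sketch_store_rel(_Atomic unsigned *p, unsigned v)
{

	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Single-threaded check that a released value is read back unchanged. */
static unsigned
sketch_roundtrip(unsigned v)
{
	_Atomic unsigned cell;

	atomic_init(&cell, 0u);
	sketch_store_rel(&cell, v);
	return (sketch_load_acq(&cell));
}
```

In C11 terms, an acquire fence sequenced after an atomic load that reads the value written after a release fence is enough to establish "synchronizes with", which is consistent with the conclusion above that no isync is needed here.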