Undefined reference to __atomic_store_8

Wed Aug 12 11:42:04 UTC 2020

On Wed, 12 Aug 2020 09:44:25 +0400 Gleb Popov <arrowd at freebsd.org> wrote:
> On Wed, Aug 12, 2020 at 9:21 AM Gleb Popov <arrowd at freebsd.org> wrote:
>> Indeed, this looks like the culprit! When compiling with the first command
>> line (the long one), I get the following warnings:
>>
>> /wrkdirs/usr/ports/lang/ghc/work/ghc-8.10.1/libraries/ghc-prim/cbits/atomic.c:369:10:
>> warning: misaligned atomic operation may incur significant performance
>> penalty [-Watomic-alignment]
>>   return __atomic_load_n((StgWord64 *) x, __ATOMIC_SEQ_CST);
>>          ^
>> /wrkdirs/usr/ports/lang/ghc/work/ghc-8.10.1/libraries/ghc-prim/cbits/atomic.c:417:3:
>> warning: misaligned atomic operation may incur significant performance
>> penalty [-Watomic-alignment]
>>   __atomic_store_n((StgWord64 *) x, (StgWord64) val, __ATOMIC_SEQ_CST);
>>   ^
>> 2 warnings generated.
>>
>> I guess this basically means "I'm emitting a call there". So what's the
>> correct fix in this case?
> 
> I just noticed that Clang emits these warnings (and the call instruction)
> only for functions handling the StgWord64 type. For the same code with
> StgWord32, like
> 
> StgWord
> hs_atomicread32(StgWord x)
> {
> #if HAVE_C11_ATOMICS
>   return __atomic_load_n((StgWord32 *) x, __ATOMIC_SEQ_CST);
> #else
>   return __sync_add_and_fetch((StgWord32 *) x, 0);
> #endif
> }
> 
> neither a warning nor a call is emitted.
> 
> How does clang infer alignment in these cases? What's so special about
> StgWord64?

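StgWord64 is uint64_t, which is unsigned long long, which is only 4-byte
aligned on i386.  Clang wants 8-byte alignment to use the fildll
instruction.  Since the pointer comes from casting an integer (StgWord x),
Clang can only assume the alignment of the pointed-to type, so for
StgWord64 it emits a library call (__atomic_load_8, __atomic_store_8)
instead, which is where the undefined reference comes from.  StgWord32
is 4 bytes and 4-byte aligned, so that case is inlined.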
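A quick way to see the alignment Clang is working with is to put the type right after a char in a struct: the member's offset is the alignment the ABI gives that type. This is only an illustrative sketch; the type and function names are made up, and it assumes GCC or Clang for __attribute__((aligned(8))). On i386, plain_u64_offset() should return 4 and aligned_u64_offset() 8; on amd64 both should return 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: the same aligned(8) typedef trick proposed for StgWord64. */
typedef uint64_t u64_aligned8 __attribute__((aligned(8)));

/* The attribute raises the type's alignment without changing its size. */
static_assert(sizeof(u64_aligned8) == 8, "size must stay 8 bytes");

/* A member placed right after a char starts at the next multiple of its
   alignment, so its offset equals the alignment the ABI gives the type. */
struct probe_plain   { char c; uint64_t     v; };
struct probe_aligned { char c; u64_aligned8 v; };

/* Expected: 4 on i386, 8 on amd64. */
size_t plain_u64_offset(void)
{
    return offsetof(struct probe_plain, v);
}

/* Expected: 8 on both, because of the aligned(8) attribute. */
size_t aligned_u64_offset(void)
{
    return offsetof(struct probe_aligned, v);
}
```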

You could change the definition of the StgWord64 type to look like:

typedef uint64_t StgWord64 __attribute__((aligned(8)));

But this is only safe if every address passed to hs_atomicread64
points to a StgWord64, and not to some other 64-bit value that may
only be 4-byte aligned.
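With the aligned typedef, the cast in the 64-bit read tells Clang the address is 8-byte aligned, so it should be able to inline the load instead of calling __atomic_load_8. A minimal sketch, assuming GCC/Clang builtins: word_t and word64_t are hypothetical stand-ins for StgWord and the realigned StgWord64, and atomic_read64 is a hypothetical analogue of hs_atomicread64. The assert is an extra debug check for the caveat above; it fires if a caller passes the address of a 64-bit value that is only 4-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins (assumptions): word_t plays StgWord, word64_t the proposed StgWord64. */
typedef uintptr_t word_t;
typedef uint64_t word64_t __attribute__((aligned(8)));

/* Hypothetical analogue of hs_atomicread64: x is an address passed as an integer. */
word64_t atomic_read64(word_t x)
{
    /* The typedef promises 8-byte alignment; catch callers that break it. */
    assert(x % 8 == 0);
    return __atomic_load_n((word64_t *) x, __ATOMIC_SEQ_CST);
}
```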

Another solution, which I already mentioned in a previous message, is to
replace HAVE_C11_ATOMICS with 0 in hs_atomicread64 so that it uses
__sync_add_and_fetch instead of __atomic_load_n.  That uses the
cmpxchg8b instruction, which doesn't care about alignment.  It's much
slower, but I guess 64-bit atomic loads are rare enough that this
doesn't matter much.
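The __sync fallback can be sketched the same way. atomic_read64_sync mirrors the suggestion above. atomic_write64_sync is an assumption, not something from this message: a compare-and-swap loop that would presumably serve the 64-bit store (the __atomic_store_n call behind the __atomic_store_8 reference) in the same way. Both names, and word_t standing in for StgWord, are hypothetical. Both operations are read-modify-writes, so the memory must be writable; on i386 they should end up as lock cmpxchg8b sequences.

```c
#include <stdint.h>

typedef uintptr_t word_t;  /* stand-in for StgWord (assumption) */

/* Atomic 64-bit read via the legacy __sync builtins: adding 0 leaves the
   value unchanged and returns it. */
uint64_t atomic_read64_sync(word_t x)
{
    return __sync_add_and_fetch((uint64_t *) x, 0);
}

/* Atomic 64-bit store via a compare-and-swap loop (hypothetical helper). */
void atomic_write64_sync(word_t x, uint64_t val)
{
    uint64_t *p = (uint64_t *) x;
    uint64_t expected = 0;  /* any first guess works; a failed CAS returns the real value */

    for (;;) {
        uint64_t seen = __sync_val_compare_and_swap(p, expected, val);
        if (seen == expected)
            return;         /* the swap happened: *p now holds val */
        expected = seen;    /* retry with the value actually in memory */
    }
}
```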