[Bug 274927] Toolchain fails on the __sync_val_compare_and_swap function without -march=native (port biology/seqwish)

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 06 Nov 2023 12:41:25 UTC

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274927

--- Comment #8 from Dimitry Andric <dim@FreeBSD.org> ---
These are all called via seqwish::DisjointSets::unite() (which is in 
https://github.com/ekg/seqwish/blob/master/src/dset64-gccAtomic.hpp):

0000000000000000 <seqwish::DisjointSets::unite(unsigned long, unsigned long)>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
...
  43:   e8 00 00 00 00          call   48 <seqwish::DisjointSets::unite(unsigned long, unsigned long)+0x48>
                        44: R_X86_64_PLT32     __sync_val_compare_and_swap_16-0x4

The file has a comment about this:

 * The implementation in shasta/src/dset64.hpp uses std::atomic<__uint128_t>
 * for lock-free synchronization.
 * On older GCC versions, std::atomic<__uint128_t> is lock-free
 * if compilation is done with -mcx16, which enables the use of the
 * 16-byte (128 bit) compare-and-swap instruction, CMPXCHG16B.
 *
 * Unfortunately, on newer GCC versions, this is no longer true
 * because of gcc bug 80878:
 * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
 *
 * As a result, there was a significant performance loss in
 * versions of Shasta built with gcc 7,
 * which is used by default on Ubuntu 18.04, when using
 * machines with large number of virtual processors.
 *
 * It is unlikely that this gcc bug will ever be fixed,
 * and to avoid this performance loss this implementation
 * uses gcc primitive __sync_bool_compare_and_swap instead
 * for lock-free synchronization. When compilation
 * is done with -mcx16 and optimization turned on,
 * this primitive uses the CMPXCHG16B instruction
 * and results in optimal speed.
 *
 * The CMPXCHG16B instruction is available on most modern 64-bit x86
 * processors.
 * Some older processors that don't implement this instruction
 * will crash with an "Illegal instruction" error
 * upon attempting to run this code.

However, when the compiler does not inline it, __sync_bool_compare_and_swap
(like the __sync_val_compare_and_swap_16 call shown above) is usually provided
by a compiler library such as libgcc or libcompiler-rt. I don't think we have
this function for 128-bit integers, though.

As noted in the comment, the code should be compiled with -mcx16 for optimal
performance. Processors which do not support CMPXCHG16B are quite ancient now.
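
As a rough sketch of how code could guard against this (this is not what
seqwish does, and the names u128, cas128 and cas128_demo are made up for
illustration): GCC and Clang predefine __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 on
x86-64 when -mcx16 is in effect, so that macro can select between the inline
builtin and a mutex-based fallback. That way the link does not depend on
__sync_val_compare_and_swap_16 being present in a runtime library.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical alias; unsigned __int128 is a GCC/Clang extension on 64-bit targets.
using u128 = unsigned __int128;

#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
// The compiler can expand the 16-byte builtin inline
// (CMPXCHG16B on x86-64 when built with -mcx16).
inline bool cas128(u128* target, u128 expected, u128 desired) {
    return __sync_bool_compare_and_swap(target, expected, desired);
}
#else
// Fallback when no inline 16-byte CAS is available. This is NOT lock-free,
// and it is only correct if every access to the value goes through cas128().
inline std::mutex& cas128_mutex() {
    static std::mutex m;
    return m;
}

inline bool cas128(u128* target, u128 expected, u128 desired) {
    std::lock_guard<std::mutex> guard(cas128_mutex());
    if (*target != expected) {
        return false;
    }
    *target = desired;
    return true;
}
#endif

// Small usage example: the first CAS matches and swaps, the second one
// uses a stale expected value and must leave the stored value alone.
inline void cas128_demo() {
    u128 value = 1;
    const u128 updated = (u128(1) << 64) | 2;
    bool first = cas128(&value, 1, updated);
    bool second = cas128(&value, 1, 7);
    assert(first);
    assert(!second);
    assert(value == updated);
}
```

Either branch gives the same compare-and-swap semantics, but only the first
one is lock-free, so the fallback trades away the speed that the seqwish
comment is concerned about.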
