[Bug 274927] Toolchain fails on the __sync_val_compare_and_swap function without -march=native (port biology/seqwish)
Date: Mon, 06 Nov 2023 12:41:25 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=274927

--- Comment #8 from Dimitry Andric <dim@FreeBSD.org> ---
These are all called via seqwish::DisjointSets::unite() (which is in
https://github.com/ekg/seqwish/blob/master/src/dset64-gccAtomic.hpp):

0000000000000000 <seqwish::DisjointSets::unite(unsigned long, unsigned long)>:
       0:       55                      push   %rbp
       1:       48 89 e5                mov    %rsp,%rbp
...
      43:       e8 00 00 00 00          call   48 <seqwish::DisjointSets::unite(unsigned long, unsigned long)+0x48>
                        44: R_X86_64_PLT32      __sync_val_compare_and_swap_16-0x4

The file has a comment about this:

 * The implementation in shasta/src/dset64.hpp uses std::atomic<__uint128_t>
 * for lock-free synchronization.
 * On older GCC versions, std::atomic<__uint128_t> is lock-free
 * if compilation is done with -mcx16, which enables the use of the
 * 16-byte (128 bit) compare-and-swap instruction, CMPXCHG16B.
 *
 * Unfortunately, on newer GCC versions, this is no longer true
 * because of gcc bug 80878:
 * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
 *
 * As a result, there was a significant performance loss in
 * versions of Shasta built with gcc 7,
 * which is used by default on Ubuntu 18.04, when using
 * machines with large number of virtual processors.
 *
 * It is unlikely that this gcc bug will ever be fixed,
 * and to avoid this performance loss this implementation
 * uses gcc primitive __sync_bool_compare_and_swap instead
 * for lock-free synchronization. When compilation
 * is done with -mcx16 and optimization turned on,
 * this primitive uses the CMPXCHG16B instruction
 * and results in optimal speed.
 *
 * The CMPXCHG16B instruction is available on most modern 64-bit x86 processors.
 * Some older processors that don't implement this instruction
 * will crash with an "Illegal instruction" error
 * upon attempting to run this code.

However __sync_bool_compare_and_swap is usually provided by a compiler library
such as libgcc or libcompiler-rt. I don't think we have this function for 128
bit integers, though.

As noted in the comment, the code should be compiled with -mcx16 for optimal
performance. Processors which do not support CMPXCHG16B are quite ancient now.
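
For illustration only, here is a minimal sketch of how code like this could avoid the unresolved __sync_val_compare_and_swap_16 reference when -mcx16 is not in effect. It is not what seqwish does: the names cas128, cas128_mutex and cas128_demo are hypothetical. It relies on the predefined macro __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16, which GCC and Clang define when a 16-byte compare-and-swap can be emitted inline, and it assumes a 64-bit GCC or Clang target where __uint128_t exists.

```cpp
#include <cassert>
#include <mutex>

#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
// The compiler can emit a 16-byte CAS inline (on x86-64 this means
// -mcx16, or an -march setting that implies it, is in effect).
constexpr bool cas128_is_lock_free = true;

inline bool cas128(__uint128_t* addr, __uint128_t expected, __uint128_t desired) {
    return __sync_bool_compare_and_swap(addr, expected, desired);
}
#else
// Hypothetical fallback: serialize through one global mutex so that no call
// to a __sync_*_compare_and_swap_16 library function is generated.
// This is only safe if every access to the value goes through cas128().
constexpr bool cas128_is_lock_free = false;

inline std::mutex& cas128_mutex() {
    static std::mutex m;
    return m;
}

inline bool cas128(__uint128_t* addr, __uint128_t expected, __uint128_t desired) {
    std::lock_guard<std::mutex> guard(cas128_mutex());
    if (*addr != expected) {
        return false;
    }
    *addr = desired;
    return true;
}
#endif

// Small usage example: the first swap succeeds, the second sees a stale
// expected value and leaves the target untouched.
inline void cas128_demo() {
    __uint128_t value = 1;
    bool first = cas128(&value, 1, 2);
    bool second = cas128(&value, 1, 3);
    assert(first);
    assert(!second);
    assert(value == 2);
}
```

The mutex path trades the lock-free property for a clean link, so a port would most likely still prefer passing -mcx16 on amd64; the guard only keeps the build from failing where that flag is absent.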