svn commit: r290014 - in stable/10: lib/libthr/arch/amd64 lib/libthr/arch/i386 libexec/rtld-elf/amd64 libexec/rtld-elf/i386 share/mk

Eric van Gyzen vangyzen at FreeBSD.org
Mon Oct 26 16:21:58 UTC 2015


Author: vangyzen
Date: Mon Oct 26 16:21:56 2015
New Revision: 290014
URL: https://svnweb.freebsd.org/changeset/base/290014

Log:
  Disable SSE in libthr
  
  Clang emits SSE instructions on amd64 in the common path of
  pthread_mutex_unlock.  If the thread does not otherwise use SSE,
  this usage incurs a context-switch of the FPU/SSE state, which
  reduces the performance of multiple real-world applications by a
  non-trivial amount (3-5% in one application).
  
  Instead of this change, I experimented with eagerly switching the
  FPU state at context-switch time.  This did not help.  Most of the
  cost seems to be in the read/write of memory--as kib@ stated--and
  not in the #NM handling.  I tested on machines with and without
  XSAVEOPT.
  
  One counter-argument to this change is that most applications already
  use SIMD, and the number of applications and amount of SIMD usage
  are only increasing.  This is absolutely true.  I agree that--in
  general and in principle--this change is in the wrong direction.
  However, there are applications that do not use enough SSE to offset
  the extra context-switch cost.  SSE does not provide a clear benefit
  in the current libthr code with the current compiler, but it does
  provide a clear loss in some cases.  Therefore, disabling SSE in
  libthr is a non-loss for most, and a gain for some.
  
  I refrained from disabling SSE in libc--as was suggested--because
  I can't make the above argument for libc.  It provides a wide variety
  of code; each case should be analyzed separately.
  
  https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055193.html
  
  Suggestions from:   dim, jmg, rpaulo
  Sponsored by:   Dell Inc.

Modified:
  stable/10/lib/libthr/arch/amd64/Makefile.inc
  stable/10/lib/libthr/arch/i386/Makefile.inc
  stable/10/libexec/rtld-elf/amd64/Makefile.inc
  stable/10/libexec/rtld-elf/i386/Makefile.inc
  stable/10/share/mk/bsd.cpu.mk
Directory Properties:
  stable/10/   (props changed)

Modified: stable/10/lib/libthr/arch/amd64/Makefile.inc
==============================================================================
--- stable/10/lib/libthr/arch/amd64/Makefile.inc	Mon Oct 26 15:50:39 2015	(r290013)
+++ stable/10/lib/libthr/arch/amd64/Makefile.inc	Mon Oct 26 16:21:56 2015	(r290014)
@@ -1,3 +1,9 @@
 #$FreeBSD$
 
 SRCS+=	pthread_md.c _umtx_op_err.S
+
+# With the current compiler and libthr code, using SSE in libthr
+# does not provide enough performance improvement to outweigh
+# the extra context switch cost.  This can measurably impact
+# performance when the application also does not use enough SSE.
+CFLAGS+=${CFLAGS_NO_SIMD}

Modified: stable/10/lib/libthr/arch/i386/Makefile.inc
==============================================================================
--- stable/10/lib/libthr/arch/i386/Makefile.inc	Mon Oct 26 15:50:39 2015	(r290013)
+++ stable/10/lib/libthr/arch/i386/Makefile.inc	Mon Oct 26 16:21:56 2015	(r290014)
@@ -1,3 +1,9 @@
 # $FreeBSD$
 
 SRCS+=	pthread_md.c _umtx_op_err.S
+
+# With the current compiler and libthr code, using SSE in libthr
+# does not provide enough performance improvement to outweigh
+# the extra context switch cost.  This can measurably impact
+# performance when the application also does not use enough SSE.
+CFLAGS+=${CFLAGS_NO_SIMD}

Modified: stable/10/libexec/rtld-elf/amd64/Makefile.inc
==============================================================================
--- stable/10/libexec/rtld-elf/amd64/Makefile.inc	Mon Oct 26 15:50:39 2015	(r290013)
+++ stable/10/libexec/rtld-elf/amd64/Makefile.inc	Mon Oct 26 16:21:56 2015	(r290014)
@@ -1,6 +1,6 @@
 # $FreeBSD$
 
-CFLAGS+=	-mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -msoft-float
+CFLAGS+=	${CFLAGS_NO_SIMD} -msoft-float
 # Uncomment this to build the dynamic linker as an executable instead
 # of a shared library:
 #LDSCRIPT=	${.CURDIR}/${MACHINE_CPUARCH}/elf_rtld.x

Modified: stable/10/libexec/rtld-elf/i386/Makefile.inc
==============================================================================
--- stable/10/libexec/rtld-elf/i386/Makefile.inc	Mon Oct 26 15:50:39 2015	(r290013)
+++ stable/10/libexec/rtld-elf/i386/Makefile.inc	Mon Oct 26 16:21:56 2015	(r290014)
@@ -1,6 +1,6 @@
 # $FreeBSD$
 
-CFLAGS+=	-mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -msoft-float
+CFLAGS+=	${CFLAGS_NO_SIMD} -msoft-float
 # Uncomment this to build the dynamic linker as an executable instead
 # of a shared library:
 #LDSCRIPT=	${.CURDIR}/${MACHINE_CPUARCH}/elf_rtld.x

Modified: stable/10/share/mk/bsd.cpu.mk
==============================================================================
--- stable/10/share/mk/bsd.cpu.mk	Mon Oct 26 15:50:39 2015	(r290013)
+++ stable/10/share/mk/bsd.cpu.mk	Mon Oct 26 16:21:56 2015	(r290014)
@@ -267,6 +267,27 @@ _CPUCFLAGS += -mfloat-abi=softfp
 CFLAGS += ${_CPUCFLAGS}
 .endif
 
+#
+# Prohibit the compiler from emitting SIMD instructions.
+# These flags are added to CFLAGS in areas where the extra context-switch
+# cost outweighs the advantages of SIMD instructions.
+#
+# gcc:
+# Setting -mno-mmx implies -mno-3dnow
+# Setting -mno-sse implies -mno-sse2, -mno-sse3, -mno-ssse3 and -mfpmath=387
+#
+# clang:
+# Setting -mno-mmx implies -mno-3dnow and -mno-3dnowa
+# Setting -mno-sse implies -mno-sse2, -mno-sse3, -mno-ssse3, -mno-sse41 and
+# -mno-sse42
+# (-mfpmath= is not supported)
+#
+.if ${MACHINE_CPUARCH} == "i386" || ${MACHINE_CPUARCH} == "amd64"
+CFLAGS_NO_SIMD.clang= -mno-avx
+CFLAGS_NO_SIMD= -mno-mmx -mno-sse
+.endif
+CFLAGS_NO_SIMD += ${CFLAGS_NO_SIMD.${COMPILER_TYPE}}
+
 # Add in any architecture-specific CFLAGS.  
 # These come from make.conf or the command line or the environment.
 CFLAGS += ${CFLAGS.${MACHINE_ARCH}}

