svn commit: r467849 - in head/devel/llvm60: . files files/clang
Brooks Davis
brooks at FreeBSD.org
Fri Apr 20 22:46:23 UTC 2018
Author: brooks
Date: Fri Apr 20 22:46:22 2018
New Revision: 467849
URL: https://svnweb.freebsd.org/changeset/ports/467849
Log:
Merge r332833 from FreeBSD HEAD.
This should ensure clang does not use pushf/popf sequences to
save and restore flags, avoiding problems with unrelated flags (such as
the interrupt flag) being restored unexpectedly.
PR: 225330
Added:
head/devel/llvm60/files/clang/patch-fsvn-r332833-clang (contents, props changed)
head/devel/llvm60/files/patch-fsvn-r332833 (contents, props changed)
Modified:
head/devel/llvm60/Makefile
Modified: head/devel/llvm60/Makefile
==============================================================================
--- head/devel/llvm60/Makefile Fri Apr 20 21:50:41 2018 (r467848)
+++ head/devel/llvm60/Makefile Fri Apr 20 22:46:22 2018 (r467849)
@@ -2,7 +2,7 @@
PORTNAME= llvm
DISTVERSION= 6.0.0
-PORTREVISION= 1
+PORTREVISION= 2
CATEGORIES= devel lang
MASTER_SITES= http://${PRE_}releases.llvm.org/${LLVM_RELEASE}/${RCDIR}
PKGNAMESUFFIX= ${LLVM_SUFFIX}
Added: head/devel/llvm60/files/clang/patch-fsvn-r332833-clang
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/devel/llvm60/files/clang/patch-fsvn-r332833-clang Fri Apr 20 22:46:22 2018 (r467849)
@@ -0,0 +1,258 @@
+commit f13397cb22ae77e9b18e29273e2920bd63c17ef1
+Author: dim <dim at FreeBSD.org>
+Date: Fri Apr 20 18:20:55 2018 +0000
+
+ Recommit r332501, with an additional upstream fix for "Cannot lower
+ EFLAGS copy that lives out of a basic block!" errors on i386.
+
+ Pull in r325446 from upstream clang trunk (by me):
+
+ [X86] Add 'sahf' CPU feature to frontend
+
+ Summary:
+ Make clang accept `-msahf` (and `-mno-sahf`) flags to activate the
+ `+sahf` feature for the backend, for bug 36028 (Incorrect use of
+ pushf/popf enables/disables interrupts on amd64 kernels). This was
+ originally submitted in bug 36037 by Jonathan Looney
+ <jonlooney at gmail.com>.
+
+ As described there, GCC also uses `-msahf` for this feature, and the
+ backend already recognizes the `+sahf` feature. All that is needed is
+ to teach clang to pass this on to the backend.
+
+ The mapping of feature support onto CPUs may not be complete; rather,
+ it was chosen to match LLVM's idea of which CPUs support this feature
+ (see lib/Target/X86/X86.td).
+
+ I also updated the affected test case (CodeGen/attr-target-x86.c) to
+ match the emitted output.
+
+ Reviewers: craig.topper, coby, efriedma, rsmith
+
+ Reviewed By: craig.topper
+
+ Subscribers: emaste, cfe-commits
+
+ Differential Revision: https://reviews.llvm.org/D43394
+
+ Pull in r328944 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Expose more of the condition conversion routines in the public
+ API for X86's instruction information. I've now got a second patch
+ under review that needs these same APIs. This bit is nicely
+ orthogonal and obvious, so landing it. NFC.
+
+ Pull in r329414 from upstream llvm trunk (by Craig Topper):
+
+ [X86] Merge itineraries for CLC, CMC, and STC.
+
+ These are very simple flag setting instructions that appear to only
+ be a single uop. They're unlikely to need this separation.
+
+ Pull in r329657 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Introduce a pass to begin more systematically fixing PR36028
+ and similar issues.
+
+ The key idea is to lower COPY nodes populating EFLAGS by scanning the
+ uses of EFLAGS and introducing dedicated code to preserve the
+ necessary state in a GPR. In the vast majority of cases, these uses
+ are cmovCC and jCC instructions. For such cases, we can very easily
+ save and restore the necessary information by simply inserting a
+ setCC into a GPR where the original flags are live, and then testing
+ that GPR directly to feed the cmov or conditional branch.
+
+ However, things are a bit more tricky if arithmetic is using the
+ flags. This patch handles the vast majority of cases that seem to
+ come up in practice: adc, adcx, adox, rcl, and rcr; all without
+ taking advantage of partially preserved EFLAGS as LLVM doesn't
+ currently model that at all.
+
+      There are a large number of operations that technically observe
+ EFLAGS currently but shouldn't in this case -- they typically are
+ using DF. Currently, they will not be handled by this approach.
+ However, I have never seen this issue come up in practice. It is
+ already pretty rare to have these patterns come up in practical code
+ with LLVM. I had to resort to writing MIR tests to cover most of the
+ logic in this pass already. I suspect even with its current amount
+ of coverage of arithmetic users of EFLAGS it will be a significant
+ improvement over the current use of pushf/popf. It will also produce
+ substantially faster code in most of the common patterns.
+
+ This patch also removes all of the old lowering for EFLAGS copies,
+ and the hack that forced us to use a frame pointer when EFLAGS copies
+ were found anywhere in a function so that the dynamic stack
+ adjustment wasn't a problem. None of this is needed as we now lower
+      all of these copies directly in MI, without requiring stack
+ adjustments.
+
+ Lots of thanks to Reid who came up with several aspects of this
+ approach, and Craig who helped me work out a couple of things
+ tripping me up while working on this.
+
+ Differential Revision: https://reviews.llvm.org/D45146
+
+ Pull in r329673 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Model the direction flag (DF) separately from the rest of
+ EFLAGS.
+
+      This cleans up a number of operations that only claimed to use EFLAGS
+      due to using DF. But no instructions which we think of as setting
+ EFLAGS actually modify DF (other than things like popf) and so this
+ needlessly creates uses of EFLAGS that aren't really there.
+
+ In fact, DF is so restrictive it is pretty easy to model. Only STD,
+ CLD, and the whole-flags writes (WRFLAGS and POPF) need to model
+ this.
+
+ I've also somewhat cleaned up some of the flag management instruction
+ definitions to be in the correct .td file.
+
+ Adding this extra register also uncovered a failure to use the
+ correct datatype to hold X86 registers, and I've corrected that as
+ necessary here.
+
+ Differential Revision: https://reviews.llvm.org/D45154
+
+ Pull in r330264 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite
+ uses across basic blocks in the limited cases where it is very
+      straightforward to do so.
+
+ This will also be useful for other places where we do some limited
+ EFLAGS propagation across CFG edges and need to handle copy rewrites
+ afterward. I think this is rapidly approaching the maximum we can and
+ should be doing here. Everything else begins to require either heroic
+ analysis to prove how to do PHI insertion manually, or somehow
+ managing arbitrary PHI-ing of EFLAGS with general PHI insertion.
+ Neither of these seem at all promising so if those cases come up,
+ we'll almost certainly need to rewrite the parts of LLVM that produce
+ those patterns.
+
+ We do now require dominator trees in order to reliably diagnose
+ patterns that would require PHI nodes. This is a bit unfortunate but
+ it seems better than the completely mysterious crash we would get
+ otherwise.
+
+ Differential Revision: https://reviews.llvm.org/D45673
+
+ Together, these should ensure clang does not use pushf/popf sequences to
+ save and restore flags, avoiding problems with unrelated flags (such as
+ the interrupt flag) being restored unexpectedly.
+
+ Requested by: jtl
+ PR: 225330
+ MFC after: 1 week
+
+diff --git llvm/tools/clang/include/clang/Driver/Options.td llvm/tools/clang/include/clang/Driver/Options.td
+index ad72aef3fc9..cab450042e6 100644
+--- tools/clang/include/clang/Driver/Options.td
++++ tools/clang/include/clang/Driver/Options.td
+@@ -2559,6 +2559,8 @@ def mrtm : Flag<["-"], "mrtm">, Group<m_x86_Features_Group>;
+ def mno_rtm : Flag<["-"], "mno-rtm">, Group<m_x86_Features_Group>;
+ def mrdseed : Flag<["-"], "mrdseed">, Group<m_x86_Features_Group>;
+ def mno_rdseed : Flag<["-"], "mno-rdseed">, Group<m_x86_Features_Group>;
++def msahf : Flag<["-"], "msahf">, Group<m_x86_Features_Group>;
++def mno_sahf : Flag<["-"], "mno-sahf">, Group<m_x86_Features_Group>;
+ def msgx : Flag<["-"], "msgx">, Group<m_x86_Features_Group>;
+ def mno_sgx : Flag<["-"], "mno-sgx">, Group<m_x86_Features_Group>;
+ def msha : Flag<["-"], "msha">, Group<m_x86_Features_Group>;
+diff --git llvm/tools/clang/lib/Basic/Targets/X86.cpp llvm/tools/clang/lib/Basic/Targets/X86.cpp
+index cfa6c571d6e..8251e6abd64 100644
+--- tools/clang/lib/Basic/Targets/X86.cpp
++++ tools/clang/lib/Basic/Targets/X86.cpp
+@@ -198,6 +198,7 @@ bool X86TargetInfo::initFeatureMap(
+ LLVM_FALLTHROUGH;
+ case CK_Core2:
+ setFeatureEnabledImpl(Features, "ssse3", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ LLVM_FALLTHROUGH;
+ case CK_Yonah:
+ case CK_Prescott:
+@@ -239,6 +240,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "ssse3", true);
+ setFeatureEnabledImpl(Features, "fxsr", true);
+ setFeatureEnabledImpl(Features, "cx16", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ break;
+
+ case CK_KNM:
+@@ -269,6 +271,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "xsaveopt", true);
+ setFeatureEnabledImpl(Features, "xsave", true);
+ setFeatureEnabledImpl(Features, "movbe", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ break;
+
+ case CK_K6_2:
+@@ -282,6 +285,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "sse4a", true);
+ setFeatureEnabledImpl(Features, "lzcnt", true);
+ setFeatureEnabledImpl(Features, "popcnt", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ LLVM_FALLTHROUGH;
+ case CK_K8SSE3:
+ setFeatureEnabledImpl(Features, "sse3", true);
+@@ -315,6 +319,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "prfchw", true);
+ setFeatureEnabledImpl(Features, "cx16", true);
+ setFeatureEnabledImpl(Features, "fxsr", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ break;
+
+ case CK_ZNVER1:
+@@ -338,6 +343,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "prfchw", true);
+ setFeatureEnabledImpl(Features, "rdrnd", true);
+ setFeatureEnabledImpl(Features, "rdseed", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ setFeatureEnabledImpl(Features, "sha", true);
+ setFeatureEnabledImpl(Features, "sse4a", true);
+ setFeatureEnabledImpl(Features, "xsave", true);
+@@ -372,6 +378,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "cx16", true);
+ setFeatureEnabledImpl(Features, "fxsr", true);
+ setFeatureEnabledImpl(Features, "xsave", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ break;
+ }
+ if (!TargetInfo::initFeatureMap(Features, Diags, CPU, FeaturesVec))
+@@ -768,6 +775,8 @@ bool X86TargetInfo::handleTargetFeatures(std::vector<std::string> &Features,
+ HasRetpoline = true;
+ } else if (Feature == "+retpoline-external-thunk") {
+ HasRetpolineExternalThunk = true;
++ } else if (Feature == "+sahf") {
++ HasLAHFSAHF = true;
+ }
+
+ X86SSEEnum Level = llvm::StringSwitch<X86SSEEnum>(Feature)
+@@ -1240,6 +1249,7 @@ bool X86TargetInfo::isValidFeatureName(StringRef Name) const {
+ .Case("rdrnd", true)
+ .Case("rdseed", true)
+ .Case("rtm", true)
++ .Case("sahf", true)
+ .Case("sgx", true)
+ .Case("sha", true)
+ .Case("shstk", true)
+@@ -1313,6 +1323,7 @@ bool X86TargetInfo::hasFeature(StringRef Feature) const {
+ .Case("retpoline", HasRetpoline)
+ .Case("retpoline-external-thunk", HasRetpolineExternalThunk)
+ .Case("rtm", HasRTM)
++ .Case("sahf", HasLAHFSAHF)
+ .Case("sgx", HasSGX)
+ .Case("sha", HasSHA)
+ .Case("shstk", HasSHSTK)
+diff --git llvm/tools/clang/lib/Basic/Targets/X86.h llvm/tools/clang/lib/Basic/Targets/X86.h
+index 590531c1785..fa2fbee387b 100644
+--- tools/clang/lib/Basic/Targets/X86.h
++++ tools/clang/lib/Basic/Targets/X86.h
+@@ -98,6 +98,7 @@ class LLVM_LIBRARY_VISIBILITY X86TargetInfo : public TargetInfo {
+ bool HasPREFETCHWT1 = false;
+ bool HasRetpoline = false;
+ bool HasRetpolineExternalThunk = false;
++ bool HasLAHFSAHF = false;
+
+ /// \brief Enumeration of all of the X86 CPUs supported by Clang.
+ ///
Added: head/devel/llvm60/files/patch-fsvn-r332833
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/devel/llvm60/files/patch-fsvn-r332833 Fri Apr 20 22:46:22 2018 (r467849)
@@ -0,0 +1,1623 @@
+commit f13397cb22ae77e9b18e29273e2920bd63c17ef1
+Author: dim <dim at FreeBSD.org>
+Date: Fri Apr 20 18:20:55 2018 +0000
+
+ Recommit r332501, with an additional upstream fix for "Cannot lower
+ EFLAGS copy that lives out of a basic block!" errors on i386.
+
+ Pull in r325446 from upstream clang trunk (by me):
+
+ [X86] Add 'sahf' CPU feature to frontend
+
+ Summary:
+ Make clang accept `-msahf` (and `-mno-sahf`) flags to activate the
+ `+sahf` feature for the backend, for bug 36028 (Incorrect use of
+ pushf/popf enables/disables interrupts on amd64 kernels). This was
+ originally submitted in bug 36037 by Jonathan Looney
+ <jonlooney at gmail.com>.
+
+ As described there, GCC also uses `-msahf` for this feature, and the
+ backend already recognizes the `+sahf` feature. All that is needed is
+ to teach clang to pass this on to the backend.
+
+ The mapping of feature support onto CPUs may not be complete; rather,
+ it was chosen to match LLVM's idea of which CPUs support this feature
+ (see lib/Target/X86/X86.td).
+
+ I also updated the affected test case (CodeGen/attr-target-x86.c) to
+ match the emitted output.
+
+ Reviewers: craig.topper, coby, efriedma, rsmith
+
+ Reviewed By: craig.topper
+
+ Subscribers: emaste, cfe-commits
+
+ Differential Revision: https://reviews.llvm.org/D43394
+
+ Pull in r328944 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Expose more of the condition conversion routines in the public
+ API for X86's instruction information. I've now got a second patch
+ under review that needs these same APIs. This bit is nicely
+ orthogonal and obvious, so landing it. NFC.
+
+ Pull in r329414 from upstream llvm trunk (by Craig Topper):
+
+ [X86] Merge itineraries for CLC, CMC, and STC.
+
+ These are very simple flag setting instructions that appear to only
+ be a single uop. They're unlikely to need this separation.
+
+ Pull in r329657 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Introduce a pass to begin more systematically fixing PR36028
+ and similar issues.
+
+ The key idea is to lower COPY nodes populating EFLAGS by scanning the
+ uses of EFLAGS and introducing dedicated code to preserve the
+ necessary state in a GPR. In the vast majority of cases, these uses
+ are cmovCC and jCC instructions. For such cases, we can very easily
+ save and restore the necessary information by simply inserting a
+ setCC into a GPR where the original flags are live, and then testing
+ that GPR directly to feed the cmov or conditional branch.
+
+ However, things are a bit more tricky if arithmetic is using the
+ flags. This patch handles the vast majority of cases that seem to
+ come up in practice: adc, adcx, adox, rcl, and rcr; all without
+ taking advantage of partially preserved EFLAGS as LLVM doesn't
+ currently model that at all.
+
+      There are a large number of operations that technically observe
+ EFLAGS currently but shouldn't in this case -- they typically are
+ using DF. Currently, they will not be handled by this approach.
+ However, I have never seen this issue come up in practice. It is
+ already pretty rare to have these patterns come up in practical code
+ with LLVM. I had to resort to writing MIR tests to cover most of the
+ logic in this pass already. I suspect even with its current amount
+ of coverage of arithmetic users of EFLAGS it will be a significant
+ improvement over the current use of pushf/popf. It will also produce
+ substantially faster code in most of the common patterns.
+
+ This patch also removes all of the old lowering for EFLAGS copies,
+ and the hack that forced us to use a frame pointer when EFLAGS copies
+ were found anywhere in a function so that the dynamic stack
+ adjustment wasn't a problem. None of this is needed as we now lower
+      all of these copies directly in MI, without requiring stack
+ adjustments.
+
+ Lots of thanks to Reid who came up with several aspects of this
+ approach, and Craig who helped me work out a couple of things
+ tripping me up while working on this.
+
+ Differential Revision: https://reviews.llvm.org/D45146
+
+ Pull in r329673 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Model the direction flag (DF) separately from the rest of
+ EFLAGS.
+
+      This cleans up a number of operations that only claimed to use EFLAGS
+      due to using DF. But no instructions which we think of as setting
+ EFLAGS actually modify DF (other than things like popf) and so this
+ needlessly creates uses of EFLAGS that aren't really there.
+
+ In fact, DF is so restrictive it is pretty easy to model. Only STD,
+ CLD, and the whole-flags writes (WRFLAGS and POPF) need to model
+ this.
+
+ I've also somewhat cleaned up some of the flag management instruction
+ definitions to be in the correct .td file.
+
+ Adding this extra register also uncovered a failure to use the
+ correct datatype to hold X86 registers, and I've corrected that as
+ necessary here.
+
+ Differential Revision: https://reviews.llvm.org/D45154
+
+ Pull in r330264 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite
+ uses across basic blocks in the limited cases where it is very
+      straightforward to do so.
+
+ This will also be useful for other places where we do some limited
+ EFLAGS propagation across CFG edges and need to handle copy rewrites
+ afterward. I think this is rapidly approaching the maximum we can and
+ should be doing here. Everything else begins to require either heroic
+ analysis to prove how to do PHI insertion manually, or somehow
+ managing arbitrary PHI-ing of EFLAGS with general PHI insertion.
+ Neither of these seem at all promising so if those cases come up,
+ we'll almost certainly need to rewrite the parts of LLVM that produce
+ those patterns.
+
+ We do now require dominator trees in order to reliably diagnose
+ patterns that would require PHI nodes. This is a bit unfortunate but
+ it seems better than the completely mysterious crash we would get
+ otherwise.
+
+ Differential Revision: https://reviews.llvm.org/D45673
+
+ Together, these should ensure clang does not use pushf/popf sequences to
+ save and restore flags, avoiding problems with unrelated flags (such as
+ the interrupt flag) being restored unexpectedly.
+
+ Requested by: jtl
+ PR: 225330
+ MFC after: 1 week
+
+diff --git llvm/include/llvm/CodeGen/MachineBasicBlock.h b/contrib/llvm/include/llvm/CodeGen/MachineBasicBlock.h
+index 0c9110cbaa8..89210e16629 100644
+--- include/llvm/CodeGen/MachineBasicBlock.h
++++ include/llvm/CodeGen/MachineBasicBlock.h
+@@ -449,6 +449,13 @@ class MachineBasicBlock
+ /// Replace successor OLD with NEW and update probability info.
+ void replaceSuccessor(MachineBasicBlock *Old, MachineBasicBlock *New);
+
++  /// Copy a successor (and any probability info) from the original block to
++  /// this block. Uses an iterator into the original block's successors.
++ ///
++ /// This is useful when doing a partial clone of successors. Afterward, the
++ /// probabilities may need to be normalized.
++ void copySuccessor(MachineBasicBlock *Orig, succ_iterator I);
++
+ /// Transfers all the successors from MBB to this machine basic block (i.e.,
+ /// copies all the successors FromMBB and remove all the successors from
+ /// FromMBB).
+diff --git llvm/lib/CodeGen/MachineBasicBlock.cpp b/contrib/llvm/lib/CodeGen/MachineBasicBlock.cpp
+index 209abf34d88..cd67449e3ac 100644
+--- lib/CodeGen/MachineBasicBlock.cpp
++++ lib/CodeGen/MachineBasicBlock.cpp
+@@ -646,6 +646,14 @@ void MachineBasicBlock::replaceSuccessor(MachineBasicBlock *Old,
+ removeSuccessor(OldI);
+ }
+
++void MachineBasicBlock::copySuccessor(MachineBasicBlock *Orig,
++ succ_iterator I) {
++ if (Orig->Probs.empty())
++ addSuccessor(*I, Orig->getSuccProbability(I));
++ else
++ addSuccessorWithoutProb(*I);
++}
++
+ void MachineBasicBlock::addPredecessor(MachineBasicBlock *Pred) {
+ Predecessors.push_back(Pred);
+ }
+diff --git llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp b/contrib/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp
+index c58254ae38c..b3c491b3de5 100644
+--- lib/Target/X86/Disassembler/X86Disassembler.cpp
++++ lib/Target/X86/Disassembler/X86Disassembler.cpp
+@@ -265,13 +265,10 @@ MCDisassembler::DecodeStatus X86GenericDisassembler::getInstruction(
+ /// @param reg - The Reg to append.
+ static void translateRegister(MCInst &mcInst, Reg reg) {
+ #define ENTRY(x) X86::x,
+- uint8_t llvmRegnums[] = {
+- ALL_REGS
+- 0
+- };
++ static constexpr MCPhysReg llvmRegnums[] = {ALL_REGS};
+ #undef ENTRY
+
+- uint8_t llvmRegnum = llvmRegnums[reg];
++ MCPhysReg llvmRegnum = llvmRegnums[reg];
+ mcInst.addOperand(MCOperand::createReg(llvmRegnum));
+ }
+
+diff --git llvm/lib/Target/X86/X86.h b/contrib/llvm/lib/Target/X86/X86.h
+index 36132682429..642dda8f422 100644
+--- lib/Target/X86/X86.h
++++ lib/Target/X86/X86.h
+@@ -66,6 +66,9 @@ FunctionPass *createX86OptimizeLEAs();
+ /// Return a pass that transforms setcc + movzx pairs into xor + setcc.
+ FunctionPass *createX86FixupSetCC();
+
++/// Return a pass that lowers EFLAGS copy pseudo instructions.
++FunctionPass *createX86FlagsCopyLoweringPass();
++
+ /// Return a pass that expands WinAlloca pseudo-instructions.
+ FunctionPass *createX86WinAllocaExpander();
+
+diff --git llvm/lib/Target/X86/X86FlagsCopyLowering.cpp b/contrib/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
+new file mode 100644
+index 00000000000..1b6369b7bfd
+--- /dev/null
++++ lib/Target/X86/X86FlagsCopyLowering.cpp
+@@ -0,0 +1,777 @@
++//====- X86FlagsCopyLowering.cpp - Lowers COPY nodes of EFLAGS ------------===//
++//
++// The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++/// \file
++///
++/// Lowers COPY nodes of EFLAGS by directly extracting and preserving individual
++/// flag bits.
++///
++/// We have to do this by carefully analyzing and rewriting the usage of the
++/// copied EFLAGS register because there is no general way to rematerialize the
++/// entire EFLAGS register safely and efficiently. Using `popf` both forces
++/// dynamic stack adjustment and can create correctness issues due to IF, TF,
++/// and other non-status flags being overwritten. Using sequences involving
++/// and other non-status flags being overwritten. Sequences involving
++/// directly testing a single status preserved in its own GPR.
++///
++//===----------------------------------------------------------------------===//
++
++#include "X86.h"
++#include "X86InstrBuilder.h"
++#include "X86InstrInfo.h"
++#include "X86Subtarget.h"
++#include "llvm/ADT/ArrayRef.h"
++#include "llvm/ADT/DenseMap.h"
++#include "llvm/ADT/STLExtras.h"
++#include "llvm/ADT/ScopeExit.h"
++#include "llvm/ADT/SmallPtrSet.h"
++#include "llvm/ADT/SmallSet.h"
++#include "llvm/ADT/SmallVector.h"
++#include "llvm/ADT/SparseBitVector.h"
++#include "llvm/ADT/Statistic.h"
++#include "llvm/CodeGen/MachineBasicBlock.h"
++#include "llvm/CodeGen/MachineConstantPool.h"
++#include "llvm/CodeGen/MachineDominators.h"
++#include "llvm/CodeGen/MachineFunction.h"
++#include "llvm/CodeGen/MachineFunctionPass.h"
++#include "llvm/CodeGen/MachineInstr.h"
++#include "llvm/CodeGen/MachineInstrBuilder.h"
++#include "llvm/CodeGen/MachineModuleInfo.h"
++#include "llvm/CodeGen/MachineOperand.h"
++#include "llvm/CodeGen/MachineRegisterInfo.h"
++#include "llvm/CodeGen/MachineSSAUpdater.h"
++#include "llvm/CodeGen/TargetInstrInfo.h"
++#include "llvm/CodeGen/TargetRegisterInfo.h"
++#include "llvm/CodeGen/TargetSchedule.h"
++#include "llvm/CodeGen/TargetSubtargetInfo.h"
++#include "llvm/IR/DebugLoc.h"
++#include "llvm/MC/MCSchedule.h"
++#include "llvm/Pass.h"
++#include "llvm/Support/CommandLine.h"
++#include "llvm/Support/Debug.h"
++#include "llvm/Support/raw_ostream.h"
++#include <algorithm>
++#include <cassert>
++#include <iterator>
++#include <utility>
++
++using namespace llvm;
++
++#define PASS_KEY "x86-flags-copy-lowering"
++#define DEBUG_TYPE PASS_KEY
++
++STATISTIC(NumCopiesEliminated, "Number of copies of EFLAGS eliminated");
++STATISTIC(NumSetCCsInserted, "Number of setCC instructions inserted");
++STATISTIC(NumTestsInserted, "Number of test instructions inserted");
++STATISTIC(NumAddsInserted, "Number of adds instructions inserted");
++
++namespace llvm {
++
++void initializeX86FlagsCopyLoweringPassPass(PassRegistry &);
++
++} // end namespace llvm
++
++namespace {
++
++// Convenient array type for storing registers associated with each condition.
++using CondRegArray = std::array<unsigned, X86::LAST_VALID_COND + 1>;
++
++class X86FlagsCopyLoweringPass : public MachineFunctionPass {
++public:
++ X86FlagsCopyLoweringPass() : MachineFunctionPass(ID) {
++ initializeX86FlagsCopyLoweringPassPass(*PassRegistry::getPassRegistry());
++ }
++
++ StringRef getPassName() const override { return "X86 EFLAGS copy lowering"; }
++ bool runOnMachineFunction(MachineFunction &MF) override;
++ void getAnalysisUsage(AnalysisUsage &AU) const override;
++
++ /// Pass identification, replacement for typeid.
++ static char ID;
++
++private:
++ MachineRegisterInfo *MRI;
++ const X86InstrInfo *TII;
++ const TargetRegisterInfo *TRI;
++ const TargetRegisterClass *PromoteRC;
++ MachineDominatorTree *MDT;
++
++ CondRegArray collectCondsInRegs(MachineBasicBlock &MBB,
++ MachineInstr &CopyDefI);
++
++ unsigned promoteCondToReg(MachineBasicBlock &MBB,
++ MachineBasicBlock::iterator TestPos,
++ DebugLoc TestLoc, X86::CondCode Cond);
++ std::pair<unsigned, bool>
++ getCondOrInverseInReg(MachineBasicBlock &TestMBB,
++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc,
++ X86::CondCode Cond, CondRegArray &CondRegs);
++ void insertTest(MachineBasicBlock &MBB, MachineBasicBlock::iterator Pos,
++ DebugLoc Loc, unsigned Reg);
++
++ void rewriteArithmetic(MachineBasicBlock &TestMBB,
++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc,
++ MachineInstr &MI, MachineOperand &FlagUse,
++ CondRegArray &CondRegs);
++ void rewriteCMov(MachineBasicBlock &TestMBB,
++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc,
++ MachineInstr &CMovI, MachineOperand &FlagUse,
++ CondRegArray &CondRegs);
++ void rewriteCondJmp(MachineBasicBlock &TestMBB,
++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc,
++ MachineInstr &JmpI, CondRegArray &CondRegs);
++ void rewriteCopy(MachineInstr &MI, MachineOperand &FlagUse,
++ MachineInstr &CopyDefI);
++ void rewriteSetCC(MachineBasicBlock &TestMBB,
++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc,
++ MachineInstr &SetCCI, MachineOperand &FlagUse,
++ CondRegArray &CondRegs);
++};
++
++} // end anonymous namespace
++
++INITIALIZE_PASS_BEGIN(X86FlagsCopyLoweringPass, DEBUG_TYPE,
++ "X86 EFLAGS copy lowering", false, false)
++INITIALIZE_PASS_END(X86FlagsCopyLoweringPass, DEBUG_TYPE,
++ "X86 EFLAGS copy lowering", false, false)
++
++FunctionPass *llvm::createX86FlagsCopyLoweringPass() {
++ return new X86FlagsCopyLoweringPass();
++}
++
++char X86FlagsCopyLoweringPass::ID = 0;
++
++void X86FlagsCopyLoweringPass::getAnalysisUsage(AnalysisUsage &AU) const {
++ AU.addRequired<MachineDominatorTree>();
++ MachineFunctionPass::getAnalysisUsage(AU);
++}
++
++namespace {
++/// An enumeration of the arithmetic instruction mnemonics which have
++/// interesting flag semantics.
++///
++/// We can map instruction opcodes into these mnemonics to make it easy to
++/// dispatch with specific functionality.
++enum class FlagArithMnemonic {
++ ADC,
++ ADCX,
++ ADOX,
++ RCL,
++ RCR,
++ SBB,
++};
++} // namespace
++
++static FlagArithMnemonic getMnemonicFromOpcode(unsigned Opcode) {
++ switch (Opcode) {
++ default:
++ report_fatal_error("No support for lowering a copy into EFLAGS when used "
++ "by this instruction!");
++
++#define LLVM_EXPAND_INSTR_SIZES(MNEMONIC, SUFFIX) \
++ case X86::MNEMONIC##8##SUFFIX: \
++ case X86::MNEMONIC##16##SUFFIX: \
++ case X86::MNEMONIC##32##SUFFIX: \
++ case X86::MNEMONIC##64##SUFFIX:
++
++#define LLVM_EXPAND_ADC_SBB_INSTR(MNEMONIC) \
++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, rr) \
++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, rr_REV) \
++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, rm) \
++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, mr) \
++ case X86::MNEMONIC##8ri: \
++ case X86::MNEMONIC##16ri8: \
++ case X86::MNEMONIC##32ri8: \
++ case X86::MNEMONIC##64ri8: \
++ case X86::MNEMONIC##16ri: \
++ case X86::MNEMONIC##32ri: \
++ case X86::MNEMONIC##64ri32: \
++ case X86::MNEMONIC##8mi: \
++ case X86::MNEMONIC##16mi8: \
++ case X86::MNEMONIC##32mi8: \
++ case X86::MNEMONIC##64mi8: \
++ case X86::MNEMONIC##16mi: \
++ case X86::MNEMONIC##32mi: \
++ case X86::MNEMONIC##64mi32: \
++ case X86::MNEMONIC##8i8: \
++ case X86::MNEMONIC##16i16: \
++ case X86::MNEMONIC##32i32: \
++ case X86::MNEMONIC##64i32:
++
++ LLVM_EXPAND_ADC_SBB_INSTR(ADC)
++ return FlagArithMnemonic::ADC;
++
++ LLVM_EXPAND_ADC_SBB_INSTR(SBB)
++ return FlagArithMnemonic::SBB;
++
++#undef LLVM_EXPAND_ADC_SBB_INSTR
++
++ LLVM_EXPAND_INSTR_SIZES(RCL, rCL)
++ LLVM_EXPAND_INSTR_SIZES(RCL, r1)
++ LLVM_EXPAND_INSTR_SIZES(RCL, ri)
++ return FlagArithMnemonic::RCL;
++
++ LLVM_EXPAND_INSTR_SIZES(RCR, rCL)
++ LLVM_EXPAND_INSTR_SIZES(RCR, r1)
++ LLVM_EXPAND_INSTR_SIZES(RCR, ri)
++ return FlagArithMnemonic::RCR;
++
++#undef LLVM_EXPAND_INSTR_SIZES
++
++ case X86::ADCX32rr:
++ case X86::ADCX64rr:
++ case X86::ADCX32rm:
++ case X86::ADCX64rm:
++ return FlagArithMnemonic::ADCX;
++
++ case X86::ADOX32rr:
++ case X86::ADOX64rr:
++ case X86::ADOX32rm:
++ case X86::ADOX64rm:
++ return FlagArithMnemonic::ADOX;
++ }
++}
++
++static MachineBasicBlock &splitBlock(MachineBasicBlock &MBB,
++ MachineInstr &SplitI,
++ const X86InstrInfo &TII) {
++ MachineFunction &MF = *MBB.getParent();
++
++ assert(SplitI.getParent() == &MBB &&
++ "Split instruction must be in the split block!");
++ assert(SplitI.isBranch() &&
++ "Only designed to split a tail of branch instructions!");
++ assert(X86::getCondFromBranchOpc(SplitI.getOpcode()) != X86::COND_INVALID &&
++ "Must split on an actual jCC instruction!");
++
++ // Dig out the previous instruction to the split point.
++ MachineInstr &PrevI = *std::prev(SplitI.getIterator());
++ assert(PrevI.isBranch() && "Must split after a branch!");
++ assert(X86::getCondFromBranchOpc(PrevI.getOpcode()) != X86::COND_INVALID &&
++ "Must split after an actual jCC instruction!");
++ assert(!std::prev(PrevI.getIterator())->isTerminator() &&
++ "Must only have this one terminator prior to the split!");
++
++ // Grab the one successor edge that will stay in `MBB`.
++ MachineBasicBlock &UnsplitSucc = *PrevI.getOperand(0).getMBB();
++
++ // Analyze the original block to see if we are actually splitting an edge
++ // into two edges. This can happen when we have multiple conditional jumps to
++ // the same successor.
++ bool IsEdgeSplit =
++ std::any_of(SplitI.getIterator(), MBB.instr_end(),
++ [&](MachineInstr &MI) {
++ assert(MI.isTerminator() &&
++ "Should only have spliced terminators!");
++ return llvm::any_of(
++ MI.operands(), [&](MachineOperand &MOp) {
++ return MOp.isMBB() && MOp.getMBB() == &UnsplitSucc;
++ });
++ }) ||
++ MBB.getFallThrough() == &UnsplitSucc;
++
++ MachineBasicBlock &NewMBB = *MF.CreateMachineBasicBlock();
++
++ // Insert the new block immediately after the current one. Any existing
++ // fallthrough will be sunk into this new block anyways.
++ MF.insert(std::next(MachineFunction::iterator(&MBB)), &NewMBB);
++
++ // Splice the tail of instructions into the new block.
++ NewMBB.splice(NewMBB.end(), &MBB, SplitI.getIterator(), MBB.end());
++
++ // Copy the necessary successors (and their probability info) into the new
++ // block.
++ for (auto SI = MBB.succ_begin(), SE = MBB.succ_end(); SI != SE; ++SI)
++ if (IsEdgeSplit || *SI != &UnsplitSucc)
++ NewMBB.copySuccessor(&MBB, SI);
++ // Normalize the probabilities if we didn't end up splitting the edge.
++ if (!IsEdgeSplit)
++ NewMBB.normalizeSuccProbs();
++
++ // Now replace all of the moved successors in the original block with the new
++ // block. This will merge their probabilities.
++ for (MachineBasicBlock *Succ : NewMBB.successors())
++ if (Succ != &UnsplitSucc)
++ MBB.replaceSuccessor(Succ, &NewMBB);
++
++ // We should always end up replacing at least one successor.
++ assert(MBB.isSuccessor(&NewMBB) &&
++ "Failed to make the new block a successor!");
++
++ // Now update all the PHIs.
++ for (MachineBasicBlock *Succ : NewMBB.successors()) {
++ for (MachineInstr &MI : *Succ) {
++ if (!MI.isPHI())
++ break;
++
++ for (int OpIdx = 1, NumOps = MI.getNumOperands(); OpIdx < NumOps;
++ OpIdx += 2) {
++ MachineOperand &OpV = MI.getOperand(OpIdx);
++ MachineOperand &OpMBB = MI.getOperand(OpIdx + 1);
++ assert(OpMBB.isMBB() && "Block operand to a PHI is not a block!");
++ if (OpMBB.getMBB() != &MBB)
++ continue;
++
++ // Replace the operand for unsplit successors
++ if (!IsEdgeSplit || Succ != &UnsplitSucc) {
++ OpMBB.setMBB(&NewMBB);
++
++ // We have to continue scanning as there may be multiple entries in
++ // the PHI.
++ continue;
++ }
++
++ // When we have split the edge, append a new successor.
++ MI.addOperand(MF, OpV);
++ MI.addOperand(MF, MachineOperand::CreateMBB(&NewMBB));
++ break;
++ }
++ }
++ }
++
++ return NewMBB;
++}
++
++bool X86FlagsCopyLoweringPass::runOnMachineFunction(MachineFunction &MF) {
++ DEBUG(dbgs() << "********** " << getPassName() << " : " << MF.getName()
++ << " **********\n");
++
++ auto &Subtarget = MF.getSubtarget<X86Subtarget>();
++ MRI = &MF.getRegInfo();
++ TII = Subtarget.getInstrInfo();
++ TRI = Subtarget.getRegisterInfo();
++ MDT = &getAnalysis<MachineDominatorTree>();
++ PromoteRC = &X86::GR8RegClass;
++
++ if (MF.begin() == MF.end())
++ // Nothing to do for a degenerate empty function...
++ return false;
++
++ SmallVector<MachineInstr *, 4> Copies;
++ for (MachineBasicBlock &MBB : MF)
++ for (MachineInstr &MI : MBB)
++ if (MI.getOpcode() == TargetOpcode::COPY &&
++ MI.getOperand(0).getReg() == X86::EFLAGS)
++ Copies.push_back(&MI);
++
++ for (MachineInstr *CopyI : Copies) {
++ MachineBasicBlock &MBB = *CopyI->getParent();
++
++ MachineOperand &VOp = CopyI->getOperand(1);
++ assert(VOp.isReg() &&
++ "The input to the copy for EFLAGS should always be a register!");
++ MachineInstr &CopyDefI = *MRI->getVRegDef(VOp.getReg());
++ if (CopyDefI.getOpcode() != TargetOpcode::COPY) {
++ // FIXME: The big likely candidates here are PHI nodes. We could in theory
++ // handle PHI nodes, but it gets really, really hard. Insanely hard. Hard
++ // enough that it is probably better to change every other part of LLVM
++ // to avoid creating them. The issue is that once we have PHIs we won't
++ // know which original EFLAGS value we need to capture with our setCCs
++ // below. The end result will be computing a complete set of setCCs that
++ // we *might* want, computing them in every place where we copy *out* of
++ // EFLAGS and then doing SSA formation on all of them to insert necessary
++ // PHI nodes and consume those here. Then hoping that somehow we DCE the
++ // unnecessary ones. This DCE seems very unlikely to be successful and so
++ // we will almost certainly end up with a glut of dead setCC
++ // instructions. Until we have a motivating test case and fail to avoid
++ // it by changing other parts of LLVM's lowering, we refuse to handle
++ // this complex case here.
++ DEBUG(dbgs() << "ERROR: Encountered unexpected def of an eflags copy: ";
++ CopyDefI.dump());
++ report_fatal_error(
++ "Cannot lower EFLAGS copy unless it is defined in turn by a copy!");
++ }
++
++ auto Cleanup = make_scope_exit([&] {
++ // All uses of the EFLAGS copy are now rewritten; kill the copy into
++ // EFLAGS and, if it is dead, the copy out of EFLAGS as well.
++ CopyI->eraseFromParent();
++ if (MRI->use_empty(CopyDefI.getOperand(0).getReg()))
++ CopyDefI.eraseFromParent();
++ ++NumCopiesEliminated;
++ });
++
++ MachineOperand &DOp = CopyI->getOperand(0);
++ assert(DOp.isDef() && "Expected register def!");
++ assert(DOp.getReg() == X86::EFLAGS && "Unexpected copy def register!");
++ if (DOp.isDead())
++ continue;
++
++ MachineBasicBlock &TestMBB = *CopyDefI.getParent();
++ auto TestPos = CopyDefI.getIterator();
++ DebugLoc TestLoc = CopyDefI.getDebugLoc();
++
++ DEBUG(dbgs() << "Rewriting copy: "; CopyI->dump());
++
++ // Scan for usage of newly set EFLAGS so we can rewrite them. We just buffer
++ // jumps because their usage is very constrained.
++ bool FlagsKilled = false;
++ SmallVector<MachineInstr *, 4> JmpIs;
++
++ // Gather the condition flags that have already been preserved in
++ // registers. We do this from scratch each time as we expect there to be
++ // very few of them and we expect to not revisit the same copy definition
++ // many times. If either of those change sufficiently we could build a map
++ // of these up front instead.
++ CondRegArray CondRegs = collectCondsInRegs(TestMBB, CopyDefI);
++
++ // Collect the basic blocks we need to scan. Typically this will just be
++ // a single basic block but we may have to scan multiple blocks if the
++ // EFLAGS copy lives into successors.
++ SmallVector<MachineBasicBlock *, 2> Blocks;
++ SmallPtrSet<MachineBasicBlock *, 2> VisitedBlocks;
++ Blocks.push_back(&MBB);
++ VisitedBlocks.insert(&MBB);
++
++ do {
++ MachineBasicBlock &UseMBB = *Blocks.pop_back_val();
++
++ // We currently don't do any PHI insertion and so we require that the
++ // test basic block dominates all of the use basic blocks.
++ //
++ // We could in theory do PHI insertion here if it becomes useful by just
++ // taking undef values in along every edge that we don't trace this
++ // EFLAGS copy along. This isn't as bad as fully general PHI insertion,
++ // but still seems like a great deal of complexity.
++ //
++ // Because it is theoretically possible that some earlier MI pass or
++ // other lowering transformation could induce this to happen, we do
++ // a hard check even in non-debug builds here.
++ if (&TestMBB != &UseMBB && !MDT->dominates(&TestMBB, &UseMBB)) {
++ DEBUG({
++ dbgs() << "ERROR: Encountered use that is not dominated by our test "
++ "basic block! Rewriting this would require inserting PHI "
++ "nodes to track the flag state across the CFG.\n\nTest "
++ "block:\n";
++ TestMBB.dump();
++ dbgs() << "Use block:\n";
++ UseMBB.dump();
++ });
++ report_fatal_error("Cannot lower EFLAGS copy when original copy def "
++ "does not dominate all uses.");
++ }
++
++ for (auto MII = &UseMBB == &MBB ? std::next(CopyI->getIterator())
++ : UseMBB.instr_begin(),
++ MIE = UseMBB.instr_end();
++ MII != MIE;) {
++ MachineInstr &MI = *MII++;
++ MachineOperand *FlagUse = MI.findRegisterUseOperand(X86::EFLAGS);
++ if (!FlagUse) {
++ if (MI.findRegisterDefOperand(X86::EFLAGS)) {
++ // If EFLAGS are defined, it's as-if they were killed. We can stop
++ // scanning here.
++ //
++ // NB!!! Many instructions only modify some flags. LLVM currently
++ // models this as clobbering all flags, but if that ever changes
++ // this will need to be carefully updated to handle that more
++ // complex logic.
++ FlagsKilled = true;
++ break;
++ }
++ continue;
++ }
++
++ DEBUG(dbgs() << " Rewriting use: "; MI.dump());
++
++ // Check the kill flag before we rewrite as that may change it.
++ if (FlagUse->isKill())
++ FlagsKilled = true;
++
++ // Once we encounter a branch, the rest of the instructions must also be
++ // branches. We can't rewrite in place here, so we handle them below.
++ //
++ // Note that we don't have to handle tail calls here, even conditional
++ // tail calls, as those are not introduced into the X86 MI until post-RA
++ // branch folding or block placement. As a consequence, we get to deal
++ // with the simpler formulation of conditional branches followed by tail
++ // calls.
++ if (X86::getCondFromBranchOpc(MI.getOpcode()) != X86::COND_INVALID) {
++ auto JmpIt = MI.getIterator();
++ do {
++ JmpIs.push_back(&*JmpIt);
++ ++JmpIt;
++ } while (JmpIt != UseMBB.instr_end() &&
++ X86::getCondFromBranchOpc(JmpIt->getOpcode()) !=
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***