svn commit: r467849 - in head/devel/llvm60: . files files/clang
Brooks Davis
brooks at FreeBSD.org
Fri Apr 20 22:46:23 UTC 2018
Author: brooks
Date: Fri Apr 20 22:46:22 2018
New Revision: 467849
URL: https://svnweb.freebsd.org/changeset/ports/467849
Log:
Merge r332833 from FreeBSD HEAD.
This should ensure clang does not use pushf/popf sequences to
save and restore flags, avoiding problems with unrelated flags (such as
the interrupt flag) being restored unexpectedly.
PR: 225330
Added:
head/devel/llvm60/files/clang/patch-fsvn-r332833-clang (contents, props changed)
head/devel/llvm60/files/patch-fsvn-r332833 (contents, props changed)
Modified:
head/devel/llvm60/Makefile
Modified: head/devel/llvm60/Makefile
==============================================================================
--- head/devel/llvm60/Makefile Fri Apr 20 21:50:41 2018 (r467848)
+++ head/devel/llvm60/Makefile Fri Apr 20 22:46:22 2018 (r467849)
@@ -2,7 +2,7 @@
PORTNAME= llvm
DISTVERSION= 6.0.0
-PORTREVISION= 1
+PORTREVISION= 2
CATEGORIES= devel lang
MASTER_SITES= http://${PRE_}releases.llvm.org/${LLVM_RELEASE}/${RCDIR}
PKGNAMESUFFIX= ${LLVM_SUFFIX}
Added: head/devel/llvm60/files/clang/patch-fsvn-r332833-clang
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/devel/llvm60/files/clang/patch-fsvn-r332833-clang Fri Apr 20 22:46:22 2018 (r467849)
@@ -0,0 +1,258 @@
+commit f13397cb22ae77e9b18e29273e2920bd63c17ef1
+Author: dim <dim at FreeBSD.org>
+Date: Fri Apr 20 18:20:55 2018 +0000
+
+ Recommit r332501, with an additional upstream fix for "Cannot lower
+ EFLAGS copy that lives out of a basic block!" errors on i386.
+
+ Pull in r325446 from upstream clang trunk (by me):
+
+ [X86] Add 'sahf' CPU feature to frontend
+
+ Summary:
+ Make clang accept `-msahf` (and `-mno-sahf`) flags to activate the
+ `+sahf` feature for the backend, for bug 36028 (Incorrect use of
+ pushf/popf enables/disables interrupts on amd64 kernels). This was
+ originally submitted in bug 36037 by Jonathan Looney
+ <jonlooney at gmail.com>.
+
+ As described there, GCC also uses `-msahf` for this feature, and the
+ backend already recognizes the `+sahf` feature. All that is needed is
+ to teach clang to pass this on to the backend.
+
+ The mapping of feature support onto CPUs may not be complete; rather,
+ it was chosen to match LLVM's idea of which CPUs support this feature
+ (see lib/Target/X86/X86.td).
+
+ I also updated the affected test case (CodeGen/attr-target-x86.c) to
+ match the emitted output.
+
+ Reviewers: craig.topper, coby, efriedma, rsmith
+
+ Reviewed By: craig.topper
+
+ Subscribers: emaste, cfe-commits
+
+ Differential Revision: https://reviews.llvm.org/D43394
+
+ Pull in r328944 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Expose more of the condition conversion routines in the public
+ API for X86's instruction information. I've now got a second patch
+ under review that needs these same APIs. This bit is nicely
+ orthogonal and obvious, so landing it. NFC.
+
+ Pull in r329414 from upstream llvm trunk (by Craig Topper):
+
+ [X86] Merge itineraries for CLC, CMC, and STC.
+
+ These are very simple flag setting instructions that appear to only
+ be a single uop. They're unlikely to need this separation.
+
+ Pull in r329657 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Introduce a pass to begin more systematically fixing PR36028
+ and similar issues.
+
+ The key idea is to lower COPY nodes populating EFLAGS by scanning the
+ uses of EFLAGS and introducing dedicated code to preserve the
+ necessary state in a GPR. In the vast majority of cases, these uses
+ are cmovCC and jCC instructions. For such cases, we can very easily
+ save and restore the necessary information by simply inserting a
+ setCC into a GPR where the original flags are live, and then testing
+ that GPR directly to feed the cmov or conditional branch.
+
+ However, things are a bit more tricky if arithmetic is using the
+ flags. This patch handles the vast majority of cases that seem to
+ come up in practice: adc, adcx, adox, rcl, and rcr; all without
+ taking advantage of partially preserved EFLAGS as LLVM doesn't
+ currently model that at all.
+
+      There are a large number of operations that technically observe
+ EFLAGS currently but shouldn't in this case -- they typically are
+ using DF. Currently, they will not be handled by this approach.
+ However, I have never seen this issue come up in practice. It is
+ already pretty rare to have these patterns come up in practical code
+ with LLVM. I had to resort to writing MIR tests to cover most of the
+ logic in this pass already. I suspect even with its current amount
+ of coverage of arithmetic users of EFLAGS it will be a significant
+ improvement over the current use of pushf/popf. It will also produce
+ substantially faster code in most of the common patterns.
+
+ This patch also removes all of the old lowering for EFLAGS copies,
+ and the hack that forced us to use a frame pointer when EFLAGS copies
+ were found anywhere in a function so that the dynamic stack
+ adjustment wasn't a problem. None of this is needed as we now lower
+      all of these copies directly in MI, without requiring stack
+ adjustments.
+
+ Lots of thanks to Reid who came up with several aspects of this
+ approach, and Craig who helped me work out a couple of things
+ tripping me up while working on this.
+
+ Differential Revision: https://reviews.llvm.org/D45146
+
+ Pull in r329673 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Model the direction flag (DF) separately from the rest of
+ EFLAGS.
+
+      This cleans up a number of operations that only claimed to use EFLAGS
+      due to using DF. But no instructions which we think of as setting
+ EFLAGS actually modify DF (other than things like popf) and so this
+ needlessly creates uses of EFLAGS that aren't really there.
+
+ In fact, DF is so restrictive it is pretty easy to model. Only STD,
+ CLD, and the whole-flags writes (WRFLAGS and POPF) need to model
+ this.
+
+ I've also somewhat cleaned up some of the flag management instruction
+ definitions to be in the correct .td file.
+
+ Adding this extra register also uncovered a failure to use the
+ correct datatype to hold X86 registers, and I've corrected that as
+ necessary here.
+
+ Differential Revision: https://reviews.llvm.org/D45154
+
+ Pull in r330264 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite
+ uses across basic blocks in the limited cases where it is very
+      straightforward to do so.
+
+ This will also be useful for other places where we do some limited
+ EFLAGS propagation across CFG edges and need to handle copy rewrites
+ afterward. I think this is rapidly approaching the maximum we can and
+ should be doing here. Everything else begins to require either heroic
+ analysis to prove how to do PHI insertion manually, or somehow
+ managing arbitrary PHI-ing of EFLAGS with general PHI insertion.
+ Neither of these seem at all promising so if those cases come up,
+ we'll almost certainly need to rewrite the parts of LLVM that produce
+ those patterns.
+
+ We do now require dominator trees in order to reliably diagnose
+ patterns that would require PHI nodes. This is a bit unfortunate but
+ it seems better than the completely mysterious crash we would get
+ otherwise.
+
+ Differential Revision: https://reviews.llvm.org/D45673
+
+ Together, these should ensure clang does not use pushf/popf sequences to
+ save and restore flags, avoiding problems with unrelated flags (such as
+ the interrupt flag) being restored unexpectedly.
+
+ Requested by: jtl
+ PR: 225330
+ MFC after: 1 week
+
+diff --git llvm/tools/clang/include/clang/Driver/Options.td llvm/tools/clang/include/clang/Driver/Options.td
+index ad72aef3fc9..cab450042e6 100644
+--- tools/clang/include/clang/Driver/Options.td
++++ tools/clang/include/clang/Driver/Options.td
+@@ -2559,6 +2559,8 @@ def mrtm : Flag<["-"], "mrtm">, Group<m_x86_Features_Group>;
+ def mno_rtm : Flag<["-"], "mno-rtm">, Group<m_x86_Features_Group>;
+ def mrdseed : Flag<["-"], "mrdseed">, Group<m_x86_Features_Group>;
+ def mno_rdseed : Flag<["-"], "mno-rdseed">, Group<m_x86_Features_Group>;
++def msahf : Flag<["-"], "msahf">, Group<m_x86_Features_Group>;
++def mno_sahf : Flag<["-"], "mno-sahf">, Group<m_x86_Features_Group>;
+ def msgx : Flag<["-"], "msgx">, Group<m_x86_Features_Group>;
+ def mno_sgx : Flag<["-"], "mno-sgx">, Group<m_x86_Features_Group>;
+ def msha : Flag<["-"], "msha">, Group<m_x86_Features_Group>;
+diff --git llvm/tools/clang/lib/Basic/Targets/X86.cpp llvm/tools/clang/lib/Basic/Targets/X86.cpp
+index cfa6c571d6e..8251e6abd64 100644
+--- tools/clang/lib/Basic/Targets/X86.cpp
++++ tools/clang/lib/Basic/Targets/X86.cpp
+@@ -198,6 +198,7 @@ bool X86TargetInfo::initFeatureMap(
+ LLVM_FALLTHROUGH;
+ case CK_Core2:
+ setFeatureEnabledImpl(Features, "ssse3", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ LLVM_FALLTHROUGH;
+ case CK_Yonah:
+ case CK_Prescott:
+@@ -239,6 +240,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "ssse3", true);
+ setFeatureEnabledImpl(Features, "fxsr", true);
+ setFeatureEnabledImpl(Features, "cx16", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ break;
+
+ case CK_KNM:
+@@ -269,6 +271,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "xsaveopt", true);
+ setFeatureEnabledImpl(Features, "xsave", true);
+ setFeatureEnabledImpl(Features, "movbe", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ break;
+
+ case CK_K6_2:
+@@ -282,6 +285,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "sse4a", true);
+ setFeatureEnabledImpl(Features, "lzcnt", true);
+ setFeatureEnabledImpl(Features, "popcnt", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ LLVM_FALLTHROUGH;
+ case CK_K8SSE3:
+ setFeatureEnabledImpl(Features, "sse3", true);
+@@ -315,6 +319,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "prfchw", true);
+ setFeatureEnabledImpl(Features, "cx16", true);
+ setFeatureEnabledImpl(Features, "fxsr", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ break;
+
+ case CK_ZNVER1:
+@@ -338,6 +343,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "prfchw", true);
+ setFeatureEnabledImpl(Features, "rdrnd", true);
+ setFeatureEnabledImpl(Features, "rdseed", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ setFeatureEnabledImpl(Features, "sha", true);
+ setFeatureEnabledImpl(Features, "sse4a", true);
+ setFeatureEnabledImpl(Features, "xsave", true);
+@@ -372,6 +378,7 @@ bool X86TargetInfo::initFeatureMap(
+ setFeatureEnabledImpl(Features, "cx16", true);
+ setFeatureEnabledImpl(Features, "fxsr", true);
+ setFeatureEnabledImpl(Features, "xsave", true);
++ setFeatureEnabledImpl(Features, "sahf", true);
+ break;
+ }
+ if (!TargetInfo::initFeatureMap(Features, Diags, CPU, FeaturesVec))
+@@ -768,6 +775,8 @@ bool X86TargetInfo::handleTargetFeatures(std::vector<std::string> &Features,
+ HasRetpoline = true;
+ } else if (Feature == "+retpoline-external-thunk") {
+ HasRetpolineExternalThunk = true;
++ } else if (Feature == "+sahf") {
++ HasLAHFSAHF = true;
+ }
+
+ X86SSEEnum Level = llvm::StringSwitch<X86SSEEnum>(Feature)
+@@ -1240,6 +1249,7 @@ bool X86TargetInfo::isValidFeatureName(StringRef Name) const {
+ .Case("rdrnd", true)
+ .Case("rdseed", true)
+ .Case("rtm", true)
++ .Case("sahf", true)
+ .Case("sgx", true)
+ .Case("sha", true)
+ .Case("shstk", true)
+@@ -1313,6 +1323,7 @@ bool X86TargetInfo::hasFeature(StringRef Feature) const {
+ .Case("retpoline", HasRetpoline)
+ .Case("retpoline-external-thunk", HasRetpolineExternalThunk)
+ .Case("rtm", HasRTM)
++ .Case("sahf", HasLAHFSAHF)
+ .Case("sgx", HasSGX)
+ .Case("sha", HasSHA)
+ .Case("shstk", HasSHSTK)
+diff --git llvm/tools/clang/lib/Basic/Targets/X86.h llvm/tools/clang/lib/Basic/Targets/X86.h
+index 590531c1785..fa2fbee387b 100644
+--- tools/clang/lib/Basic/Targets/X86.h
++++ tools/clang/lib/Basic/Targets/X86.h
+@@ -98,6 +98,7 @@ class LLVM_LIBRARY_VISIBILITY X86TargetInfo : public TargetInfo {
+ bool HasPREFETCHWT1 = false;
+ bool HasRetpoline = false;
+ bool HasRetpolineExternalThunk = false;
++ bool HasLAHFSAHF = false;
+
+ /// \brief Enumeration of all of the X86 CPUs supported by Clang.
+ ///
Added: head/devel/llvm60/files/patch-fsvn-r332833
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ head/devel/llvm60/files/patch-fsvn-r332833 Fri Apr 20 22:46:22 2018 (r467849)
@@ -0,0 +1,1623 @@
+commit f13397cb22ae77e9b18e29273e2920bd63c17ef1
+Author: dim <dim at FreeBSD.org>
+Date: Fri Apr 20 18:20:55 2018 +0000
+
+ Recommit r332501, with an additional upstream fix for "Cannot lower
+ EFLAGS copy that lives out of a basic block!" errors on i386.
+
+ Pull in r325446 from upstream clang trunk (by me):
+
+ [X86] Add 'sahf' CPU feature to frontend
+
+ Summary:
+ Make clang accept `-msahf` (and `-mno-sahf`) flags to activate the
+ `+sahf` feature for the backend, for bug 36028 (Incorrect use of
+ pushf/popf enables/disables interrupts on amd64 kernels). This was
+ originally submitted in bug 36037 by Jonathan Looney
+ <jonlooney at gmail.com>.
+
+ As described there, GCC also uses `-msahf` for this feature, and the
+ backend already recognizes the `+sahf` feature. All that is needed is
+ to teach clang to pass this on to the backend.
+
+ The mapping of feature support onto CPUs may not be complete; rather,
+ it was chosen to match LLVM's idea of which CPUs support this feature
+ (see lib/Target/X86/X86.td).
+
+ I also updated the affected test case (CodeGen/attr-target-x86.c) to
+ match the emitted output.
+
+ Reviewers: craig.topper, coby, efriedma, rsmith
+
+ Reviewed By: craig.topper
+
+ Subscribers: emaste, cfe-commits
+
+ Differential Revision: https://reviews.llvm.org/D43394
+
+ Pull in r328944 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Expose more of the condition conversion routines in the public
+ API for X86's instruction information. I've now got a second patch
+ under review that needs these same APIs. This bit is nicely
+ orthogonal and obvious, so landing it. NFC.
+
+ Pull in r329414 from upstream llvm trunk (by Craig Topper):
+
+ [X86] Merge itineraries for CLC, CMC, and STC.
+
+ These are very simple flag setting instructions that appear to only
+ be a single uop. They're unlikely to need this separation.
+
+ Pull in r329657 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Introduce a pass to begin more systematically fixing PR36028
+ and similar issues.
+
+ The key idea is to lower COPY nodes populating EFLAGS by scanning the
+ uses of EFLAGS and introducing dedicated code to preserve the
+ necessary state in a GPR. In the vast majority of cases, these uses
+ are cmovCC and jCC instructions. For such cases, we can very easily
+ save and restore the necessary information by simply inserting a
+ setCC into a GPR where the original flags are live, and then testing
+ that GPR directly to feed the cmov or conditional branch.
+
+ However, things are a bit more tricky if arithmetic is using the
+ flags. This patch handles the vast majority of cases that seem to
+ come up in practice: adc, adcx, adox, rcl, and rcr; all without
+ taking advantage of partially preserved EFLAGS as LLVM doesn't
+ currently model that at all.
+
+      There are a large number of operations that technically observe
+ EFLAGS currently but shouldn't in this case -- they typically are
+ using DF. Currently, they will not be handled by this approach.
+ However, I have never seen this issue come up in practice. It is
+ already pretty rare to have these patterns come up in practical code
+ with LLVM. I had to resort to writing MIR tests to cover most of the
+ logic in this pass already. I suspect even with its current amount
+ of coverage of arithmetic users of EFLAGS it will be a significant
+ improvement over the current use of pushf/popf. It will also produce
+ substantially faster code in most of the common patterns.
+
+ This patch also removes all of the old lowering for EFLAGS copies,
+ and the hack that forced us to use a frame pointer when EFLAGS copies
+ were found anywhere in a function so that the dynamic stack
+ adjustment wasn't a problem. None of this is needed as we now lower
+      all of these copies directly in MI, without requiring stack
+ adjustments.
+
+ Lots of thanks to Reid who came up with several aspects of this
+ approach, and Craig who helped me work out a couple of things
+ tripping me up while working on this.
+
+ Differential Revision: https://reviews.llvm.org/D45146
+
+ Pull in r329673 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Model the direction flag (DF) separately from the rest of
+ EFLAGS.
+
+      This cleans up a number of operations that only claimed to use EFLAGS
+      due to using DF. But no instructions which we think of as setting
+ EFLAGS actually modify DF (other than things like popf) and so this
+ needlessly creates uses of EFLAGS that aren't really there.
+
+ In fact, DF is so restrictive it is pretty easy to model. Only STD,
+ CLD, and the whole-flags writes (WRFLAGS and POPF) need to model
+ this.
+
+ I've also somewhat cleaned up some of the flag management instruction
+ definitions to be in the correct .td file.
+
+ Adding this extra register also uncovered a failure to use the
+ correct datatype to hold X86 registers, and I've corrected that as
+ necessary here.
+
+ Differential Revision: https://reviews.llvm.org/D45154
+
+ Pull in r330264 from upstream llvm trunk (by Chandler Carruth):
+
+ [x86] Fix PR37100 by teaching the EFLAGS copy lowering to rewrite
+ uses across basic blocks in the limited cases where it is very
+      straightforward to do so.
+
+ This will also be useful for other places where we do some limited
+ EFLAGS propagation across CFG edges and need to handle copy rewrites
+ afterward. I think this is rapidly approaching the maximum we can and
+ should be doing here. Everything else begins to require either heroic
+ analysis to prove how to do PHI insertion manually, or somehow
+ managing arbitrary PHI-ing of EFLAGS with general PHI insertion.
+ Neither of these seem at all promising so if those cases come up,
+ we'll almost certainly need to rewrite the parts of LLVM that produce
+ those patterns.
+
+ We do now require dominator trees in order to reliably diagnose
+ patterns that would require PHI nodes. This is a bit unfortunate but
+ it seems better than the completely mysterious crash we would get
+ otherwise.
+
+ Differential Revision: https://reviews.llvm.org/D45673
+
+ Together, these should ensure clang does not use pushf/popf sequences to
+ save and restore flags, avoiding problems with unrelated flags (such as
+ the interrupt flag) being restored unexpectedly.
+
+ Requested by: jtl
+ PR: 225330
+ MFC after: 1 week
+
+diff --git llvm/include/llvm/CodeGen/MachineBasicBlock.h b/contrib/llvm/include/llvm/CodeGen/MachineBasicBlock.h
+index 0c9110cbaa8..89210e16629 100644
+--- include/llvm/CodeGen/MachineBasicBlock.h
++++ include/llvm/CodeGen/MachineBasicBlock.h
+@@ -449,6 +449,13 @@ class MachineBasicBlock
+ /// Replace successor OLD with NEW and update probability info.
+ void replaceSuccessor(MachineBasicBlock *Old, MachineBasicBlock *New);
+
++  /// Copy a successor (and any probability info) from the original block to
++  /// this block. Uses an iterator into the original block's successors.
++ ///
++ /// This is useful when doing a partial clone of successors. Afterward, the
++ /// probabilities may need to be normalized.
++ void copySuccessor(MachineBasicBlock *Orig, succ_iterator I);
++
+ /// Transfers all the successors from MBB to this machine basic block (i.e.,
+ /// copies all the successors FromMBB and remove all the successors from
+ /// FromMBB).
+diff --git llvm/lib/CodeGen/MachineBasicBlock.cpp b/contrib/llvm/lib/CodeGen/MachineBasicBlock.cpp
+index 209abf34d88..cd67449e3ac 100644
+--- lib/CodeGen/MachineBasicBlock.cpp
++++ lib/CodeGen/MachineBasicBlock.cpp
+@@ -646,6 +646,14 @@ void MachineBasicBlock::replaceSuccessor(MachineBasicBlock *Old,
+ removeSuccessor(OldI);
+ }
+
++void MachineBasicBlock::copySuccessor(MachineBasicBlock *Orig,
++ succ_iterator I) {
++ if (Orig->Probs.empty())
++ addSuccessor(*I, Orig->getSuccProbability(I));
++ else
++ addSuccessorWithoutProb(*I);
++}
++
+ void MachineBasicBlock::addPredecessor(MachineBasicBlock *Pred) {
+ Predecessors.push_back(Pred);
+ }
+diff --git llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp b/contrib/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp
+index c58254ae38c..b3c491b3de5 100644
+--- lib/Target/X86/Disassembler/X86Disassembler.cpp
++++ lib/Target/X86/Disassembler/X86Disassembler.cpp
+@@ -265,13 +265,10 @@ MCDisassembler::DecodeStatus X86GenericDisassembler::getInstruction(
+ /// @param reg - The Reg to append.
+ static void translateRegister(MCInst &mcInst, Reg reg) {
+ #define ENTRY(x) X86::x,
+- uint8_t llvmRegnums[] = {
+- ALL_REGS
+- 0
+- };
++ static constexpr MCPhysReg llvmRegnums[] = {ALL_REGS};
+ #undef ENTRY
+
+- uint8_t llvmRegnum = llvmRegnums[reg];
++ MCPhysReg llvmRegnum = llvmRegnums[reg];
+ mcInst.addOperand(MCOperand::createReg(llvmRegnum));
+ }
+
+diff --git llvm/lib/Target/X86/X86.h b/contrib/llvm/lib/Target/X86/X86.h
+index 36132682429..642dda8f422 100644
+--- lib/Target/X86/X86.h
++++ lib/Target/X86/X86.h
+@@ -66,6 +66,9 @@ FunctionPass *createX86OptimizeLEAs();
+ /// Return a pass that transforms setcc + movzx pairs into xor + setcc.
+ FunctionPass *createX86FixupSetCC();
+
++/// Return a pass that lowers EFLAGS copy pseudo instructions.
++FunctionPass *createX86FlagsCopyLoweringPass();
++
+ /// Return a pass that expands WinAlloca pseudo-instructions.
+ FunctionPass *createX86WinAllocaExpander();
+
+diff --git llvm/lib/Target/X86/X86FlagsCopyLowering.cpp b/contrib/llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
+new file mode 100644
+index 00000000000..1b6369b7bfd
+--- /dev/null
++++ lib/Target/X86/X86FlagsCopyLowering.cpp
+@@ -0,0 +1,777 @@
++//====- X86FlagsCopyLowering.cpp - Lowers COPY nodes of EFLAGS ------------===//
++//
++// The LLVM Compiler Infrastructure
++//
++// This file is distributed under the University of Illinois Open Source
++// License. See LICENSE.TXT for details.
++//
++//===----------------------------------------------------------------------===//
++/// \file
++///
++/// Lowers COPY nodes of EFLAGS by directly extracting and preserving individual
++/// flag bits.
++///
++/// We have to do this by carefully analyzing and rewriting the usage of the
++/// copied EFLAGS register because there is no general way to rematerialize the
++/// entire EFLAGS register safely and efficiently. Using `popf` both forces
++/// dynamic stack adjustment and can create correctness issues due to IF, TF,
++/// and other non-status flags being overwritten. Using sequences involving
++/// and other non-status flags being overwritten. Sequences involving
++/// directly testing a single status preserved in its own GPR.
++///
++//===----------------------------------------------------------------------===//
++
++#include "X86.h"
++#include "X86InstrBuilder.h"
++#include "X86InstrInfo.h"
++#include "X86Subtarget.h"
++#include "llvm/ADT/ArrayRef.h"
++#include "llvm/ADT/DenseMap.h"
++#include "llvm/ADT/STLExtras.h"
++#include "llvm/ADT/ScopeExit.h"
++#include "llvm/ADT/SmallPtrSet.h"
++#include "llvm/ADT/SmallSet.h"
++#include "llvm/ADT/SmallVector.h"
++#include "llvm/ADT/SparseBitVector.h"
++#include "llvm/ADT/Statistic.h"
++#include "llvm/CodeGen/MachineBasicBlock.h"
++#include "llvm/CodeGen/MachineConstantPool.h"
++#include "llvm/CodeGen/MachineDominators.h"
++#include "llvm/CodeGen/MachineFunction.h"
++#include "llvm/CodeGen/MachineFunctionPass.h"
++#include "llvm/CodeGen/MachineInstr.h"
++#include "llvm/CodeGen/MachineInstrBuilder.h"
++#include "llvm/CodeGen/MachineModuleInfo.h"
++#include "llvm/CodeGen/MachineOperand.h"
++#include "llvm/CodeGen/MachineRegisterInfo.h"
++#include "llvm/CodeGen/MachineSSAUpdater.h"
++#include "llvm/CodeGen/TargetInstrInfo.h"
++#include "llvm/CodeGen/TargetRegisterInfo.h"
++#include "llvm/CodeGen/TargetSchedule.h"
++#include "llvm/CodeGen/TargetSubtargetInfo.h"
++#include "llvm/IR/DebugLoc.h"
++#include "llvm/MC/MCSchedule.h"
++#include "llvm/Pass.h"
++#include "llvm/Support/CommandLine.h"
++#include "llvm/Support/Debug.h"
++#include "llvm/Support/raw_ostream.h"
++#include <algorithm>
++#include <cassert>
++#include <iterator>
++#include <utility>
++
++using namespace llvm;
++
++#define PASS_KEY "x86-flags-copy-lowering"
++#define DEBUG_TYPE PASS_KEY
++
++STATISTIC(NumCopiesEliminated, "Number of copies of EFLAGS eliminated");
++STATISTIC(NumSetCCsInserted, "Number of setCC instructions inserted");
++STATISTIC(NumTestsInserted, "Number of test instructions inserted");
++STATISTIC(NumAddsInserted, "Number of adds instructions inserted");
++
++namespace llvm {
++
++void initializeX86FlagsCopyLoweringPassPass(PassRegistry &);
++
++} // end namespace llvm
++
++namespace {
++
++// Convenient array type for storing registers associated with each condition.
++using CondRegArray = std::array<unsigned, X86::LAST_VALID_COND + 1>;
++
++class X86FlagsCopyLoweringPass : public MachineFunctionPass {
++public:
++ X86FlagsCopyLoweringPass() : MachineFunctionPass(ID) {
++ initializeX86FlagsCopyLoweringPassPass(*PassRegistry::getPassRegistry());
++ }
++
++ StringRef getPassName() const override { return "X86 EFLAGS copy lowering"; }
++ bool runOnMachineFunction(MachineFunction &MF) override;
++ void getAnalysisUsage(AnalysisUsage &AU) const override;
++
++ /// Pass identification, replacement for typeid.
++ static char ID;
++
++private:
++ MachineRegisterInfo *MRI;
++ const X86InstrInfo *TII;
++ const TargetRegisterInfo *TRI;
++ const TargetRegisterClass *PromoteRC;
++ MachineDominatorTree *MDT;
++
++ CondRegArray collectCondsInRegs(MachineBasicBlock &MBB,
++ MachineInstr &CopyDefI);
++
++ unsigned promoteCondToReg(MachineBasicBlock &MBB,
++ MachineBasicBlock::iterator TestPos,
++ DebugLoc TestLoc, X86::CondCode Cond);
++ std::pair<unsigned, bool>
++ getCondOrInverseInReg(MachineBasicBlock &TestMBB,
++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc,
++ X86::CondCode Cond, CondRegArray &CondRegs);
++ void insertTest(MachineBasicBlock &MBB, MachineBasicBlock::iterator Pos,
++ DebugLoc Loc, unsigned Reg);
++
++ void rewriteArithmetic(MachineBasicBlock &TestMBB,
++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc,
++ MachineInstr &MI, MachineOperand &FlagUse,
++ CondRegArray &CondRegs);
++ void rewriteCMov(MachineBasicBlock &TestMBB,
++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc,
++ MachineInstr &CMovI, MachineOperand &FlagUse,
++ CondRegArray &CondRegs);
++ void rewriteCondJmp(MachineBasicBlock &TestMBB,
++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc,
++ MachineInstr &JmpI, CondRegArray &CondRegs);
++ void rewriteCopy(MachineInstr &MI, MachineOperand &FlagUse,
++ MachineInstr &CopyDefI);
++ void rewriteSetCC(MachineBasicBlock &TestMBB,
++ MachineBasicBlock::iterator TestPos, DebugLoc TestLoc,
++ MachineInstr &SetCCI, MachineOperand &FlagUse,
++ CondRegArray &CondRegs);
++};
++
++} // end anonymous namespace
++
++INITIALIZE_PASS_BEGIN(X86FlagsCopyLoweringPass, DEBUG_TYPE,
++ "X86 EFLAGS copy lowering", false, false)
++INITIALIZE_PASS_END(X86FlagsCopyLoweringPass, DEBUG_TYPE,
++ "X86 EFLAGS copy lowering", false, false)
++
++FunctionPass *llvm::createX86FlagsCopyLoweringPass() {
++ return new X86FlagsCopyLoweringPass();
++}
++
++char X86FlagsCopyLoweringPass::ID = 0;
++
++void X86FlagsCopyLoweringPass::getAnalysisUsage(AnalysisUsage &AU) const {
++ AU.addRequired<MachineDominatorTree>();
++ MachineFunctionPass::getAnalysisUsage(AU);
++}
++
++namespace {
++/// An enumeration of the arithmetic instruction mnemonics which have
++/// interesting flag semantics.
++///
++/// We can map instruction opcodes into these mnemonics to make it easy to
++/// dispatch with specific functionality.
++enum class FlagArithMnemonic {
++ ADC,
++ ADCX,
++ ADOX,
++ RCL,
++ RCR,
++ SBB,
++};
++} // namespace
++
++static FlagArithMnemonic getMnemonicFromOpcode(unsigned Opcode) {
++ switch (Opcode) {
++ default:
++ report_fatal_error("No support for lowering a copy into EFLAGS when used "
++ "by this instruction!");
++
++#define LLVM_EXPAND_INSTR_SIZES(MNEMONIC, SUFFIX) \
++ case X86::MNEMONIC##8##SUFFIX: \
++ case X86::MNEMONIC##16##SUFFIX: \
++ case X86::MNEMONIC##32##SUFFIX: \
++ case X86::MNEMONIC##64##SUFFIX:
++
++#define LLVM_EXPAND_ADC_SBB_INSTR(MNEMONIC) \
++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, rr) \
++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, rr_REV) \
++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, rm) \
++ LLVM_EXPAND_INSTR_SIZES(MNEMONIC, mr) \
++ case X86::MNEMONIC##8ri: \
++ case X86::MNEMONIC##16ri8: \
++ case X86::MNEMONIC##32ri8: \
++ case X86::MNEMONIC##64ri8: \
++ case X86::MNEMONIC##16ri: \
++ case X86::MNEMONIC##32ri: \
++ case X86::MNEMONIC##64ri32: \
++ case X86::MNEMONIC##8mi: \
++ case X86::MNEMONIC##16mi8: \
++ case X86::MNEMONIC##32mi8: \
++ case X86::MNEMONIC##64mi8: \
++ case X86::MNEMONIC##16mi: \
++ case X86::MNEMONIC##32mi: \
++ case X86::MNEMONIC##64mi32: \
++ case X86::MNEMONIC##8i8: \
++ case X86::MNEMONIC##16i16: \
++ case X86::MNEMONIC##32i32: \
++ case X86::MNEMONIC##64i32:
++
++ LLVM_EXPAND_ADC_SBB_INSTR(ADC)
++ return FlagArithMnemonic::ADC;
++
++ LLVM_EXPAND_ADC_SBB_INSTR(SBB)
++ return FlagArithMnemonic::SBB;
++
++#undef LLVM_EXPAND_ADC_SBB_INSTR
++
++ LLVM_EXPAND_INSTR_SIZES(RCL, rCL)
++ LLVM_EXPAND_INSTR_SIZES(RCL, r1)
++ LLVM_EXPAND_INSTR_SIZES(RCL, ri)
++ return FlagArithMnemonic::RCL;
++
++ LLVM_EXPAND_INSTR_SIZES(RCR, rCL)
++ LLVM_EXPAND_INSTR_SIZES(RCR, r1)
++ LLVM_EXPAND_INSTR_SIZES(RCR, ri)
++ return FlagArithMnemonic::RCR;
++
++#undef LLVM_EXPAND_INSTR_SIZES
++
++ case X86::ADCX32rr:
++ case X86::ADCX64rr:
++ case X86::ADCX32rm:
++ case X86::ADCX64rm:
++ return FlagArithMnemonic::ADCX;
++
++ case X86::ADOX32rr:
++ case X86::ADOX64rr:
++ case X86::ADOX32rm:
++ case X86::ADOX64rm:
++ return FlagArithMnemonic::ADOX;
++ }
++}
++
++static MachineBasicBlock &splitBlock(MachineBasicBlock &MBB,
++ MachineInstr &SplitI,
++ const X86InstrInfo &TII) {
++ MachineFunction &MF = *MBB.getParent();
++
++ assert(SplitI.getParent() == &MBB &&
++ "Split instruction must be in the split block!");
++ assert(SplitI.isBranch() &&
++ "Only designed to split a tail of branch instructions!");
++ assert(X86::getCondFromBranchOpc(SplitI.getOpcode()) != X86::COND_INVALID &&
++ "Must split on an actual jCC instruction!");
++
++ // Dig out the previous instruction to the split point.
++ MachineInstr &PrevI = *std::prev(SplitI.getIterator());
++ assert(PrevI.isBranch() && "Must split after a branch!");
++ assert(X86::getCondFromBranchOpc(PrevI.getOpcode()) != X86::COND_INVALID &&
++ "Must split after an actual jCC instruction!");
++ assert(!std::prev(PrevI.getIterator())->isTerminator() &&
++ "Must only have this one terminator prior to the split!");
++
++ // Grab the one successor edge that will stay in `MBB`.
++ MachineBasicBlock &UnsplitSucc = *PrevI.getOperand(0).getMBB();
++
++ // Analyze the original block to see if we are actually splitting an edge
++ // into two edges. This can happen when we have multiple conditional jumps to
++ // the same successor.
++ bool IsEdgeSplit =
++ std::any_of(SplitI.getIterator(), MBB.instr_end(),
++ [&](MachineInstr &MI) {
++ assert(MI.isTerminator() &&
++ "Should only have spliced terminators!");
++ return llvm::any_of(
++ MI.operands(), [&](MachineOperand &MOp) {
++ return MOp.isMBB() && MOp.getMBB() == &UnsplitSucc;
++ });
++ }) ||
++ MBB.getFallThrough() == &UnsplitSucc;
++
++ MachineBasicBlock &NewMBB = *MF.CreateMachineBasicBlock();
++
++ // Insert the new block immediately after the current one. Any existing
++ // fallthrough will be sunk into this new block anyways.
++ MF.insert(std::next(MachineFunction::iterator(&MBB)), &NewMBB);
++
++ // Splice the tail of instructions into the new block.
++ NewMBB.splice(NewMBB.end(), &MBB, SplitI.getIterator(), MBB.end());
++
++ // Copy the necessary successors (and their probability info) into the new
++ // block.
++ for (auto SI = MBB.succ_begin(), SE = MBB.succ_end(); SI != SE; ++SI)
++ if (IsEdgeSplit || *SI != &UnsplitSucc)
++ NewMBB.copySuccessor(&MBB, SI);
++ // Normalize the probabilities if we didn't end up splitting the edge.
++ if (!IsEdgeSplit)
++ NewMBB.normalizeSuccProbs();
++
++ // Now replace all of the moved successors in the original block with the new
++ // block. This will merge their probabilities.
++ for (MachineBasicBlock *Succ : NewMBB.successors())
++ if (Succ != &UnsplitSucc)
++ MBB.replaceSuccessor(Succ, &NewMBB);
++
++ // We should always end up replacing at least one successor.
++ assert(MBB.isSuccessor(&NewMBB) &&
++ "Failed to make the new block a successor!");
++
++ // Now update all the PHIs.
++ for (MachineBasicBlock *Succ : NewMBB.successors()) {
++ for (MachineInstr &MI : *Succ) {
++ if (!MI.isPHI())
++ break;
++
++ for (int OpIdx = 1, NumOps = MI.getNumOperands(); OpIdx < NumOps;
++ OpIdx += 2) {
++ MachineOperand &OpV = MI.getOperand(OpIdx);
++ MachineOperand &OpMBB = MI.getOperand(OpIdx + 1);
++ assert(OpMBB.isMBB() && "Block operand to a PHI is not a block!");
++ if (OpMBB.getMBB() != &MBB)
++ continue;
++
++ // Replace the operand for unsplit successors
++ if (!IsEdgeSplit || Succ != &UnsplitSucc) {
++ OpMBB.setMBB(&NewMBB);
++
++ // We have to continue scanning as there may be multiple entries in
++ // the PHI.
++ continue;
++ }
++
++ // When we have split the edge, append a new successor.
++ MI.addOperand(MF, OpV);
++ MI.addOperand(MF, MachineOperand::CreateMBB(&NewMBB));
++ break;
++ }
++ }
++ }
++
++ return NewMBB;
++}
++
++bool X86FlagsCopyLoweringPass::runOnMachineFunction(MachineFunction &MF) {
++ DEBUG(dbgs() << "********** " << getPassName() << " : " << MF.getName()
++ << " **********\n");
++
++ auto &Subtarget = MF.getSubtarget<X86Subtarget>();
++ MRI = &MF.getRegInfo();
++ TII = Subtarget.getInstrInfo();
++ TRI = Subtarget.getRegisterInfo();
++ MDT = &getAnalysis<MachineDominatorTree>();
++ PromoteRC = &X86::GR8RegClass;
++
++ if (MF.begin() == MF.end())
++ // Nothing to do for a degenerate empty function...
++ return false;
++
++ SmallVector<MachineInstr *, 4> Copies;
++ for (MachineBasicBlock &MBB : MF)
++ for (MachineInstr &MI : MBB)
++ if (MI.getOpcode() == TargetOpcode::COPY &&
++ MI.getOperand(0).getReg() == X86::EFLAGS)
++ Copies.push_back(&MI);
++
++ for (MachineInstr *CopyI : Copies) {
++ MachineBasicBlock &MBB = *CopyI->getParent();
++
++ MachineOperand &VOp = CopyI->getOperand(1);
++ assert(VOp.isReg() &&
++ "The input to the copy for EFLAGS should always be a register!");
++ MachineInstr &CopyDefI = *MRI->getVRegDef(VOp.getReg());
++ if (CopyDefI.getOpcode() != TargetOpcode::COPY) {
++ // FIXME: The big likely candidates here are PHI nodes. We could in theory
++ // handle PHI nodes, but it gets really, really hard. Insanely hard. Hard
++ // enough that it is probably better to change every other part of LLVM
++ // to avoid creating them. The issue is that once we have PHIs we won't
++ // know which original EFLAGS value we need to capture with our setCCs
++ // below. The end result will be computing a complete set of setCCs that
++ // we *might* want, computing them in every place where we copy *out* of
++ // EFLAGS and then doing SSA formation on all of them to insert necessary
++ // PHI nodes and consume those here. Then hoping that somehow we DCE the
++ // unnecessary ones. This DCE seems very unlikely to be successful and so
++ // we will almost certainly end up with a glut of dead setCC
++ // instructions. Until we have a motivating test case and fail to avoid
++ // it by changing other parts of LLVM's lowering, we refuse to handle
++ // this complex case here.
++ DEBUG(dbgs() << "ERROR: Encountered unexpected def of an eflags copy: ";
++ CopyDefI.dump());
++ report_fatal_error(
++ "Cannot lower EFLAGS copy unless it is defined in turn by a copy!");
++ }
++
++ auto Cleanup = make_scope_exit([&] {
++ // All uses of the EFLAGS copy are now rewritten; kill the copy into
++ // EFLAGS and, if it is dead, the copy out of EFLAGS as well.
++ CopyI->eraseFromParent();
++ if (MRI->use_empty(CopyDefI.getOperand(0).getReg()))
++ CopyDefI.eraseFromParent();
++ ++NumCopiesEliminated;
++ });
++
++ MachineOperand &DOp = CopyI->getOperand(0);
++ assert(DOp.isDef() && "Expected register def!");
++ assert(DOp.getReg() == X86::EFLAGS && "Unexpected copy def register!");
++ if (DOp.isDead())
++ continue;
++
++ MachineBasicBlock &TestMBB = *CopyDefI.getParent();
++ auto TestPos = CopyDefI.getIterator();
++ DebugLoc TestLoc = CopyDefI.getDebugLoc();
++
++ DEBUG(dbgs() << "Rewriting copy: "; CopyI->dump());
++
++ // Scan for usage of newly set EFLAGS so we can rewrite them. We just buffer
++ // jumps because their usage is very constrained.
++ bool FlagsKilled = false;
++ SmallVector<MachineInstr *, 4> JmpIs;
++
++ // Gather the condition flags that have already been preserved in
++ // registers. We do this from scratch each time as we expect there to be
++ // very few of them and we expect to not revisit the same copy definition
++ // many times. If either of those change sufficiently we could build a map
++ // of these up front instead.
++ CondRegArray CondRegs = collectCondsInRegs(TestMBB, CopyDefI);
++
++ // Collect the basic blocks we need to scan. Typically this will just be
++ // a single basic block but we may have to scan multiple blocks if the
++ // EFLAGS copy lives into successors.
++ SmallVector<MachineBasicBlock *, 2> Blocks;
++ SmallPtrSet<MachineBasicBlock *, 2> VisitedBlocks;
++ Blocks.push_back(&MBB);
++ VisitedBlocks.insert(&MBB);
++
++ do {
++ MachineBasicBlock &UseMBB = *Blocks.pop_back_val();
++
++ // We currently don't do any PHI insertion and so we require that the
++ // test basic block dominates all of the use basic blocks.
++ //
++ // We could in theory do PHI insertion here if it becomes useful by just
++ // taking undef values in along every edge that we don't trace this
++ // EFLAGS copy along. This isn't as bad as fully general PHI insertion,
++ // but still seems like a great deal of complexity.
++ //
++ // Because it is theoretically possible that some earlier MI pass or
++ // other lowering transformation could induce this to happen, we do
++ // a hard check even in non-debug builds here.
++ if (&TestMBB != &UseMBB && !MDT->dominates(&TestMBB, &UseMBB)) {
++ DEBUG({
++ dbgs() << "ERROR: Encountered use that is not dominated by our test "
++ "basic block! Rewriting this would require inserting PHI "
++ "nodes to track the flag state across the CFG.\n\nTest "
++ "block:\n";
++ TestMBB.dump();
++ dbgs() << "Use block:\n";
++ UseMBB.dump();
++ });
++ report_fatal_error("Cannot lower EFLAGS copy when original copy def "
++ "does not dominate all uses.");
++ }
++
++ for (auto MII = &UseMBB == &MBB ? std::next(CopyI->getIterator())
++ : UseMBB.instr_begin(),
++ MIE = UseMBB.instr_end();
++ MII != MIE;) {
++ MachineInstr &MI = *MII++;
++ MachineOperand *FlagUse = MI.findRegisterUseOperand(X86::EFLAGS);
++ if (!FlagUse) {
++ if (MI.findRegisterDefOperand(X86::EFLAGS)) {
++ // If EFLAGS are defined, it's as-if they were killed. We can stop
++ // scanning here.
++ //
++ // NB!!! Many instructions only modify some flags. LLVM currently
++ // models this as clobbering all flags, but if that ever changes
++ // this will need to be carefully updated to handle that more
++ // complex logic.
++ FlagsKilled = true;
++ break;
++ }
++ continue;
++ }
++
++ DEBUG(dbgs() << " Rewriting use: "; MI.dump());
++
++ // Check the kill flag before we rewrite as that may change it.
++ if (FlagUse->isKill())
++ FlagsKilled = true;
++
++ // Once we encounter a branch, the rest of the instructions must also be
++ // branches. We can't rewrite in place here, so we handle them below.
++ //
++ // Note that we don't have to handle tail calls here, even conditional
++ // tail calls, as those are not introduced into the X86 MI until post-RA
++ // branch folding or block placement. As a consequence, we get to deal
++ // with the simpler formulation of conditional branches followed by tail
++ // calls.
++ if (X86::getCondFromBranchOpc(MI.getOpcode()) != X86::COND_INVALID) {
++ auto JmpIt = MI.getIterator();
++ do {
++ JmpIs.push_back(&*JmpIt);
++ ++JmpIt;
++ } while (JmpIt != UseMBB.instr_end() &&
++ X86::getCondFromBranchOpc(JmpIt->getOpcode()) !=
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***