svn commit: r286008 - stable/10/contrib/llvm/patches
Dimitry Andric
dim at FreeBSD.org
Wed Jul 29 13:07:20 UTC 2015
Author: dim
Date: Wed Jul 29 13:07:18 2015
New Revision: 286008
URL: https://svnweb.freebsd.org/changeset/base/286008
Log:
Add llvm patch corresponding to r286007.
Added:
stable/10/contrib/llvm/patches/patch-r286007-llvm-r219009-x86-codegen-crash.diff
Added: stable/10/contrib/llvm/patches/patch-r286007-llvm-r219009-x86-codegen-crash.diff
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ stable/10/contrib/llvm/patches/patch-r286007-llvm-r219009-x86-codegen-crash.diff Wed Jul 29 13:07:18 2015 (r286008)
@@ -0,0 +1,214 @@
+Pull in r219009 from upstream llvm trunk (by Adam Nemet):
+
+ [ISel] Keep matching state consistent when folding during X86 address match
+
+ In the X86 backend, matching an address is initiated by the 'addr' complex
+ pattern and its friends. During this process we may reassociate and-of-shift
+ into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
+ shift into the scale of the address.
+
+ However, as demonstrated by the testcase, this can trigger CSE not only of the
+ shift and the AND, which the code is prepared for, but also of the underlying
+ load node. In the testcase this node is sitting in the RecordedNodes and
+ MatchScope data structures of the matcher and becomes a deleted node upon CSE.
+ Returning from the complex pattern function, we try to access it again, hitting
+ an assert because the node is no longer a load even though this was checked before.
+
+ Now obviously changing the DAG this late is bending the rules but I think it
+ makes sense somewhat. Outside of addresses we prefer and-of-shift because it
+ may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
+ example because it creates a non-canonical node). We currently don't recognize
+ addresses during DAGCombiner where arguably this canonicalization should be
+ performed. On the other hand, having this in the matcher allows us to cover
+ all the cases where an address can be used in an instruction.
+
+ I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for
+ the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible
+ for initiating the recursive CSE on users
+ (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it
+ is not strictly necessary since the shift is hooked into the visited user. Of
+ course it's safer to keep the DAG consistent at all times (e.g. for accurate
+ number of uses, etc.).
+
+ So rather than changing the fundamentals, I've decided to continue along the lines
+ of the previous patches and detect the CSE. This patch installs a very targeted
+ DAGUpdateListener for the duration of a complex-pattern match and updates the
+ matching state accordingly. (Previous patches used HandleSDNode to detect the
+ CSE but that's not practical here). The listener is only installed on X86.
+
+ I tested that there is no measurable overhead due to this while running
+ through the spec2k BC files with llc. The only thing we pay for is the
+ creation of the listener. The callback never ever triggers in spec2k since
+ this is a corner case.
+
+ Fixes rdar://problem/18206171
+
+This fixes a possible crash in x86 code generation when compiling recent
+llvm/clang trunk sources.
+
+Introduced here: http://svnweb.freebsd.org/changeset/base/286007
+
+Index: include/llvm/CodeGen/SelectionDAGISel.h
+===================================================================
+--- include/llvm/CodeGen/SelectionDAGISel.h
++++ include/llvm/CodeGen/SelectionDAGISel.h
+@@ -238,6 +238,12 @@ class SelectionDAGISel : public MachineFunctionPas
+ const unsigned char *MatcherTable,
+ unsigned TableSize);
+
++ /// \brief Return true if complex patterns for this target can mutate the
++ /// DAG.
++ virtual bool ComplexPatternFuncMutatesDAG() const {
++ return false;
++ }
++
+ private:
+
+ // Calls to these functions are generated by tblgen.
+Index: lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+===================================================================
+--- lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
++++ lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
+@@ -2345,6 +2345,42 @@ struct MatchScope {
+ bool HasChainNodesMatched, HasGlueResultNodesMatched;
+ };
+
++/// \brief A DAG update listener to keep the matching state
++/// (i.e. RecordedNodes and MatchScope) up to date if the target is allowed
++/// to change the DAG while matching. The X86 addressing-mode matcher is an
++/// example of this.
++class MatchStateUpdater : public SelectionDAG::DAGUpdateListener
++{
++ SmallVectorImpl<std::pair<SDValue, SDNode*> > &RecordedNodes;
++ SmallVectorImpl<MatchScope> &MatchScopes;
++public:
++ MatchStateUpdater(SelectionDAG &DAG,
++ SmallVectorImpl<std::pair<SDValue, SDNode*> > &RN,
++ SmallVectorImpl<MatchScope> &MS) :
++ SelectionDAG::DAGUpdateListener(DAG),
++ RecordedNodes(RN), MatchScopes(MS) { }
++
++ void NodeDeleted(SDNode *N, SDNode *E) {
++ // Some early-returns here to avoid the search if we deleted the node or
++ // if the update comes from MorphNodeTo (MorphNodeTo is the last thing we
++ // do, so it's unnecessary to update matching state at that point).
++ // Neither of these can occur currently because we only install this
++ // update listener while matching a complex pattern.
++ if (!E || E->isMachineOpcode())
++ return;
++ // Performing a linear search here does not matter because we almost never
++ // run this code. You'd have to have a CSE during complex pattern
++ // matching.
++ for (auto &I : RecordedNodes)
++ if (I.first.getNode() == N)
++ I.first.setNode(E);
++
++ for (auto &I : MatchScopes)
++ for (auto &J : I.NodeStack)
++ if (J.getNode() == N)
++ J.setNode(E);
++ }
++};
+ }
+
+ SDNode *SelectionDAGISel::
+@@ -2599,6 +2635,14 @@ SelectCodeCommon(SDNode *NodeToMatch, const unsign
+ unsigned CPNum = MatcherTable[MatcherIndex++];
+ unsigned RecNo = MatcherTable[MatcherIndex++];
+ assert(RecNo < RecordedNodes.size() && "Invalid CheckComplexPat");
++
++ // If target can modify DAG during matching, keep the matching state
++ // consistent.
++ std::unique_ptr<MatchStateUpdater> MSU;
++ if (ComplexPatternFuncMutatesDAG())
++ MSU.reset(new MatchStateUpdater(*CurDAG, RecordedNodes,
++ MatchScopes));
++
+ if (!CheckComplexPattern(NodeToMatch, RecordedNodes[RecNo].second,
+ RecordedNodes[RecNo].first, CPNum,
+ RecordedNodes))
+Index: lib/Target/X86/X86ISelDAGToDAG.cpp
+===================================================================
+--- lib/Target/X86/X86ISelDAGToDAG.cpp
++++ lib/Target/X86/X86ISelDAGToDAG.cpp
+@@ -290,6 +290,13 @@ namespace {
+ const X86InstrInfo *getInstrInfo() const {
+ return getTargetMachine().getInstrInfo();
+ }
++
++ /// \brief Address-mode matching performs and-of-shift to shift-of-and
++ /// reassociation in order to expose more scaled addressing
++ /// opportunities.
++ bool ComplexPatternFuncMutatesDAG() const override {
++ return true;
++ }
+ };
+ }
+
+Index: test/CodeGen/X86/addr-mode-matcher.ll
+===================================================================
+--- test/CodeGen/X86/addr-mode-matcher.ll
++++ test/CodeGen/X86/addr-mode-matcher.ll
+@@ -0,0 +1,62 @@
++; RUN: llc < %s | FileCheck %s
++
++; This testcase used to hit an assert during ISel. For details, see the big
++; comment inside the function.
++
++; CHECK-LABEL: foo:
++; The AND should be turned into a subreg access.
++; CHECK-NOT: and
++; The shift (leal) should be folded into the scale of the address in the load.
++; CHECK-NOT: leal
++; CHECK: movl {{.*}},4),
++
++target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
++target triple = "i386-apple-macosx10.6.0"
++
++define void @foo(i32 %a) {
++bb:
++ br label %bb1692
++
++bb1692:
++ %tmp1694 = phi i32 [ 0, %bb ], [ %tmp1745, %bb1692 ]
++ %xor = xor i32 0, %tmp1694
++
++; %load1 = (load (and (shl %xor, 2), 1020))
++ %tmp1701 = shl i32 %xor, 2
++ %tmp1702 = and i32 %tmp1701, 1020
++ %tmp1703 = getelementptr inbounds [1028 x i8]* null, i32 0, i32 %tmp1702
++ %tmp1704 = bitcast i8* %tmp1703 to i32*
++ %load1 = load i32* %tmp1704, align 4
++
++; %load2 = (load (shl (and %xor, 255), 2))
++ %tmp1698 = and i32 %xor, 255
++ %tmp1706 = shl i32 %tmp1698, 2
++ %tmp1707 = getelementptr inbounds [1028 x i8]* null, i32 0, i32 %tmp1706
++ %tmp1708 = bitcast i8* %tmp1707 to i32*
++ %load2 = load i32* %tmp1708, align 4
++
++ %tmp1710 = or i32 %load2, %a
++
++; While matching the xor we address-match %load1. The and-of-shift
++; reassociation in address matching transforms this into a shift-of-and and the
++; resulting node becomes identical to %load2. CSE replaces %load1, which leaves
++; its references in MatchScope and RecordedNodes stale.
++ %tmp1711 = xor i32 %load1, %tmp1710
++
++ %tmp1744 = getelementptr inbounds [256 x i32]* null, i32 0, i32 %tmp1711
++ store i32 0, i32* %tmp1744, align 4
++ %tmp1745 = add i32 %tmp1694, 1
++ indirectbr i8* undef, [label %bb1756, label %bb1692]
++
++bb1756:
++ br label %bb2705
++
++bb2705:
++ indirectbr i8* undef, [label %bb5721, label %bb5736]
++
++bb5721:
++ br label %bb2705
++
++bb5736:
++ ret void
++}
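
To make the reassociation described in the upstream log concrete, here is a
small standalone C++ sketch (illustrative only, not part of the committed diff;
the helper names are invented). It shows that the and-of-shift address used by
%load1 and the shift-of-and address used by %load2 in the testcase compute the
same value, since 1020 == 255 << 2, and why only the shift-of-and form lets the
shift be folded into the address scale:

#include <cassert>
#include <cstdint>

// and-of-shift form, as written for %load1: (and (shl x, 2), 1020)
static uint32_t load1_index(uint32_t x) { return (x << 2) & 1020; }

// shift-of-and form, as written for %load2: (shl (and x, 255), 2)
// Here the "<< 2" sits outermost, so the address matcher can fold it into the
// scale of a scaled-index address such as (%base,%index,4).
static uint32_t load2_index(uint32_t x) { return (x & 255) << 2; }

int main() {
  // The two forms are equivalent because 1020 == 255 << 2, which is why the
  // rewrite performed during address matching is value-preserving and why CSE
  // can then merge the two load addresses in the testcase.
  for (uint32_t x = 0; x < 4096; ++x)
    assert(load1_index(x) == load2_index(x));
  return 0;
}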
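
The core of the fix is the scoped DAGUpdateListener in the SelectionDAGISel.cpp
hunk above. The following is a minimal, self-contained C++ sketch of that
general pattern (hypothetical types, not LLVM's API): while a transformation
that may CSE nodes is in flight, install a listener that rewrites any
externally recorded pointers from the deleted node to its replacement, so
nothing is left pointing at a stale node.

#include <functional>
#include <vector>

struct Node { int Opcode; };

struct Graph {
  // Invoked as (Deleted, Replacement) whenever CSE merges a node away.
  std::function<void(Node *, Node *)> NodeDeleted;

  void cseReplace(Node *Old, Node *New) {
    if (NodeDeleted)
      NodeDeleted(Old, New); // notify listeners before the old node goes away
    delete Old;
  }
};

// RAII listener: for its lifetime, keep a vector of recorded node pointers
// consistent with whatever CSE does to the graph (the role MatchStateUpdater
// plays for RecordedNodes and MatchScopes in the patch above).
class RecordedNodesUpdater {
  Graph &G;
  std::function<void(Node *, Node *)> Saved;

public:
  RecordedNodesUpdater(Graph &Gr, std::vector<Node *> &Recorded)
      : G(Gr), Saved(Gr.NodeDeleted) {
    Gr.NodeDeleted = [&Recorded](Node *Old, Node *New) {
      // A linear scan is fine here; node deletion during a match is rare.
      for (Node *&P : Recorded)
        if (P == Old)
          P = New;
    };
  }
  ~RecordedNodesUpdater() { G.NodeDeleted = Saved; } // uninstall on scope exit
};

int main() {
  Graph G;
  Node *A = new Node{1};
  Node *B = new Node{1}; // same opcode: B will be CSE'd into A
  std::vector<Node *> Recorded{A, B};
  {
    RecordedNodesUpdater U(G, Recorded); // installed only around the "match"
    G.cseReplace(B, A);                  // Recorded[1] is patched from B to A
  }
  // Both recorded entries now point at the surviving node; nothing is stale.
  delete A;
  return 0;
}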