I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc

Mon Feb 15 19:13:19 UTC 2016

Mark, I believe you're right. What do you think about this patch?

Index: tools/clang/lib/CodeGen/TargetInfo.cpp
===================================================================

--- tools/clang/lib/CodeGen/TargetInfo.cpp	(revision 260852)
+++ tools/clang/lib/CodeGen/TargetInfo.cpp	(working copy)
@@ -3599,6 +3599,8 @@
   {
     CGF.EmitBlock(UsingOverflow);
 
+    Builder.CreateStore(Builder.getInt8(11), NumRegsAddr);
+
     // Everything in the overflow area is rounded up to a size of at least 4.
     CharUnits OverflowAreaAlign = CharUnits::fromQuantity(4);
 

Can you test it?

On Mon, Feb 15, 2016 at 12:52:15AM -0800, Mark Millard wrote:
> 
> I'm top posting as the following can stand on its own fairly well.
> 
> On Sun Feb 14 23:46:14 UTC 2016 Nathan Whitehorn wrote:
> 
> > On 02/14/16 14:34, Mark Millard wrote:
> > > clang's code base is not familiar material for me nor do I have solid 
> > > reference material for the FreeBSD TARGET_ARCH=powerpc ABI rules so 
> > > the below has my guess work involved. The following code appears to 
> > > have hard wired a global, unvarying constant (8) into the test for 
> > > picking UsingRegs vs. UsingOverflow.
> > 
> > For reference, we use the standard ELF ABI 
> > (https://uclibc.org/docs/psABI-ppc.pdf).
> > -Nathan
> 
> Reviewing the Parameter Passing material in that document shows that the problem is in the original specification.
> 
> And there is a more modern specification that has a fix in its wording. (Which shows that I'm not likely to be wrong.) I'll reference and quote it later.
> 
> First I'll explain the problem that is in psABI-ppc.pdf (the old SunSoft 1995 document).
> 
> First a numbering point: psABI-ppc.pdf uses "gr" matching the numeral in r3, r4, . . . , r10, starting at r3 (i.e, 3). And gr indicates the next register to be used, not the last one already used.
> 
> The document splits the algorithm for placement of parameters into 3 stages with the following structure, intended as they have it in the document but various less interesting details for my "8byte then 4byte" example omitted:
> 
> > INITIALIZING:
> >      Set fr=1, gr=3, and starg to the address of
> >      parameter word 1.
> > SCAN:
> >      If there are no more arguments, terminate.
> >      Otherwise, select one of the following
> >      depending on the type of the next argument:
> > 
> >      DOUBLE_OR_FLOAT
> >         If fr>8 ( . . .), go to OTHER. Otherwise,
> >         . . .
> > 
> >      SIMPLE_ARG
> >         If gr>10, go to OTHER. Otherwise, load the
> >         argument value into general register gr,
> >         set gr to gr+1, can goto SCAN. . . .
> > 
> >      LONG_LONG
> >         If gr>9, go to OTHER. Otherwise, . . .
> > 
> > OTHER:
> >        Arguments not otherwise handled above are
> >        passed in the parameter words of the
> >        caller???s stack frame. . . . Set starg to
> >        starg+size, then go to SCAN.
> 
> Note that gr is not incremented by LONG_LONG or by the later OTHER usage when gr>9. (That would be my example's 8 byte integer that is later followed by a 4 byte one.)
> 
> That OTHER's "go to SCAN" would then lead to the following 4 byte integer in my example to be put in r10 and gr then being set to 11 instead of it being stored in a parameter word on the stack.
> 
> The nasty thing about this for va_list/va_arg use is that the stored information does not indicate which was before vs. after in the argument order: the 4 byte r10 content or the 8 byte "OTHER" content: the two orders produce identical results.
> 
> This can not be correct.
> 
> The Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf is more modern and explicitly deals with VR and other modern things. (Its terminology matching LONG_LONG above is DUAL_GP.) But for what I'm dealing with here it has the following extra wording at the very end of its OTHER section:
> 
> > If gr>9 and the type is DUAL_GP ,or . . ., or . . ., then set gr = 11 (to prevent subsequent SINGLE_GPs from being placed in registers after DUAL_GP, QUAD_GP, or EIGHT_GP arguments that would no longer fit in the registers).
> 
> 
> 
> I've left the prior information below for reference.
> 
> ===
> Mark Millard
> markmi at dsl-only.net
> 
> 
> 
> On 2016-Feb-14, at 2:34 PM, Mark Millard <markmi at dsl-only.net> wrote:
> > 
> > On 2016-Feb-14, at 11:29 AM, Roman Divacky <rdivacky at vlakno.cz> wrote:
> >> 
> >> Fwiw, the code to handle the vaarg is in 
> >> tools/clang/lib/CodeGen/TargetInfo.cpp:PPC32_SVR4_ABIInfo::EmitVAArg()
> >> 
> >> You can take a look to see whats wrong.
> >> 
> >> On Sat, Feb 13, 2016 at 07:03:29PM -0800, Mark Millard wrote:
> >>> I've isolated another clang 3.8.0 TARGET_ARCH=powerpc SEGV problem that shows up for using clang 3.8.0 to buildworld/installworld for powerpc.
> >>> 
> >>>> ls -l -n /
> >>> 
> >>> gets a SEGV. As listed in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207175 ( and  https://llvm.org/bugs/show_bug.cgi?id=26605 ) the following simplified program also gets the SEGV on powerpc:
> >>> 
> >>>> #include <stdarg.h> // for va_list, va_start, va_arg, va_end
> >>>> #include <stdint.h> // for intmax_t
> >>>> 
> >>>> intmax_t
> >>>> va_test (char *s, ...)
> >>>> {
> >>>>   va_list vap;
> >>>> 
> >>>>   va_start(vap, s);
> >>>> 
> >>>>   char*        t0 = va_arg(vap, char*);
> >>>>   unsigned int o0 = va_arg(vap, unsigned int);
> >>>>   int          c0 = va_arg(vap, int);
> >>>>   unsigned int u0 = va_arg(vap, unsigned int);
> >>>>   int          c1 = va_arg(vap, int);
> >>>>   char *       t1 = va_arg(vap, char*);
> >>>> 
> >>>>   intmax_t     j0 = va_arg(vap, intmax_t); // This spans into overflow_arg_area.
> >>>> 
> >>>>   int          c2 = va_arg(vap, int);      // A copy was put in the 
> >>>>                                            // overflow_arg_area because of the
> >>>>                                            // above.
> >>>>                                            // But this tries to extract from the
> >>>>                                            // last 4 bytes of the reg_save_area.
> >>>>                                            // It does not increment the
> >>>>                                            // overflow_arg_area position pointer
> >>>>                                            // past the copy that is there.
> >>>> 
> >>>>   char *       t2 = va_arg(vap, char*);    // The lack of increment before makes
> >>>>                                            // this extraction off by 4 bytes.
> >>>> 
> >>>>   char         t2fc = *t2;  // <<< This gets SEGV. t2 actually got what should be
> >>>>                             //     the c2 value.
> >>>> 
> >>>>   intmax_t     j1 = va_arg(vap, intmax_t);
> >>>> 
> >>>>   va_end(vap);
> >>>> 
> >>>>   return (intmax_t) ((s-t2)+(t0-t1)+o0+u0+j0+j1+c0+c1+c2+t2fc);
> >>>>   // Avoid any optimize-away for lack of use.
> >>>> }
> >>>> 
> >>>> int main(void)
> >>>> {
> >>>>   char         s[1025] = "test string for this";
> >>>> 
> >>>>   char*        t0 = s + 5;
> >>>>   unsigned int o0 = 3;
> >>>>   int          c0 = 1;
> >>>>   unsigned int u0 = 1;
> >>>>   int          c1 = 3;
> >>>>   char *       t1 = s + 12;
> >>>>   intmax_t     j0 = 314159265358979323;
> >>>>   int          c2 = 4;
> >>>>   char *       t2 = s + 16;
> >>>>   intmax_t     j1 = ~314159265358979323;
> >>>> 
> >>>>   intmax_t      result = va_test(s,t0,o0,c0,u0,c1,t1,j0,c1,t2,j1);
> >>>> 
> >>>>   return (int) (result - (intmax_t) ((s-t2)+(t0-t1)+o0+u0+j0+j1+c0+c1+c2+*t2));
> >>>>   // Avoid any optimize-away for lack of use.
> >>>> }
> >>> 
> >>> 
> >>> 
> >>> ===
> >>> Mark Millard
> >>> markmi at dsl-only.net
> >>> 
> >>> _______________________________________________
> >>> freebsd-toolchain at freebsd.org mailing list
> >>> https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
> >>> To unsubscribe, send any mail to "freebsd-toolchain-unsubscribe at freebsd.org"
> > 
> > clang's code base is not familiar material for me nor do I have solid reference material for the FreeBSD TARGET_ARCH=powerpc ABI rules so the below has my guess work involved.
> > 
> > The following code appears to have hard wired a global, unvarying constant (8) into the test for picking UsingRegs vs. UsingOverflow.
> > 
> > 
> >>  llvm::Value *NumRegs = Builder.CreateLoad(NumRegsAddr, "numUsedRegs");
> > . . .
> >>  llvm::Value *CC =
> >>      Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond");
> >> 
> >>  llvm::BasicBlock *UsingRegs = CGF.createBasicBlock("using_regs");
> >>  llvm::BasicBlock *UsingOverflow = CGF.createBasicBlock("using_overflow");
> >>  llvm::BasicBlock *Cont = CGF.createBasicBlock("cont");
> >> 
> >>  Builder.CreateCondBr(CC, UsingRegs, UsingOverflow);
> > . . .
> >>  // Case 1: consume registers.
> >>  Address RegAddr = Address::invalid();
> >>  {
> > . . .
> >>    // Increase the used-register count.
> >>    NumRegs =
> >>      Builder.CreateAdd(NumRegs,
> >>                        Builder.getInt8((isI64 || (isF64 && IsSoftFloatABI)) ? 2 : 1));
> >>    Builder.CreateStore(NumRegs, NumRegsAddr);. . .
> > . . .
> >>  }
> >> 
> >>  // Case 2: consume space in the overflow area.
> >>  Address MemAddr = Address::invalid();
> >>  {
> > . . . (no adjustments to NumRegs) . . .
> > 
> > If so the means of counting NumRegs (a.k.a. gpr) then needs to take into account an allocated but unused last UsingRegs "slot" sometimes. Imagine. . .
> > 
> > r3, r4, r5, r6, r7, r8, r9 in use already so r10 is the last possible "UsingRegs" context.
> > (0  1   2   3   4   5   6, leaving r10 as position 7, the last < 8 value)
> > 
> > Then the next two arguments are a 8 byte integer then a a 4 byte integer (in that order). That results in what should be:
> > 
> > r10 "UsingRegs" slot reserved and un-accessed
> > In other words: counted as allocated so that the rest goes in in the overflow area
> > (so no position 7 usage)
> > 
> > then
> > 
> > overflow with the 8 byte integer then the 4 byte integer.
> > 
> > 
> > And, in fact, the memory content reflects this in the overflow area.
> > 
> > 
> > But the va_arg access code does not count r10's slot as allocated in "Using Regs" after the 8 byte integer. So later it tries to use r10's slot for the 4 byte integer that is actually in the UsingOverflow area.
> > 
> > One fix of sorts is to have "Case 2: consume space in the overflow area." set NumRegs (a.k.a. gpr) to the bound from the Builder.CreateICmpULT (8 in this context). Then the first (or any/every) use of the UsingOverflow area forces no more use of the UsingRegs area (for the involved va_list).
> > 
> > 
> > 
> > ===
> > Mark Millard
> > markmi at dsl-only.net
> 
> _______________________________________________
> freebsd-toolchain at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
> To unsubscribe, send any mail to "freebsd-toolchain-unsubscribe at freebsd.org"