CFLAGS+= -fPIC per default?

Sun Feb 22 22:18:26 PST 2004

Peter Wemm wrote:

>On Sunday 22 February 2004 01:27 pm, Joseph Fenton wrote:
>  
>
>>>>Adding CFLAGS= -fPIC to /etc/make.conf may be a local solution but
>>>>are there any drawbacks by adding something like
>>>>.if ${ARCH} == "amd64"
>>>>CFLAGS+= -fPIC
>>>>.endif
>>>>
>>>>to ports/Mk/bsd.port.mk?
>>>>        
>>>>
>>>No.. please don't.  Although the AMD64 platform supports PIC
>>>addressing modes directly, it is still a penalty.  (Although
>>>thankfully, it's nowhere near as expensive as it is on i386!)
>>>
>>>For example, in libc when built in PIC mode:
>>>#ifdef PIC
>>>       movq    PIC_GOT(HIDENAME(curbrk)),%rdx
>>>       movq    (%rdx),%rax
>>>#else
>>>       movq    HIDENAME(curbrk)(%rip),%rax
>>>#endif
>>>
>>>The problem is that we can't be sure that everything will be in +/-
>>>31 bit offsets of each other.  This means that PIC objects have to
>>>do indirect memory references that aren't required in no-pic mode.
>>>
>>>i386 also loses a general purpose register (%ebx) which is why -fpic
>>>is more expensive there.  But even though we don't lose a register,
>>>it's still a cost because of the extra global-offset-table memory
>>>references.
>>>
>>>Footnote: you just made me wonder about some of these ifdefs..  We
>>>shouldn't need them for intra-object references like this.  I'll
>>>have to go and look again.
>>>      
>>>
>>Sorry to be anal, but PC-relative addressing is by definition
>>position-independent code. Who was the bright individual
>>who decided that when compiling PIC code to NOT use
>>PC-relative and to use PC-relative for non-PIC code?
>>    
>>
>
>Recall the last paragraph you just quoted.  I already said I thought the 
>code wasn't quite right.  However, I just remembered why it's done that
>way.
>
>Remember.. unix link semantics have interesting symbol override effects.  
>Although you might normally be jumping within the same library and can 
>trivially use %rip-relative addressing, if the main program overrides 
>libc symbols, we must use those instead.  Thus, we can't use 
>%rip-relative ways to access them because we can't be sure it's going to
>be within +/- 2GB.  In fact, it's guaranteed to not be the case for
>dynamic linking on FreeBSD/amd64 because the default load address for 
>shared libs is around the 8GB mark.  For static linking though, we 
>don't usually have this same 7.9GB hole in our symbol space.
>
>Also.. when compiling with -fpic, you don't know whether you're linking 
>pc-relative code into an application or into a shared library that 
>could be loaded just about anywhere.
>
>  
>
>>This is counter-intuitive. For PIC code, you use PC-relative
>>addressing in two cases: 1 - the code is guaranteed to be
>>a constant distance apart, like code in the same section; 2 -
>>when the loader guarantees the relative position of different
>>sections, like code and data contained in a ROM.
>>
>>Case 1 could be violated by the code being too far apart
>>for PC-relative addressing. This is virtually impossible for
>>the AMD64 as I doubt we'll see code exceeding 2G in
>>size in the next several decades. Code is only now exceeding
>>a few megabytes. Case 2 is usually your problem, which leads
>>to tables used to hold addresses or offsets.
>>    
>>
>
>Case 1 is violated by symbol overrides by the main program.
>
>  
>
>>Both sides of the #ifdef PIC are doing valid PIC code.
>>PC-relative addressing should be used wherever possible
>>unless it incurs a speed penalty.
>>    
>>
>
>gcc generally generates %rip-relative offsets where possible even 
>without -fpic.
>
>  
>
>>Non-PIC code generally does PC-relative code if it
>>is faster and is legal, for example, when referring to
>>code within the same section. When the address must
>>be set by the loader for non-PIC code, it seems to me
>>that the fastest code would be like this:
>>
>>  mov     <imm32>,%rdx
>>  movq    (%rdx),%rax
>>    
>>
>
>Guess what.. look at the original code:
>   movq    PIC_GOT(HIDENAME(curbrk)),%rdx
>   movq    (%rdx),%rax
>The first instruction just happens to be of the form 'mov <imm32>,%rdx'.
>
>  
>
>>or if the address is > 4G
>>
>>  movq    <imm64>,%rdx
>>  movq    (%rdx),%rax
>>    
>>
>
>Except that there is only one movq <imm64> instruction, and it only 
>works with %rax as a target, and it's not particularly fast.  Since
>you're guaranteed to have an offset table within +/- 2GB, you may as 
>well use it.
>
>  
>
>>The loader would then set the immediate vector upon
>>loading the sections. This avoids a memory hit for accessing
>>a table of addresses while only adding at most 5 bytes to the
>>size of the code. I would probably use this unless the user
>>is compiling with flags set to compile with minimized code
>>size.
>>    
>>
>
>Also remember that this is in libc, where it's not a user code size
>compile option.  We have to cope with whatever environment we find
>ourselves loaded into.  We have to assume the worst case scenario.
>
>Incidentally, for an example of what GCC does...  given this program:
>extern int j;
>extern int foo(int i);
>int
>bar(int i)
>{
>        return foo(i) + 10 + j;
>}
>cc -S -O   produces:
>bar:
>        subq    $8, %rsp
>        call    foo
>        addl    j(%rip), %eax
>        addl    $10, %eax
>        addq    $8, %rsp
>        ret
>
>cc -S -O -fPIC produces:
>bar:
>        subq    $8, %rsp
>        call    foo@PLT
>        movq    j@GOTPCREL(%rip), %rdx
>        addl    (%rdx), %eax
>        addl    $10, %eax
>        addq    $8, %rsp
>        ret
>
>Note how the -fpic case is less efficient.  Specifically, function calls 
>are trampolined via the local object's procedure linkage table rather 
>than just calling them directly.. because we don't know if they're
>within +/- 2GB or not.  Or if they're even in the same object.
>Secondly, it uses the global offset table to find the address of 'j' and 
>then indirectly references it as a two-step sequence.  The non-pic case 
>just makes a pc-relative reference in a single instruction. 
>
>  
>
>>Sorry to nit-pick like this, but having worked on both Mac
>>and Amiga ROMs, PIC mode under BSD really seems
>>backwards to me.
>>    
>>
>
>Unix library semantics are very very different to ROM semantics.  I've 
>been there too.
>
>Also, this isn't BSD-specific.  It's ELF-specific and that's what the
>toolchain produces and expects.  We use the same toolchain that Linux
>does.
>  
>
Okay, that made a lot more sense than the original post. Sorry about the
whole thing. It is rather different, but the above clears up a lot of the
confusion. Thanks a bunch!

Your example function in the two cases put it in terms I have dealt with
on PowerMacs, so that was a good demonstration.