CFLAGS+= -fPIC per default?
Joseph Fenton
jlfenton at citlink.net
Sun Feb 22 22:18:26 PST 2004
Peter Wemm wrote:
>On Sunday 22 February 2004 01:27 pm, Joseph Fenton wrote:
>
>
>>>>Adding CFLAGS= -fPIC to /etc/make.conf may be a local solution but
>>>>are there any drawbacks by adding something like
>>>>.if ${ARCH} == "amd64"
>>>>CFLAGS+= -fPIC
>>>>.endif
>>>>
>>>>to ports/Mk/bsd.port.mk?
>>>>
>>>>
>>>No.. please don't. Although the AMD64 platform supports PIC
>>>addressing modes directly, it is still a penalty. (Although
>>>thankfully, it's nowhere near as expensive as it is on i386!)
>>>
>>>For example, in libc when built in PIC mode:
>>>#ifdef PIC
>>> movq PIC_GOT(HIDENAME(curbrk)),%rdx
>>> movq (%rdx),%rax
>>>#else
>>> movq HIDENAME(curbrk)(%rip),%rax
>>>#endif
>>>
>>>The problem is that we can't be sure that everything will be in +/-
>>>31 bit offsets of each other. This means that PIC objects have to
>>>do indirect memory references that aren't required in no-pic mode.
>>>
>>>I386 also loses a general purpose register (%ebx) which is why -fpic
>>>is more expensive there. But even though we don't lose a register,
>>>it's still a cost because of the extra global-offset-table memory
>>>references.
>>>
>>>Footnote: you just made me wonder about some of these ifdefs.. We
>>>shouldn't need them for intra-object references like this. I'll
>>>have to go and look again.
>>>
>>>
>>Sorry to be anal, but PC-relative addressing is by definition
>>position-independent code. Who was the bright individual
>>who decided that when compiling PIC code to NOT use
>>PC-relative and to NOT use PC-relative for non-PIC code?
>>
>>
>
>Recall the last paragraph you just quoted. I already said I thought the
>code wasn't quite right. However, I just remembered why it's done that
>way.
>
>Remember.. unix link semantics have interesting symbol override effects.
>Although you might normally be jumping within the same library and can
>trivially use %rip-relative addressing, if the main program overrides
>libc symbols, we must use those instead. Thus, we can't use
>%rip-relative ways to access them because we can't be sure it's going to
>be within +/- 2GB. In fact, it's guaranteed to not be the case for
>dynamic linking on FreeBSD/amd64 because the default load address for
>shared libs is around the 8GB mark. For static linking though, we
>don't usually have this same 7.9GB hole in our symbol space.
>
>Also.. when compiling with -fpic, you don't know whether you're linking
>pc-relative code into an application or into a shared library that
>could be loaded just about anywhere.
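
To make the +/- 2GB limit concrete, here is a minimal C sketch of the range
check (an illustration only, not libc or linker code). The helper name
fits_rel32 and the 0x400000 main-program address are assumptions; the 8GB
value is the approximate shared-library load address mentioned above.

```c
#include <stdint.h>

/* Hypothetical helper: can a signed 32-bit %rip-relative displacement
 * reach `target` from the address of the next instruction? */
int fits_rel32(uint64_t next_insn, uint64_t target)
{
    /* Unsigned subtraction wraps; reinterpret the result as signed. */
    int64_t disp = (int64_t)(target - next_insn);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Addresses assumed for illustration only. */
#define MAIN_PROGRAM_ADDR 0x0000000000400000ULL /* assumed low load address */
#define SHARED_LIB_ADDR   0x0000000200000000ULL /* 8GB = 2^33 */
```

A reference that stays inside the library, such as from
SHARED_LIB_ADDR + 0x1000 to SHARED_LIB_ADDR + 0x200000, fits. A reference
from SHARED_LIB_ADDR back to a symbol the main program overrides at
MAIN_PROGRAM_ADDR does not, which is why it has to go through a table.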
>
>
>
>>This is counter-intuitive. For PIC code, you use PC-relative
>>addressing in two cases: 1 - the code is guaranteed to be
>>a constant distance apart, like code in the same section; 2 -
>>when the loader guarantees the relative position of different
>>sections, like code and data contained in a ROM.
>>
>>Case 1 could be violated by the code being too far apart
>>for PC-relative addressing. This is virtually impossible for
>>the AMD64 as I doubt we'll see code exceeding 2G in
>>size in the next several decades. Code is only now exceeding
>>a few megabytes. Case 2 is usually your problem, which leads
>>to tables used to hold addresses or offsets.
>>
>>
>
>Case 1 is violated by symbol overrides by the main program.
>
>
>
>>Both sides of the #ifdef PIC are doing valid PIC code.
>>PC-relative addressing should be used wherever possible
>>unless it incurs a speed penalty.
>>
>>
>
>gcc generally generates %rip-relative offsets where possible even
>without -fpic.
>
>
>
>>Non-PIC code generally does PC-relative code if it
>>is faster and is legal, for example, when referring to
>>code within the same section. When the address must
>>be set by the loader for non-PIC code, it seems to me
>>that the fastest code would be like this:
>>
>> mov <imm32>,%rdx
>> movq (%rdx),%rax
>>
>>
>
>Guess what.. look at the original code:
> movq PIC_GOT(HIDENAME(curbrk)),%rdx
> movq (%rdx),%rax
>The first instruction just happens to be of the form 'mov <imm32>,%rdx'.
>
>
>
>>or if the address is > 4G
>>
>> movq <imm64>,%rdx
>> movq (%rdx),%rax
>>
>>
>
>Except that there is only one movq <imm64> instruction, and it only
>works with %rax as a target, and it's not particularly fast. Since
>you're guaranteed to have an offset table within +/- 2GB, you may as
>well use it.
>
>
>
>>The loader would then set the immediate vector upon
>>loading the sections. This avoids a memory hit for accessing
>>a table of addresses while only adding at most 5 bytes to the
>>size of the code. I would probably use this unless the user
>>is compiling with flags set to compile with minimized code
>>size.
>>
>>
>
>Also remember that this is in libc, where it's not a user code size
>compile option. We have to cope with whatever environment we find
>ourselves loaded into. We have to assume the worst case scenario.
>
>Incidentally, for an example of what GCC does... given this program:
>extern int j;
>extern int foo(int i);
>int
>bar(int i)
>{
> return foo(i) + 10 + j;
>}
>cc -S -O produces:
>bar:
> subq $8, %rsp
> call foo
> addl j(%rip), %eax
> addl $10, %eax
> addq $8, %rsp
> ret
>
>cc -S -O -fPIC produces:
>bar:
> subq $8, %rsp
> call foo@PLT
> movq j@GOTPCREL(%rip), %rdx
> addl (%rdx), %eax
> addl $10, %eax
> addq $8, %rsp
> ret
>
>Note how the -fpic case is less efficient. Specifically, function calls
>are trampolined via the local object's procedure linkage table rather
>than just calling them directly.. because we don't know if they're
>within +/- 2GB or not. Or if they're even in the same object.
>Secondly, it uses the global offset table to find the address of 'j' and
>then indirectly references it as a two-step sequence. The non-pic case
>just makes a pc-relative reference in a single instruction.
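
In C terms, the two-step sequence behaves roughly like reading 'j' through
a pointer slot that the runtime linker fills in. The following is only a
sketch under that assumption: got_slot_j, override_j, bar_direct,
bar_via_got and bar_with_override are made-up names, and foo and j are
given stand-in definitions so the file builds on its own.

```c
/* Stand-ins so the sketch links by itself; in the example above
 * foo and j are extern and live in some other object. */
int j = 5;
int foo(int i) { return i * 2; }

/* Non-PIC shape: j is read directly, like addl j(%rip), %eax. */
int bar_direct(int i)
{
    return foo(i) + 10 + j;
}

/* Model of one GOT entry: it holds the address of whichever 'j' won
 * symbol resolution. Initialized statically here; a real GOT slot is
 * filled in by the runtime linker. */
static int *got_slot_j = &j;

/* PIC shape: load the address from the slot, then dereference it. */
int bar_via_got(int i)
{
    int *p = got_slot_j;       /* like movq j@GOTPCREL(%rip), %rdx */
    return foo(i) + 10 + *p;   /* like addl (%rdx), %eax */
}

/* Hypothetical override, standing in for a main program that
 * supplies its own definition of the symbol. */
int override_j = 100;

/* Point the slot at a replacement, call bar_via_got, restore the slot. */
int bar_with_override(int i, int *replacement)
{
    int *saved = got_slot_j;
    int result;

    got_slot_j = replacement;
    result = bar_via_got(i);
    got_slot_j = saved;
    return result;
}
```

The indirect version costs one extra load, but it picks up whatever the
slot points at, which is what the symbol override case needs; the direct
version always reads the local j.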
>
>
>
>>Sorry to nit-pick like this, but having worked on both Mac
>>and Amiga ROMs, PIC mode under BSD really seems
>>backwards to me.
>>
>>
>
>Unix library semantics are very very different to ROM semantics. I've
>been there too.
>
>Also, this isn't BSD-specific. It's ELF-specific and that's what the
>toolchain produces and expects. We use the same toolchain that Linux
>does.
>
>
Okay, that made a lot more sense than the original post. Sorry about the
whole thing. It is rather different, but the above clears up a lot of the
confusion. Thanks a bunch!

Your example function in the two cases put it in terms I have dealt with
on PowerMacs, so that was a good demonstration.