i386 clang optimisation problem with stack alignment

Dimitry Andric dim at FreeBSD.org
Wed Sep 18 21:13:46 UTC 2013


On Sep 10, 2013, at 18:34, Tijl Coosemans <tijl at freebsd.org> wrote:
> On Tue, 10 Sep 2013 18:16:01 +0200 Tijl Coosemans wrote:
>> I've attached a small test program extracted from multimedia/gstreamer-ffmpeg
>> (libavcodec/h264_cabac.c:ff_h264_init_cabac_states(H264Context *h)).
>> 
>> When you compile and run it like this on FreeBSD/i386, it results in a
>> SIGBUS:
>> 
>> % cc -o paddd paddd.c -O3 -msse2 -fPIE -fomit-frame-pointer 
>> % ./paddd
>> Bus error
>> 
>> The reason is this instruction where %esp isn't 16-byte aligned:
>> paddd   (%esp), %xmm7

Hmm, as far as I can see, the problem is related to position-independent code in combination with omitting the frame pointer:

$ cc -o paddd paddd.c -O3 -msse2 -fomit-frame-pointer
$ ./paddd
$ 

$ cc -o paddd paddd.c -O3 -msse2 -fPIE -fomit-frame-pointer
$ ./paddd
Bus error (core dumped)
$ 

$ cc -o paddd paddd.c -O3 -msse2 -fPIE -fno-omit-frame-pointer
$ ./paddd
$ 


>> Is this an upstream bug or is this because of local changes (to make the
>> stack 4 byte aligned by default or something)?

The changes to 4-byte stack alignment on i386 are from upstream, but we initiated them after a bit of discussion (see http://llvm.org/viewvc/llvm-project?view=revision&revision=167632 ).

Note the problem only occurs at -O3, which enables the vectorizer, so there might be an issue with it in combination with position-independent code generation and omitting frame pointers.  If you check what clang passes to its cc1 stage with your original command line, it gives:

"/usr/bin/cc" -cc1 -triple i386-unknown-freebsd10.0 -emit-obj -disable-free -main-file-name paddd.c -mrelocation-model pic -pic-level 2 -pie-level 2 -masm-verbose -mconstructor-aliases -target-cpu i486 -target-feature +sse2 -v -resource-dir /usr/bin/../lib/clang/3.3 -O3 -fdebug-compilation-dir /home/dim/bugs/paddd -ferror-limit 19 -fmessage-length 130 -mstackrealign -fobjc-runtime=gnustep -fobjc-default-synthesize-properties -fdiagnostics-show-option -fcolor-diagnostics -backend-option -vectorize-loops -o /tmp/paddd-zdRbKM.o -x c paddd.c

So it does pass -mstackrealign, but for some reason it isn't always effective.  For the -fPIE -fomit-frame-pointer case, the prolog for init_states() becomes:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        subl    $28, %esp
        calll   .L0$pb
.L0$pb:
        popl    %edx
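
To make the arithmetic of this prolog concrete, here is a rough model in C.  It is only a sketch, not compiler output: the function names and the entry value of %esp are hypothetical, and it assumes that %esp is only guaranteed to be 4-byte aligned on entry and that it is not adjusted again before the paddd (the calll/popl pair is net zero).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of 16, as the memory operand of paddd requires. */
static bool is_aligned16(uint32_t addr)
{
    return (addr & 15u) == 0;
}

/*
 * Model of the -fPIE -fomit-frame-pointer prolog shown above:
 * four pushl instructions (4 bytes each) followed by subl $28, %esp.
 * entry_esp is a hypothetical value of %esp on entry to init_states().
 */
static uint32_t esp_after_plain_prolog(uint32_t entry_esp)
{
    /* Assumption: the stack is only kept 4-byte aligned. */
    assert((entry_esp & 3u) == 0);

    uint32_t esp = entry_esp;
    esp -= 4u * 4u;   /* pushl %ebp, %ebx, %edi, %esi */
    esp -= 28u;       /* subl $28, %esp */
    return esp;       /* entry_esp - 44, with no realignment step */
}
```

Under these assumptions the prolog lowers %esp by 44 bytes in total, so the result is 16-byte aligned only when the entry value is 12 modulo 16.  For the other three 4-byte aligned entry values, an aligned SSE access through (%esp) would fault.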

If you remove -fPIE, the data is directly accessed via its (properly 16 byte aligned) symbol, so there is no alignment problem:

        paddd   .LCPI0_0, %xmm7

but the stack is not realigned in the prolog either:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        movd    16(%esp), %xmm0
...

Then, if you use -fPIE, but add -fno-omit-frame-pointer:

init_states:                            # @init_states
# BB#0:                                 # %vector.ph
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        pushl   %edi
        pushl   %esi
        andl    $-16, %esp
        subl    $48, %esp
        calll   .L0$pb
.L0$pb:
        popl    %edx
.Ltmp0:

I.e., here the stack is properly realigned, and the function works fine.
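
The effect of the andl/subl pair in this prolog can be modelled in a few lines of C.  Again this is just a sketch with hypothetical names and a hypothetical entry value for %esp, assuming only 4-byte alignment on entry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if esp is a multiple of 16. */
static bool esp_is_16_byte_aligned(uint32_t esp)
{
    return (esp & 15u) == 0;
}

/*
 * Model of the -fPIE -fno-omit-frame-pointer prolog shown above.
 * entry_esp is a hypothetical value of %esp on entry to init_states().
 */
static uint32_t esp_after_realigning_prolog(uint32_t entry_esp)
{
    /* Assumption: the stack is only kept 4-byte aligned. */
    assert((entry_esp & 3u) == 0);

    uint32_t esp = entry_esp;
    esp -= 4u;              /* pushl %ebp (movl %esp, %ebp leaves %esp alone) */
    esp -= 3u * 4u;         /* pushl %ebx, %edi, %esi */
    esp &= ~UINT32_C(15);   /* andl $-16, %esp: round down to a multiple of 16 */
    esp -= 48u;             /* subl $48, %esp: 48 is a multiple of 16 */
    return esp;
}
```

Because the andl clears the low four bits and the following subl subtracts a multiple of 16, the result is 16-byte aligned for every 4-byte aligned entry value.  The saved %ebp is what lets the epilog restore the original %esp afterwards, which may be related to why the realignment is missing when the frame pointer is omitted.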

In any case: yes, I think this is a bug, and we should report it upstream.  This is a very nice test case to do so.

-Dimitry
