clang gets numerical underflow wrong, please fix.
Dimitry Andric
dim at FreeBSD.org
Mon Mar 14 00:02:32 UTC 2016
On 13 Mar 2016, at 21:10, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:
> On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
...
>> So it's storing the intermediate result in a double, for some reason.
>> The fnstsw will then result in zero, since there was no underflow at
>> that point.
>>
>> I will submit a bug for this upstream, thanks for the report.
Submitted upstream as: https://llvm.org/bugs/show_bug.cgi?id=26931
> Thanks for the quick reply. But, it must be using an 80-bit
> extended double instead of a double for storage. This variation
>
> #include <fenv.h>
> #include <stdio.h>
>
> int
> main(void)
> {
> int i;
> // float x = 1.f;
> double x = 1.;
> i = 0;
> feclearexcept(FE_ALL_EXCEPT);
> do {
> x /= 2;
> i++;
> } while(!fetestexcept(FE_UNDERFLOW));
> if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
> printf("x = %e after %d iterations\n", x, i);
>
> return 0;
> }
>
> yields
>
> % cc -O -o z b.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations
>
> It should be 1075 iterations.
>
> Note, there is a similar issue with OVERFLOW. The upshot is
> that clang on current is probably miscompiling libm.
With this example, I also get different results from gcc (4.8.5),
depending on the optimization level:
$ gcc -O underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 1075 iterations
$ gcc -O2 underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations
The overflow case behaves the same way:
$ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 1024 iterations
$ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 16384 iterations
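For reference, all four iteration counts follow from the floating-point format parameters alone. The sketch below uses function names I made up (they are not from any library) and the <float.h> conventions: mant_dig significand bits, a smallest normal number of 2^(min_exp-1), and a largest finite value just below 2^max_exp. Halving from 1.0 first raises FE_UNDERFLOW on the step below the smallest subnormal, 2^(min_exp - mant_dig), because that result rounds to zero and is therefore both tiny and inexact. Doubling first raises FE_OVERFLOW when x reaches 2^max_exp.

```c
/* Hypothetical helpers: expected loop counts for a binary format with
   mant_dig significand bits (implicit bit included) and the <float.h>
   exponent conventions min_exp / max_exp. */

/* After i halvings from 1.0, x == 2^-i. The smallest subnormal is
   2^(min_exp - mant_dig); one more halving gives an exact tie that rounds
   to zero, i.e. a tiny and inexact result, which raises FE_UNDERFLOW. */
int expected_underflow_iterations(int mant_dig, int min_exp)
{
    return mant_dig - min_exp + 1;
}

/* After i doublings from 1.0, x == 2^i, which first exceeds the largest
   finite value (just below 2^max_exp) when i == max_exp. */
int expected_overflow_iterations(int max_exp)
{
    return max_exp;
}
```

For IEEE double (mant_dig 53, min_exp -1021, max_exp 1024) this gives 1075 and 1024. Plugging in the x87 extended exponent range (min_exp -16381, max_exp 16384) with a 53-bit significand gives 16435 and 16384, exactly the -O2 results. That is consistent with x living in an x87 register under a 53-bit precision-control setting (I am assuming that is the FreeBSD/i386 default here). With the full 64-bit extended significand, the underflow count would be 16446 instead.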
Are we depending on some sort of subtle undefined behavior here? With
-O, the main loop of the overflow test (doubling x and testing bit 0x8,
FE_OVERFLOW) becomes:
.L3:
fld1
fstpl 24(%esp)
movl $0, %ebx
.L8:
fldl 24(%esp)
fld %st(0)
faddp %st, %st(1)
fstpl 24(%esp)
addl $1, %ebx
fnstsw %ax
movl %eax, %esi
movl __has_sse, %eax
testl %eax, %eax
je .L4
cmpl $2, %eax
jne .L5
call __test_sse
testl %eax, %eax
je .L5
.L4:
stmxcsr 44(%esp)
jmp .L6
.L5:
movl $0, 44(%esp)
.L6:
orl 44(%esp), %esi
testl $8, %esi
je .L8
With -O2, it becomes:
.L3:
fld1
xorl %ebx, %ebx
.L12:
fadd %st(0), %st
addl $1, %ebx
fnstsw %ax
testl %edx, %edx
movl %eax, %esi
je .L10
cmpl $2, %edx
je .L27
.L9:
xorl %eax, %eax
.L8:
orl %eax, %esi
andl $8, %esi
je .L12
So at -O2 it switches from faddp plus fstpl, which stores x to memory as
a 64-bit double after every addition, to a direct fadd of %st(0) to %st
that keeps x in an x87 register. I assume that uses the internal 80-bit
precision? GCC also moves the __has_sse checks further down in the
function, but that does not really affect the result.
-Dimitry
More information about the freebsd-toolchain mailing list