clang gets numerical underflow wrong, please fix.

Mon Mar 14 00:02:32 UTC 2016

On 13 Mar 2016, at 21:10, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:
> On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
...
>> So it's storing the intermediate result in a double, for some reason.
>> The fnstsw will then result in zero, since there was no underflow at
>> that point.
>> 
>> I will submit a bug for this upstream, thanks for the report.

Submitted upstream as: https://llvm.org/bugs/show_bug.cgi?id=26931

> Thanks for the quick reply.  But it must be using an 80-bit
> extended double instead of a double for storage.  This variation
> 
> #include <fenv.h>
> #include <stdio.h>
> 
> int
> main(void)
> {
>   int i;
> //   float x = 1.f;
>   double x = 1.;
>   i = 0;
>   feclearexcept(FE_ALL_EXCEPT);
>   do {
>      x /= 2;
>      i++;
>   } while(!fetestexcept(FE_UNDERFLOW));
>   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
>   printf("x = %e after %d iterations\n", x, i);
> 
>   return 0;
> }
> 
> yields
> 
> % cc -O -o z b.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations
> 
> It should be 1075 iterations.
> 
> Note that there is a similar issue with OVERFLOW.  The upshot is
> that clang on -CURRENT is probably miscompiling libm.

With this example, I also get different results from gcc (4.8.5),
depending on the optimization level:

$ gcc -O underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 1075 iterations
$ gcc -O2 underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations

Similarly for the overflow case:

$ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 1024 iterations
$ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 16384 iterations

Are we depending on some sort of subtle undefined behavior here?  With
-O, the main loop of overflow-iter.c becomes:

.L3:
	fld1
	fstpl	24(%esp)
	movl	$0, %ebx
.L8:
	fldl	24(%esp)
	fld	%st(0)
	faddp	%st, %st(1)
	fstpl	24(%esp)
	addl	$1, %ebx
	fnstsw %ax
	movl	%eax, %esi
	movl	__has_sse, %eax
	testl	%eax, %eax
	je	.L4
	cmpl	$2, %eax
	jne	.L5
	call	__test_sse
	testl	%eax, %eax
	je	.L5
.L4:
	stmxcsr 44(%esp)
	jmp	.L6
.L5:
	movl	$0, 44(%esp)
.L6:
	orl	44(%esp), %esi
	testl	$8, %esi
	je	.L8

With -O2, it becomes:

.L3:
	fld1
	xorl	%ebx, %ebx
.L12:
	fadd	%st(0), %st
	addl	$1, %ebx
	fnstsw %ax
	testl	%edx, %edx
	movl	%eax, %esi
	je	.L10
	cmpl	$2, %edx
	je	.L27
.L9:
	xorl	%eax, %eax
.L8:
	orl	%eax, %esi
	andl	$8, %esi
	je	.L12

So at -O2 it switches from faddp followed by an fstpl store to memory, to
a direct fadd of %st(0) to %st that keeps x in an x87 register.  I assume
that means it uses the internal 80-bit format throughout and never rounds
to double?  GCC also moves the __has_sse check further down in the
function, but that does not affect the result.

-Dimitry
