clang gets numerical underflow wrong, please fix.

Sun Mar 13 20:10:05 UTC 2016

On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
> On 13 Mar 2016, at 19:25, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:
> > 
> > Consider this small piece of code:
> > 
> > #include <fenv.h>
> > #include <stdio.h>
> > 
> > float
> > foo()
> > {
> > 	static const volatile float tiny = 1.e-30f;
> > 	return (tiny * tiny);
> > }
> > 
> > int
> > main(void)
> > {
> >   float x;
> >   feclearexcept(FE_ALL_EXCEPT);
> >   x = foo();
> >   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
> >   printf("x = %e\n", x);
> >   return 0;
> > }
> > 
> > clang seems to get the underflow condition wrong.
> > 
> > % cc -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.000000e+00
> > 
> > % cc -O -o z a.c -lm && ./z
> > x = 1.000000e-60             <--- This is not a possible value!
> > 
> > % gcc -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.000000e+00
> > 
> > % gcc -O -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.000000e+00
> 
> Hmm, this is an interesting one.  On amd64, it works as expected with
> clang, but there it always uses SSE, obviously:
> 
> $ ./underflow-amd64
> FE_UNDERFLOW: x = 0.000000e+00
> 
> The problem seems to be caused by the intermediate result being stored
> using fstpl instead of fstps, e.g. simplifying the sample program (to
> get rid of all the SSE stuff the fexxx() macros insert):
> 
> int main(void)
> {
>   float x;
>   __uint16_t status;
>   __fnclex();
>   x = foo();
>   __fnstsw(&status);
>   printf("status: %#x\n", (unsigned)status);
>   printf("x = %e\n", x);
>   return 0;
> }
> 
> With gcc, the assembly becomes:
> 
> foo:
>         flds    tiny.1853
>         flds    tiny.1853
>         fmulp   %st, %st(1)
>         ret
> [...]
> main:
> [...]
>         fnclex
>         call    foo
>         fstps   12(%esp)
>         fnstsw %ax
> 
> In this case, fmulp does not generate an underflow, but the fstps will.
> With clang, the assembly becomes:
> 
> foo:
>         flds    foo.tiny
>         fmuls   foo.tiny
>         retl
> [...]
> main:
>         subl    $24, %esp
>         fnclex
>         calll   foo
>         fstpl   12(%esp)                # 8-byte Folded Spill
>         fnstsw  22(%esp)
> 
> So it's storing the intermediate result in a double, for some reason.
> The fnstsw will then result in zero, since there was no underflow at
> that point.
> 
> I will submit a bug for this upstream, thanks for the report.
> 

Thanks for the quick reply.  But it must be using an 80-bit
extended double rather than a double for storage.  This variation

#include <fenv.h>
#include <stdio.h>

int
main(void)
{
   int i;
//   float x = 1.f;
   double x = 1.;
   i = 0;
   feclearexcept(FE_ALL_EXCEPT);
   do {
      x /= 2;
      i++;
   } while(!fetestexcept(FE_UNDERFLOW));
   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
   printf("x = %e after %d iterations\n", x, i);

   return 0;
}

yields

% cc -O -o z b.c -lm && ./z
FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations

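It should be 1075 iterations: the smallest subnormal double is
2^-1074, so the 1075th halving is the first result that cannot be
represented exactly.  A count of 16435 is only possible if x keeps
the wider exponent range of the x87 registers.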

Note, there is a similar issue with OVERFLOW.  The upshot is
that clang on current is probably miscompiling libm.
-- 
Steve