clang gets numerical underflow wrong, please fix.
Steve Kargl
sgk at troutmask.apl.washington.edu
Sun Mar 13 20:10:05 UTC 2016
On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
> On 13 Mar 2016, at 19:25, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:
> >
> > Consider this small piece of code:
> >
> > #include <fenv.h>
> > #include <stdio.h>
> >
> > float
> > foo()
> > {
> >     static const volatile float tiny = 1.e-30f;
> >     return (tiny * tiny);
> > }
> >
> > int
> > main(void)
> > {
> >     float x;
> >     feclearexcept(FE_ALL_EXCEPT);
> >     x = foo();
> >     if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
> >     printf("x = %e\n", x);
> >     return 0;
> > }
> >
> > clang seems to get the underflow condition wrong.
> >
> > % cc -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.000000e+00
> >
> > % cc -O -o z a.c -lm && ./z
> > x = 1.000000e-60 <--- This is not a possible value!
> >
> > % gcc -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.000000e+00
> >
> > % gcc -O -o z a.c -lm && ./z
> > FE_UNDERFLOW: x = 0.000000e+00
>
> Hmm, this is an interesting one. On amd64, it works as expected with
> clang, but there it always uses SSE, obviously:
>
> $ ./underflow-amd64
> FE_UNDERFLOW: x = 0.000000e+00
>
> The problem seems to be caused by the intermediate result being stored
> using fstpl instead of fstps, e.g. simplifying the sample program (to
> get rid of all the SSE stuff the fexxx() macros insert):
>
> int main(void)
> {
>     float x;
>     __uint16_t status;
>     __fnclex();
>     x = foo();
>     __fnstsw(&status);
>     printf("status: %#x\n", (unsigned)status);
>     printf("x = %e\n", x);
>     return 0;
> }
>
> With gcc, the assembly becomes:
>
> foo:
>     flds tiny.1853
>     flds tiny.1853
>     fmulp %st, %st(1)
>     ret
> [...]
> main:
> [...]
>     fnclex
>     call foo
>     fstps 12(%esp)
>     fnstsw %ax
>
> In this case, fmulp does not generate an underflow, but the fstps will.
> With clang, the assembly becomes:
>
> foo:
>     flds foo.tiny
>     fmuls foo.tiny
>     retl
> [...]
> main:
>     subl $24, %esp
>     fnclex
>     calll foo
>     fstpl 12(%esp) # 8-byte Folded Spill
>     fnstsw 22(%esp)
>
> So it's storing the intermediate result in a double, for some reason.
> The fnstsw will then result in zero, since there was no underflow at
> that point.
>
> I will submit a bug for this upstream, thanks for the report.
>
Thanks for the quick reply. However, it must be using an 80-bit
extended double, not a double, for storage. This variation
#include <fenv.h>
#include <stdio.h>
int
main(void)
{
    int i;
    // float x = 1.f;
    double x = 1.;
    i = 0;
    feclearexcept(FE_ALL_EXCEPT);
    do {
        x /= 2;
        i++;
    } while(!fetestexcept(FE_UNDERFLOW));
    if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
    printf("x = %e after %d iterations\n", x, i);
    return 0;
}
yields
% cc -O -o z b.c -lm && ./z
FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations
It should be 1075 iterations for a double.
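As a sanity check on that figure, here is a minimal sketch. It is not
part of the test program above, and the function names are made up for
illustration. It assumes IEEE 754 binary formats and round-to-nearest.
Halving 1.0 stays exact all the way down to the smallest subnormal,
2^(MIN_EXP - MANT_DIG), so the first result that is both tiny and
inexact comes exactly one halving later:

```c
#include <assert.h>
#include <float.h>

/*
 * Number of halvings of 1.0 needed before the result is first inexact
 * (it rounds to zero), for a binary format with mant_dig significand
 * bits and minimum exponent min_exp. Both parameters follow the
 * conventions of DBL_MANT_DIG and DBL_MIN_EXP in <float.h>.
 */
int
expected_halvings(int mant_dig, int min_exp)
{
    assert(mant_dig > 0 && min_exp < 0);
    /* Smallest subnormal is 2^(min_exp - mant_dig); add one more step. */
    return (mant_dig - min_exp + 1);
}

/*
 * Empirical count for a real double. The volatile qualifier forces
 * every intermediate result to be stored as a 64-bit double, so a
 * wider register format cannot hide the rounding to zero.
 */
int
counted_halvings(void)
{
    volatile double x = 1.0;
    int i = 0;

    do {
        x /= 2;
        i++;
    } while (x != 0.0);
    return (i);
}

/* Hypothetical self-check tying the two together. */
void
check_halvings(void)
{
    assert(expected_halvings(DBL_MANT_DIG, DBL_MIN_EXP) == 1075);
    assert(counted_halvings() == 1075);
}
```

With DBL_MANT_DIG = 53 and DBL_MIN_EXP = -1021 this gives 1075.
Plugging in the x87 extended exponent range (minimum exponent -16381)
with a 53-bit significand gives 16435, which matches the output above.
That combination is my assumption about how the FPU is configured, not
something the test itself shows. A full 64-bit significand would give
16446 instead.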
Note that there is a similar issue with FE_OVERFLOW. The upshot is
that clang on FreeBSD-current is probably miscompiling libm.
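
The mechanism in the quoted analysis can also be sketched in portable
C, without depending on x87 code generation. This is only an
illustration, assuming IEEE 754 float and double; the function names
are hypothetical. The product of two 1.e-30f values fits comfortably
in a wider format, and it is only the narrowing conversion to float
(the job fstps does) that loses it:

```c
#include <assert.h>

/* Product carried in a wider format: about 1e-60, a normal double. */
double
wide_product(float a, float b)
{
    return ((double)a * (double)b);
}

/*
 * The same product after a narrowing store to float. The cast and
 * the volatile object force the value to be rounded to float, where
 * 1e-60 is far below the smallest subnormal and becomes zero.
 */
float
narrowed_product(float a, float b)
{
    volatile float f = (float)wide_product(a, b);

    return (f);
}

/* Hypothetical self-check using the value from the original report. */
void
check_narrowing(void)
{
    const float tiny = 1.e-30f;

    assert(wide_product(tiny, tiny) > 0.0);
    assert(narrowed_product(tiny, tiny) == 0.0f);
}
```

If the narrowing store never happens and the wider value is handed to
printf, it prints as 1.000000e-60, which would explain the -O output
quoted above.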
--
Steve