Bit twiddling question
Steve Kargl
sgk at troutmask.apl.washington.edu
Fri Mar 10 07:21:43 UTC 2017
On Fri, Mar 10, 2017 at 05:46:34PM +1100, Bruce Evans wrote:
> On Thu, 9 Mar 2017, Steve Kargl wrote:
>
> > On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote:
> >> On Thu, 9 Mar 2017, Steve Kargl wrote:
> >>
> >>> On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote:
> >>>> To give a hint at what I have been working on, I have implementations
> >>>> for sinpif(x), sinpi(x), cospif(x), and cospi(x). For sinpif(x) and
> >>>> cospif(x) I have kernels that give correctly rounded results for
> >>>> FLT_MIN <= x < 0x1p23 (at least on x86_64). I also have slightly
> >>
> >> I didn't see the original mail.
> >
> > Odd. Maybe munched by a spam filter.
>
> It was delivered out of order.
>
> >> I'd prefer you to finish the long double versions of cosl(), sinl() and
> >> tanl() (maybe das is responsible for the latter). These are still too
> >> slow and minor bugs for at least sinl() on small x, partly by not copying
> >> the lower-precision versions enough.
> >
> > sinl, cosl, and tanl are good enough for my purposes at the moment.
> > tgammal and powl are higher on the list of things to fix. But,
>
> Only powl is very important, but everyone avoids it because it is harder.
I have a partially written powl in my tree. I'll probably finish
it someday.
>
> > ...
> >>> which can be written (off the top-of-head and not tested)
> >>>   if (ix < 0x43300000) {      /* 1 <= |x| < 0x1p52 */
> >>>       double volatile vx;
> >>>       uint32_t n;
> >>>       vx = ax + 0x1p52;
> >>>       vx = vx - 0x1p52;
> >>>       if (vx == ax) return(0);
> >>>       if (vx > UINT32_MAX) {  /* ax = m + n + r with m + n */
> >>>           vx -= UINT32_MAX;   /* m = UINT32_MAX. n is in range */
> >>>           ax -= UINT32_MAX;
> >>>       }
> >>>       n = (uint32_t)vx;
> >>>       ax = __compute_sinpi(ax - n);
> >>>       if (n & 1) ax = -ax;
> >>>       return ((hx & 0x80000000) ? -ax : ax);
> >>>   }
> >>>
> >>> Something similar can be applied to ld128, but reduction may take
> >>> two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX).
> >>
> >> I don't like that much. The 0x1p52 magic is tricky and probably buggy.
> >> s_rint.c uses the same magic but needs many complications to avoid
> >> double rounding.
> >
> > Yep. Seems to have some issue. From my working copy of s_sinpif(x),
> > I tried
> > ...
> > My max ULP went from 0.5 to 0.50398505 for exhaustive testing
> > in the interval [1,100]. If I grab the worst-case x and use the
> > '#if 1' code, I see
> >
> >> Similar-looking magic for trig reduction in radians
> >> uses floating the magic value of 0x1.8p52 to bias rounding errors in
> >> a safe way, and arranges so that rounding errors don't matter then.
> >> ....
> >> y = x * 0.5 - floor(x * 0.5); /* 4 quads in [2n, 2n+2) -> [0, 1) */
> >> y = 4 * y; /* -> [0, 4) */
> >> int n = y; /* quadrant */
> >> r = (y - n) / 4; /* offset in quadrant */
> >>
> >> This is a bit slow, and too much like lgamma's method.
> >
> > I did look over what you did in lgamma[f], but forgot the logic
> > behind the reduction into octants. I suppose that I'm more
> > concerned with 'make it work', 'make it correct', and then I
> > will/might worry about 'make it fast'. But, I don't want it to be
> > too slow, so I'm trying to avoid function calls and reclassification
> > of x as +-inf or nan. Heck, r = remquof(ax, vx, &n) should work,
> > but there's that speed issue.
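For the record, the remquof() route is easy to sketch, even if it is
too slow.  Untested, with 1 as the divisor so that the low bit of the
quotient gives the parity; ax and hx follow the usual conventions here,
__kernel_sinpif() is just a placeholder name for a kernel that handles
|r| <= 0.5, and 0, +-Inf, and NaN are assumed to be dealt with earlier:

    float r;
    int n;

    r = remquof(ax, 1.0F, &n);  /* ax = q + r, |r| <= 0.5, n holds low bits of q */
    r = __kernel_sinpif(r);     /* sin(pi*r) */
    if (n & 1)
        r = -r;                 /* sin(pi*(q + r)) = (-1)^q * sin(pi*r) */
    return ((hx & 0x80000000) ? -r : r);    /* sinpi is odd */
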
>
> In fact, we already did the work to optimize this using the 0x1p52
> magic for lgamma, and even committed it.
>
> fdlibm lgamma used a messy combination of multiplications by 0.5 and
> 2.0, 2 floor()s, 1 half of the 0x1p52 magic and bit fiddling to do
> the other half of the 0x1p52 magic. You simplified this, and we
> eventually optimized it to use no floors (but the equivalent of
> 2 rint()s on |x| and 4*|x| implemented using full 0x1p52 magic) and
> different bit fiddling. Now I fear that the 0x1p52 doesn't work
> in double rounding cases.
At least in my sinpif(), the double rounding appears near x = n.5,
where I get n+1 instead of the intended n.  I'll need to recheck
lgammaf, but I recall that we are working with |x|.  That means we
can test for n < |x|: if we get n+1, then double rounding occurred.
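
Something like the following is the test I have in mind, written for
the float case (an untested sketch; ax is |x| with 1 <= ax < 0x1p23):

    float volatile vx;

    vx = ax + 0x1p23F;      /* round ax to an integer via the 0x1p23 magic */
    vx -= 0x1p23F;
    if (vx > ax)            /* rounding went up, e.g. near ax = n.5 */
        vx -= 1;
    /* vx is now floor(ax), so 0 <= ax - vx < 1 for the kernel. */

The vx > ax test catches the upward rounding whether it comes from an
ordinary near-halfway case or from double rounding.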
> Hopefully it works like it does for trig
> functions -- at worst there is an off-by-1 error in the quadrant,
> but there is a compensating tiny error in the offset. E.g., the
> closest value to pi/4 might be represented as octant 0 with offset
> pi/4 + epsilon or octant 1 with offset epsilon (the lowest level
> must reduce to octants, not quadrants). Then the poly must just
> approximate a little above pi/4 in case epsilon > 0. Higher levels
> also see the off-by-1 error but everything is consistent.
>
> This explains your larger error -- the accuracy of the poly breaks
> down not far above pi/4, and even epsilon above pi/4 there may be
> a near-halfway case that is relatively inaccurate.
>
Well, I stared at s_floorf.c and s_floor.c today.  For sinpi,
I'm interested in 1 <= |x| < 0x1p52 (or 0x1p23, 0x1p63, 0x1p112 for
the other precisions), which significantly reduces the needed bit
twiddling.
For double, I need to do
    if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
        /* Reduced from s_floor.c from here ... */
        j0 = ((ix >> 20) & 0x7ff) - 0x3ff;
        if (j0 < 20) {
            ix &= ~(0x000fffff >> j0);
            lx = 0;
        } else {
            i = ((uint32_t)(0xffffffff)) >> (j0 - 20);
            lx &= ~i;
        }
        INSERT_WORDS(x, ix, lx);
        /* ... to here.  x is now floor(|x|). */
        ax -= x;                    /* r = |x| - floor(|x|) */
        if (ax == 0) {
            ax = zero;              /* Make sure sign of zero is +. */
        } else {
            ax = __compute_sinpi(ax);
            /* Here I determine if the integer part, x, is even or odd. */
            if (x > 0x1p30) x -= 0x1p30;
            lx = (uint32_t)x;
            if (lx & 1) ax = -ax;
        }
        return ((hx & 0x80000000) ? -ax : ax);
    }
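
The float version collapses to a single word.  Roughly (untested;
hx, ix, ax, x, and zero follow the same conventions as the double
code above, __compute_sinpif() is a placeholder kernel name, and
1 <= |x| is assumed to have been established already):

    if (ix < 0x4b000000) {          /* 1 <= |x| < 0x1p23 */
        j0 = ((ix >> 23) & 0xff) - 0x7f;
        ix &= ~(0x007fffff >> j0);  /* clear the fraction bits */
        SET_FLOAT_WORD(x, ix);      /* x = floor(|x|) */
        ax -= x;                    /* r = |x| - floor(|x|), in [0, 1) */
        if (ax == 0) {
            ax = zero;              /* Make sure sign of zero is +. */
        } else {
            ax = __compute_sinpif(ax);
            if ((uint32_t)x & 1)    /* floor(|x|) < 0x1p23, so the cast is exact */
                ax = -ax;
        }
        return ((hx & 0x80000000) ? -ax : ax);
    }

No folding step is needed for the parity check there, since floor(|x|)
always fits in 32 bits.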
--
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow