Bit twiddling question
Steve Kargl
sgk at troutmask.apl.washington.edu
Fri Mar 10 07:21:43 UTC 2017
On Fri, Mar 10, 2017 at 05:46:34PM +1100, Bruce Evans wrote:
> On Thu, 9 Mar 2017, Steve Kargl wrote:
>
> > On Fri, Mar 10, 2017 at 05:02:13AM +1100, Bruce Evans wrote:
> >> On Thu, 9 Mar 2017, Steve Kargl wrote:
> >>
> >>> On Wed, Mar 08, 2017 at 11:52:36PM -0800, Steve Kargl wrote:
> >>>> To give a hint at what I have been working on, I have implementations
> >>>> for sinpif(x), sinpi(x), cospif(x), and cospi(x). For sinpif(x) and
> >>>> cospif(x) I have kernels that give correctly rounded results for
> >>>> FLT_MIN <= x < 0x1p23 (at least on x86_64). I also have slightly
> >>
> >> I didn't see the original mail.
> >
> > Odd. Maybe munched by a spam filter.
>
> It was delivered out of order.
>
> >> I'd prefer you to finish the long double versions of cosl(), sinl() and
> >> tanl() (maybe das is responsible for the latter). These are still too
> >> slow and minor bugs for at least sinl() on small x, partly by not copying
> >> the lower-precision versions enough.
> >
> > sinl, cosl, and tanl are good enough for my purposes at the moment.
> > tgammal and powl are higher on the list of things to fix. But,
>
> Only powl is very important, but everyone avoids it because it is harder.
I have a partially written powl in my tree. I'll probably finish
it someday.
>
> > ...
> >>> which can be written (off the top-of-head and not tested)
> >>>   if (ix < 0x43300000) {      /* 1 <= |x| < 0x1p52 */
> >>>       double volatile vx;
> >>>       uint32_t n;
> >>>       vx = ax + 0x1p52;
> >>>       vx = vx - 0x1p52;
> >>>       if (vx == ax) return(0);
> >>>       if (vx > UINT32_MAX) {  /* ax = m + n + r with m + n */
> >>>           vx -= UINT32_MAX;   /* m = UINT32_MAX. n is in range */
> >>>           ax -= UINT32_MAX;
> >>>       }
> >>>       n = (uint32_t)vx;
> >>>       ax = __compute_sinpi(ax - n);
> >>>       if (n & 1) ax = -ax;
> >>>       return ((hx & 0x80000000) ? -ax : ax);
> >>>   }
> >>>
> >>> Something similar can be applied to ld128, but reduction may take
> >>> two rounds (ie., a comparison with UINT64_MAX and then UINT32_MAX).
> >>
> >> I don't like that much. The 0x1p52 magic is tricky and probably buggy.
> >> s_rint.c uses the same magic but needs many complications to avoid
> >> double rounding.
> >
> > Yep. Seems to have some issue. From my working copy of s_sinpif(x),
> > I tried
> > ...
> > My max ULP went from 0.5 to 0.50398505 for exhaustive testing
> > in the interval [1,100]. If I grab the worst-case x and use the
> > '#if 1' code, I see
> >
> >> Similar-looking magic for trig reduction in radians
> >> uses floating the magic value of 0x1.8p52 to bias rounding errors in
> >> a safe way, and arranges so that rounding errors don't matter then.
> >> ....
> >> y = x * 0.5 - floor(x * 0.5); /* 4 quads in [2n, 2n+2) -> [0, 1) */
> >> y = 4 * y; /* -> [0, 4) */
> >> int n = y; /* quadrant */
> >> r = (y - n) / 4; /* offset in quadrant */
> >>
> >> This is a bit slow, and too much like lgamma's method.
> >
> > I did look over what you did in lgamma[f], but forgot the logic
> > behind the reduction into octants. I suppose that I'm more
> > concerned with 'make it work', 'make it correct', and then I
> > will/might worry about 'make it fast'. But, I don't want it to be
> > too slow, so I'm trying to avoid function calls and reclassification
> > of x as +-inf or nan. Heck, r = remquof(ax, vx, &n) should work,
> > but there's that speed issue.
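For the record, the remquof() route is easy to sketch, even if it is
too slow.  Untested, with 1 as the divisor so that the low bit of the
quotient gives the parity; ax and hx follow the usual conventions here,
__kernel_sinpif() is just a placeholder name for a kernel that handles
|r| <= 0.5, and 0, +-Inf, and NaN are assumed to be dealt with earlier:

    float r;
    int n;

    r = remquof(ax, 1.0F, &n);  /* ax = q + r, |r| <= 0.5, n holds low bits of q */
    r = __kernel_sinpif(r);     /* sin(pi*r) */
    if (n & 1)
        r = -r;                 /* sin(pi*(q + r)) = (-1)^q * sin(pi*r) */
    return ((hx & 0x80000000) ? -r : r);    /* sinpi is odd */
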
>
> In fact, we already did the work to optimize this using the 0x1p52
> magic for lgamma, and even committed it.
>
> fdlibm lgamma used a messy combination of multiplications by 0.5 and
> 2.0, 2 floor()s, 1 half of the 0x1p52 magic and bit fiddling to do
> the other half of the 0x1p52 magic. You simplified this, and we
> eventually optimized it to use no floors (but the equivalent of
> 2 rint()s on |x| and 4*|x| implemented using full 0x1p52 magic) and
> different bit fiddling. Now I fear that the 0x1p52 doesn't work
> in double rounding cases.
At least in my sinpif(), the double rounding appears near x = n.5,
where I get n+1 instead of the intended n.  I'll need to recheck
lgammaf, but I recall that we are working with |x|.  That means we
can test for n < |x|: if we get n+1, then double rounding occurred.
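
Something like the following is the test I have in mind, written for
the float case (an untested sketch; ax is |x| with 1 <= ax < 0x1p23):

    float volatile vx;

    vx = ax + 0x1p23F;      /* round ax to an integer via the 0x1p23 magic */
    vx -= 0x1p23F;
    if (vx > ax)            /* rounding went up, e.g. near ax = n.5 */
        vx -= 1;
    /* vx is now floor(ax), so 0 <= ax - vx < 1 for the kernel. */

The vx > ax test catches the upward rounding whether it comes from an
ordinary near-halfway case or from double rounding.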
> Hopefully it works like it does for trig
> functions -- at worst there is an off-by-1 error in the quadrant,
> but there is a compensating tiny error in the offset. E.g., the
> closest value to pi/4 might be represented as octant 0 with offset
> pi/4 + epsilon or octant 1 with offset epsilon (the lowest level
> must reduce to octants, not quadrants). Then the poly must just
> approximate a little above pi/4 in case epsilon > 0. Higher levels
> also see the off-by-1 error but everything is consistent.
>
> This explains your larger error -- the accuracy of the poly breaks
> down not far above pi/4, and even epsilon above pi/4 there may be
> a near-halfway case that is relatively inaccurate.
>
Well, I stared at s_floorf.c and s_floor.c today.  For sinpi,
I'm interested in 1 <= |x| < 0x1p52 (or 0x1p23, 0x1p63, 0x1p112 for
the other precisions), which significantly reduces the needed bit
twiddling.
For double, I need to do
    if (ix < 0x43300000) {          /* 1 <= |x| < 0x1p52 */
        /* Reduced from s_floor.c from here ... */
        j0 = ((ix >> 20) & 0x7ff) - 0x3ff;
        if (j0 < 20) {
            ix &= ~(0x000fffff >> j0);
            lx = 0;
        } else {
            i = ((uint32_t)(0xffffffff)) >> (j0 - 20);
            lx &= ~i;
        }
        INSERT_WORDS(x, ix, lx);
        /* ... to here.  x is now floor(|x|). */
        ax -= x;                    /* r = |x| - floor(|x|) */
        if (ax == 0) {
            ax = zero;              /* Make sure sign of zero is +. */
        } else {
            ax = __compute_sinpi(ax);
            /* Here I determine if the integer part, x, is even or odd. */
            if (x > 0x1p30) x -= 0x1p30;
            lx = (uint32_t)x;
            if (lx & 1) ax = -ax;
        }
        return ((hx & 0x80000000) ? -ax : ax);
    }
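
The float version collapses to a single word.  Roughly (untested;
hx, ix, ax, x, and zero follow the same conventions as the double
code above, __compute_sinpif() is a placeholder kernel name, and
1 <= |x| is assumed to have been established already):

    if (ix < 0x4b000000) {          /* 1 <= |x| < 0x1p23 */
        j0 = ((ix >> 23) & 0xff) - 0x7f;
        ix &= ~(0x007fffff >> j0);  /* clear the fraction bits */
        SET_FLOAT_WORD(x, ix);      /* x = floor(|x|) */
        ax -= x;                    /* r = |x| - floor(|x|), in [0, 1) */
        if (ax == 0) {
            ax = zero;              /* Make sure sign of zero is +. */
        } else {
            ax = __compute_sinpif(ax);
            if ((uint32_t)x & 1)    /* floor(|x|) < 0x1p23, so the cast is exact */
                ax = -ax;
        }
        return ((hx & 0x80000000) ? -ax : ax);
    }

No folding step is needed for the parity check there, since floor(|x|)
always fits in 32 bits.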
--
Steve
20161221 https://www.youtube.com/watch?v=IbCHE-hONow