Implementation of half-cycle trignometric functions
Steve Kargl
sgk at troutmask.apl.washington.edu
Fri Apr 28 16:56:59 UTC 2017
On Fri, Apr 28, 2017 at 07:13:16PM +1000, Bruce Evans wrote:
> On Thu, 27 Apr 2017, Steve Kargl wrote:
>
> > On Thu, Apr 27, 2017 at 04:14:11PM -0700, Steve Kargl wrote:
> >>
> >> I have attached a new diff to the bugzilla report. The
> >> diff is 3090 lines and won't be broadcast the mailing list.
> >>
> >> This diff includes fixes for a few inconsequential bugs
> >> and implements modified Estrin's method for sum a few
> >> ploynomials. If you want the previous Horner's method
> >> then add -DHORNER to your CFLAGS.
> >
> > For those curious about testing, here are some numbers
> > for the Estrin's method. These were obtained on an AMD
> > FX(tm)-8350 @ 4018.34-MHz. The times are in microseconds
> > and the number in parentheses are estimated cycles.
> >
> > | cospi | sinpi | tanpi
> > ------------+--------------+--------------+--------------
> > float | 0.0089 (37) | 0.0130 (54) | 0.0194 (80)
> > double | 0.0134 (55) | 0.0160 (66) | 0.0249 (102)
> > long double | 0.0557 (228) | 0.0698 (287) | 0.0956 (393)
> > ------------+--------------+--------------+--------------
> >
> > The timing tests are done on the interval [0,0.25] as
> > this is the interval where the polynomial approximations
> > apply. Limited accuracy testing gives
>
> These still seem slow. I did a quick test of naive evaluations of
> these functions as standard_function(Pi * x) and get times a bit faster
> on Haswell, except 2-4 times faster for the long double case, with the
> handicaps of using gcc-3.3.3 and i386. Of course, the naive evaluation
> is inaccurate, especially near multiples of Pi/2.
The slowness comes from handling extra precision arithmetic. For
sinpi(x) = x * p(x) where p(x) = s0 + x2 * S(x2) is the polynomial
approximation and x2 = x * x. One needs to take care in the ordering
on the evaluation where x and x2 are split into high and low parts.
See the source code posted in response to your other email.
> > x in [0,0.25] | tanpif | tanpi | tanpil
> > -----------------+------------+------------+-------------
> > MAX ULP | 1.37954760 | 1.37300168 | 1.38800823
>
> Just use the naive evaluation to get similar errors in this
> range. It is probably faster too.
I haven't checked tanpif, but naive evaluation is faster
for sinpif(x). But getting the worry answer fast seems
to be counter to what one may expect from a math library.
Consider the two naive implementations:
float
bde1(float x)
{
return (sinf((float)M_PI * x));
}
float
bde2(float x)
{
return (sinf((float)(M_PI * x)));
}
float
bde3(float x)
{
return ((float)sin(M_PI * x));
}
x in [0,0.25] | sinpif | bde1(x) | bde2(x) | bde3(x)
-----------------+-------------+-------------+--------------+------------
Time usec (cyc) | 0.0130 (54) | 0.0084 (35) | 0.0093 (38) | 0.0109 (45)
MAX ULP | 0.80103672 | 1.94017851 | 1.46830785 | 0.5
1.0 < ULP | 0 | 736496 | 47607 | 0
0.9 < ULP <= 1.0 | 0 | 563122 | 101162 | 0
0.8 < ULP <= 0.9 | 1 | 751605 | 386128 | 0
0.7 < ULP <= 0.8 | 727 | 942349 | 753647 | 0
0.6 < ULP <= 0.7 | 19268 | 1169719 | 1123518 | 0
x in [3990,3992] | sinpif | bde1(x) | bde2(x) | bde3(x)
-----------------+-------------+-------------+-------------+------------
Time usec (cyc) | 0.0161 (66) | 0.0137 (56) | 0.0144 (59) | 0.0211 (87)
MAX ULP | 0.65193975 | 10458803. | 6471461. | 0.5
1.0 < ULP | 0 | 16773117 | 16762878 | 0
0.9 < ULP <= 1.0 | 0 | 0 | 0 | 0
0.8 < ULP <= 0.9 | 0 | 0 | 0 | 0
0.7 < ULP <= 0.8 | 0 | 0 | 0 | 0
0.6 < ULP <= 0.7 | 19268 | 2047 | 2047 | 0
So bde3(x) is best, but if you're going to use sin(M_PI*x) for
float sinpif, then use sinpi((double)x) which is faster than bde3
and just as accurate.
--
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow
More information about the freebsd-hackers
mailing list