Re: Accuracy of Mathematical Functions

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
Date: Tue, 26 Sep 2023 18:08:04 UTC
On Tue, Sep 26, 2023 at 03:26:16PM +0200, Alexander Leidinger wrote:
> Am 2023-09-25 15:50, schrieb Paul Zimmermann:
> 
> > We have updated our comparison:
> > 
> > https://members.loria.fr/PZimmermann/papers/accuracy.pdf
> > 
> > This new update includes for the first time the FreeBSD math library,
> > whose accuracy is quite good, except:
> 
> I wonder how those functions/libs you tested compare in terms of speed...
> It would allow to provide a hint to the question
>   "Which lib is the fastest and fulfills the needs in terms of accuracy for
> the intended use-case?"
> 
> I agree that the best way to do this requires to run all libs on the same
> hardware and OS, which is not feasible in your approach. What may be
> feasible is to compare the relative performance of those subsets, which you
> run on the same hardware.
> 

Speed vs accuracy is always a trade-off.  Consider

% ./tlibm j0 -f -x 0x1p-9 -X 2 -s 1 -N 4
Interval tested for j0f: [0.00195312,2]
4000000 calls, 0.040901 secs, 0.01023 usecs/call

% ./tlibm j0 -f -x 0x1p-9 -X 2 -s 1 -N 4
Interval tested for j0f: [0.00195312,2]
4000000 calls, 0.092471 secs, 0.02312 usecs/call

The former timing is for FreeBSD libm on an AMD FX-8350 using 
only -O2 optimization.  The latter is a patched libm, which uses
-O2 -funroll-loops -march=bdver2, and is twice as slow!  The
difference lies in accuracy.  The former gives

% ./tlibm j0 -f -x 0x1p-9 -X 2 -PD -N 4
Interval tested for j0f: [0.00195312,2]
       ulp <= 0.5:  85.04%   3401545 |  85.039%   3401545
0.5 <  ulp <  0.6:  5.376%    215028 |  90.414%   3616573
0.6 <  ulp <  0.7:  3.107%    124266 |  93.521%   3740839
0.7 <  ulp <  0.8:  2.432%     97284 |  95.953%   3838123
0.8 <  ulp <  0.9:  1.740%     69612 |  97.693%   3907735
0.9 <  ulp <  1.0:  0.941%     37646 |  98.635%   3945381
1.0 <  ulp <  1.5:  1.108%     44312 |  99.742%   3989693
1.5 <  ulp <  2.0:  0.195%      7791 |  99.937%   3997484
2.0 <  ulp <  3.0:  0.062%      2491 |  99.999%   3999975
3.0 <  ulp <  0.0:  0.001%        25 | 100.000%   4000000
Max ulp: 3.259556 at 1.9667229652404785e+00

while the latter has

% ./tlibm j0 -f -x 0x1p-9 -X 2 -PD -N 4
Interval tested for j0f: [0.00195312,2]
       ulp <= 0.5:  86.76%   3470362 |  86.759%   3470362
0.5 <  ulp <  0.6:  5.531%    221257 |  92.290%   3691619
0.6 <  ulp <  0.7:  2.761%    110437 |  95.051%   3802056
0.7 <  ulp <  0.8:  1.705%     68195 |  96.756%   3870251
0.8 <  ulp <  0.9:  1.228%     49134 |  97.985%   3919385
0.9 <  ulp <  1.0:  0.841%     33628 |  98.825%   3953013
1.0 <  ulp <  1.5:  1.087%     43475 |  99.912%   3996488
1.5 <  ulp <  2.0:  0.087%      3473 |  99.999%   3999961
2.0 <  ulp <  3.0:  0.001%        39 | 100.000%   4000000
Max ulp: 2.157274 at 1.9673234224319458e+00

The latter is more accurate, but its underlying algorithm
uses summation-and-carry of the ascending series.  This 
algorithm is sensitive to compiler options, so I haven't 
pushed it FreeBSD (, yet).

-- 
Steve