cvs commit: src/lib/msun/src k_tanf.c

Thu Nov 24 02:04:27 GMT 2005

bde         2005-11-24 02:04:26 UTC

  FreeBSD src repository

  Modified files:
    lib/msun/src         k_tanf.c 
  Log:
  Optimized by eliminating the special case for 0.67434 <= |x| < pi/4.

  A single polynomial approximation for tan(x) works in infinite precision
  up to |x| < pi/2, but in finite precision, to restrict the accumulated
  roundoff error to < 1 ulp, |x| must be restricted to less than about
  sqrt(0.5/((1.5+1.5)/3)) ~= 0.707.  We restricted it a bit more to
  give a safety margin including some slop for optimizations.  Now that
  we use double precision for the calculations, the accumulated roundoff
  error is in double-precision ulps so it can easily be made almost 2**29
  times smaller than a single-precision ulp.  Near x = pi/4 its maximum
  is about 0.5+(1.5+1.5)*x**2/3 ~= 1.117 double-precision ulps.

  The minimax polynomial needs to be different to work for the larger
  interval.  I didn't increase its degree; the old degree is just large
  enough to keep the final error less than 1 ulp and increasing the
  degree would be a pessimization.  The maximum error is now ~0.80
  ulps instead of ~0.53 ulps.

  The speedup from this optimization for uniformly distributed args in
  [-2pi, 2pi] is 28-43% on athlons, depending on how badly gcc selected
  and scheduled the instructions in the old version.  The old version
  has some int-to-float conversions that are apparently difficult to schedule
  well, but gcc-3.3 somehow did everything ~10 cycles or ~10% faster than
  gcc-3.4, with the difference especially large on AXPs.  On A64s, the
  problem seems to be related to documented penalties for moving single
  precision data to undead xmm registers.  With this version, the speed
  in cycles is almost independent of the athlon and gcc version despite
  the large differences in instruction selection to use the FPU on AXPs
  and SSE on A64s.

  Revision  Changes    Path
  1.17      +7 -16     src/lib/msun/src/k_tanf.c