cvs commit: src/lib/msun/i387 Makefile.inc e_atan2.S e_atan2f.S s_atan.S
David Schultz
das at FreeBSD.ORG
Tue Feb 22 20:18:17 GMT 2005
On Tue, Feb 22, 2005, Nate Lawson wrote:
> David Schultz wrote:
> >By the way, here are some other results for the Pentium 4, all
> >without SSE. SSE makes things a bit worse, probably because the
> >x87 and SSE registers are shared, and the Pentium 4 imposes a
> >large penalty for switching between the two sets.
>
> I don't believe this is correct. MMX and x87 use the same register
> context (hence emms), however the XMM registers (SSE*) are separate.
> It's possible gcc is generating MMX instructions though with your SSE
> command line switch.
Yep, you're right, I was thinking of the MMX register set. I
compared the code generated by gcc with and without SSE/SSE2, and
found that the only thing it uses SSE2 for is converting from
floating point->integer and back (e.g. CVTTSD2SI instead of i387
control word frobbing and FISTL). There was also one place where
gcc just got confused and juggled around a bunch of registers on
the i387 stack, but I don't think that accounts for the
difference. I wonder if CVTTSD2SI and friends are slower than an
OR/MOV/FLDCW/FISTL/FLDCW sequence on the Pentium 4 for some
bizarre reason, or if I missed something else significant while
scanning the diff.
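
For anyone who wants to look at the two code paths themselves, here is a minimal sketch of the kind of C that triggers them. It is not taken from msun; the function names are made up for illustration. A cast from double to int must round toward zero, so on the i387 gcc has to switch the control word to truncation around the store (the OR/MOV/FLDCW/FISTL/FLDCW sequence), while with SSE2 enabled it can use CVTTSD2SI, which always truncates.

```c
#include <stddef.h>

/*
 * Hypothetical example, not from the msun sources.
 * The cast must round toward zero, so an i387-only build needs to
 * change the rounding mode around the integer store, whereas an SSE2
 * build can use a single truncating conversion instruction.
 */
int truncate_to_int(double x)
{
    return (int)x;
}

/*
 * Hypothetical helper: accumulate the truncated values so the
 * conversions cannot be optimized away when timing a loop over them.
 */
long sum_truncated(const double *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += truncate_to_int(values[i]);
    return total;
}
```

Compiling this with gcc -S for a 32-bit x86 target, once without and once with -msse2, and diffing the two assembly files should show the difference. The exact flags used for the measurements above are not stated in this thread, so treat those as an assumption.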