libc assembly optimizations?
James Van Artsdalen
james-freebsd-amd64 at jrv.org
Tue Dec 30 02:16:32 PST 2003
Here's an alternative for fabs(3):
ENTRY(fabs)
psllq $1,%xmm0 /* 64-bit shift left drops the sign bit */
psrlq $1,%xmm0 /* logical shift right clears sign */
ret
/usr/src/lib/libc/amd64/gen/fabs.S currently uses the code below,
and gcc generates essentially the same code.
The shifts above seem to work and look better to me.
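For anyone who wants to check the shift trick without an assembler, here is a rough C sketch of the same idea on the raw bits of a double. It assumes an IEEE 754 64-bit double with the sign in the top bit; the name fabs_by_shifts is made up for illustration and is not anything in libc.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libc: clear the sign of a double
 * with the same shift-left / logical-shift-right pair as the psllq /
 * psrlq sequence. Assumes IEEE 754 binary64 (sign in bit 63).
 */
static double
fabs_by_shifts(double x)
{
	uint64_t bits;

	memcpy(&bits, &x, sizeof(bits));	/* view the double as raw bits */
	bits <<= 1;				/* sign bit falls off the top */
	bits >>= 1;				/* unsigned shift brings in a zero */
	memcpy(&x, &bits, sizeof(x));
	return (x);
}
```

Exponent and mantissa are untouched, so NaNs and infinities only lose their sign, as far as I can tell.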
The string ops can be significantly improved if allowed to
read extra bytes around the string but within the same 16-byte
paragraph as the start or end of the string. This seems safe in
userland.
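The reason this looks safe is plain address arithmetic: an aligned 16-byte paragraph cannot straddle a page boundary when the page size is a multiple of 16, so the extra bytes live in a page the string already touches. A small C sketch of that check follows; the 4096-byte page size and all of the names are assumptions for illustration only.

```c
#include <stdint.h>

#define PARA_SIZE		16u	/* 16-byte paragraph, as above */
#define ASSUMED_PAGE_SIZE	4096u	/* assumption: 4 KiB pages */

/* First address of the 16-byte paragraph containing addr. */
static uintptr_t
para_start(uintptr_t addr)
{
	return (addr & ~(uintptr_t)(PARA_SIZE - 1));
}

/* Index of the (assumed 4 KiB) page containing addr. */
static uintptr_t
page_of(uintptr_t addr)
{
	return (addr / ASSUMED_PAGE_SIZE);
}

/*
 * Returns 1 if every byte of the paragraph containing addr is in the
 * same page as addr itself, 0 otherwise.
 */
static int
para_in_same_page(uintptr_t addr)
{
	uintptr_t first = para_start(addr);
	uintptr_t last = first + PARA_SIZE - 1;

	return (page_of(first) == page_of(addr) &&
	    page_of(last) == page_of(addr));
}
```

This only covers the page-fault side of the argument. Strictly speaking, reading past the end of an object is still outside what the C standard promises, so a real implementation would presumably stay in assembly.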
Finally, can the SSE2 regs be safely used in kernel mode?
Page fill and aligned-bulk bcopy calls can be improved this way.
/*
* Ok, this sucks. Is there really no way to push an xmm register onto
* the FP stack directly?
*/
ENTRY(fabs)
movsd %xmm0, -8(%rsp)
fldl -8(%rsp)
fabs
fstpl -8(%rsp)
movsd -8(%rsp),%xmm0
ret