Assembly string functions in i386 libc
Sean C. Farley
scf at FreeBSD.org
Thu Jul 12 21:03:07 UTC 2007
On Thu, 12 Jul 2007, Bruce Evans wrote:
> On Thu, 12 Jul 2007, Bruce Evans wrote:
>
>> On Wed, 11 Jul 2007, Sean C. Farley wrote:
>>
>>> While looking at increasing the speed of strlen(), I noticed that on
>>> i386 platforms (PIII, P4 and Athlon XP) the performance is abysmal
>>> in libc compared to the version I was writing. After more testing,
>>> I found it was only the assembly version that is really slow. The C
>>> version is fairly quick. Is there a need to continue to use the
>>> assembly versions of string functions on i386? Does it mainly help
>>> slower systems such as those with i386 or i486 CPU's?
>>
>> I think you are mistaken about the asm version being slow. In my
>> tests ...
>
> Partly.
>
>>> I have the results from my P4 (Id = 0xf24 Stepping = 4) system and
>>> the test program here[1]. strlen.tar.bz2 is the archive of it for
>>> anyone's testing. In the strlen/results subdirectory, there are the
>>> results for strings of increasing lengths.
>>
>> Sorry, I didn't look at this. I just wrote a quick re-test and ran
>> it
>
> Now I've looked at it. I think it is not testing strlen() at all,
> except for the libc case, because __pure prevents more than 1 call to
> strlen(). (The existence of __pure is also a bug. __pure was the
> FreeBSD spelling of the __const__ attribute in gcc-1. It was removed
> when special support for gcc-1 was dropped, and should not have been
> recycled.) __pure is a syntax error in the old version of FreeBSD
> that I tested on. I first tried __pure2, which is the FreeBSD
> spelling of the __const__ attribute in gcc-2. I think it is weaker
> than the __pure__ attribute in gcc-3.

From what I could find, strlen() should not have the __const__ (__pure2)
attribute since it is passed a pointer, but __pure__ (__pure) should
work. Are you saying that __pure used to mean __const__ in gcc-1 but now
means __pure__ for gcc-2.96 and above, and that this redefinition of
__pure is the bug?

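To make the __pure point concrete, here is a minimal sketch of a timing
loop that cannot be collapsed into a single call. It is not the code from
strlen.tar.bz2; my_strlen, strlen_fn, run_calls and time_calls are
hypothetical names used only for illustration. With gcc's pure attribute
on the function under test, the compiler may treat repeated calls with
the same argument as one call, so the sketch calls through a volatile
function pointer, which has to be reloaded and called on every iteration.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for one of the strlen variants under test. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/*
 * The pointer is volatile, so the compiler must reload it and make a
 * real indirect call each time; it cannot assume the target is pure.
 */
static size_t (*volatile strlen_fn)(const char *) = my_strlen;

/* Call the function under test `iterations` times and sum the results. */
size_t run_calls(const char *s, unsigned long iterations)
{
    size_t total = 0;
    unsigned long i;

    for (i = 0; i < iterations; i++)
        total += strlen_fn(s);
    return total;
}

/* Return the CPU time in seconds spent in run_calls(). */
double time_calls(const char *s, unsigned long iterations, size_t *total_out)
{
    clock_t start = clock();

    *total_out = run_calls(s, iterations);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Summing the results and handing the total back to the caller keeps the
calls from being discarded as dead code. The indirect call adds a small
fixed cost per iteration, so it would need to be applied to every variant
being compared.
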
> After removing __pure* and adding -static -g to CFLAGS, with
> gcc-3.3.3:
>
> On an old Celeron (400MHz) (all P2's probably behave like this):
>
> %%%
> libcstrlen: time spent executing strlen(string) = 64: 7.786868
> basestrlen: time spent executing strlen(string) = 64: 3.816736
> strlen: time spent executing strlen(string) = 64: 3.364313
> strlen2: time spent executing strlen(string) = 64: 2.662973
> %%%
>
> rep scasb is apparently very slow on P2's.
>
> On an A64 in i386 mode:
>
> %%%
> libcstrlen: time spent executing strlen(string) = 64: 0.709657
> basestrlen: time spent executing strlen(string) = 64: 0.691397
> strlen: time spent executing strlen(string) = 64: 0.527339
> strlen2: time spent executing strlen(string) = 64: 0.441090
> %%%
>
> Now rep scasb is only slightly slower than the simple C loop (since
> all small loops take 2 cycles on AXP and A64...). strlen and strlen2
> are marginally faster since their loops do more.
>
> basestrlen is fastest for lengths <= 5 on the Celeron.
>
> basestrlen is fastest for lengths <= 9 on the A64.

I removed __pure from main.c and added -static -g.

Athlon XP 2100 (1.72 GHz):
libcstrlen: time spent executing strlen(string) = 64: 0.994755
asmstrlen: time spent executing strlen(string) = 64: 0.989012
basestrlen: time spent executing strlen(string) = 64: 0.879722
strlen: time spent executing strlen(string) = 64: 0.626727
strlen2: time spent executing strlen(string) = 64: 0.587162

P4 1.6 GHz:
libcstrlen: time spent executing strlen(string) = 64: 2.412558
asmstrlen: time spent executing strlen(string) = 64: 2.413904
basestrlen: time spent executing strlen(string) = 64: 1.049927
strlen: time spent executing strlen(string) = 64: 0.543575
strlen2: time spent executing strlen(string) = 64: 0.547015

PIII 450MHz:
libcstrlen: time spent executing strlen(string) = 64: 6.976066
asmstrlen: time spent executing strlen(string) = 64: 6.974106
basestrlen: time spent executing strlen(string) = 64: 3.464854
strlen: time spent executing strlen(string) = 64: 2.541872
strlen2: time spent executing strlen(string) = 64: 2.339469

The Athlon XP did much better with the assembly version than either
Intel CPU for me. On all three CPUs, for string lengths from 1 to 256,
the C versions always beat the assembly version, although the assembly
version came somewhat close to basestrlen for lengths of 9 to 32 bytes.
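
For readers without the archive, the sketch below shows the two general
shapes of C loop being compared here: a plain byte-at-a-time loop and a
loop that tests four bytes per iteration. This is an assumption about the
kind of code involved, not the actual basestrlen, strlen or strlen2 from
strlen.tar.bz2; byte_strlen and word_strlen are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simple loop: one load, one compare and one branch per byte. */
size_t byte_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/*
 * Word-at-a-time loop: after aligning the pointer, test four bytes per
 * iteration. The expression (v - 0x01010101) & ~v & 0x80808080 is
 * nonzero exactly when some byte of v is zero.
 *
 * Caveat: the aligned 4-byte load may read up to three bytes past the
 * terminator. It never crosses a page boundary, but ISO C does not
 * guarantee that reading past the end of an object is allowed.
 */
size_t word_strlen(const char *s)
{
    const char *p = s;
    uint32_t v;

    /* Handle leading bytes until p is 4-byte aligned. */
    while (((uintptr_t)p & (sizeof(v) - 1)) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Scan whole words until one contains a zero byte. */
    for (;;) {
        memcpy(&v, p, sizeof(v));   /* aliasing-safe aligned load */
        if (((v - 0x01010101u) & ~v & 0x80808080u) != 0)
            break;
        p += sizeof(v);
    }

    /* Locate the terminator inside the final word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The second form does more work per loop iteration, which is one possible
reason a C version can beat both the byte loop and rep scasb on longer
strings, while the alignment prologue makes it a poor fit for very short
ones.
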
Even if this does not show that the assembly version should be replaced,
I find this performance testing interesting. I learned something new.
Sean
--
scf at FreeBSD.org