The optimization of malloc(3): FreeBSD vs GNU libc
Intron
mag at intron.ac
Tue Aug 15 08:21:20 UTC 2006
Vladimir Kushnir wrote:
> Sorry for intrusion.
>
> On Mon, 14 Aug 2006, Brooks Davis wrote:
>
>> On Tue, Aug 15, 2006 at 07:10:47AM +0800, Intron wrote:
>>> One day, a friend told me that his program was 3 times slower under
>>> FreeBSD 6.1 than under GNU/Linux (from Redhat 7.2 to Fedora Core 5).
>>> I was astonished by the real repeatable performance difference on
>>> AMD Athlon XP 2500+ (1.8GHz, 512KB L2 Cache).
>>>
>>> After some digging, I found that the problem lies in malloc(3) of
>>> FreeBSD libc.
>>>
>>> Download the testing program: http://ftp.intron.ac/tmp/fdtd.tar.bz2
>>>
>>> You may try to compile the program WITHOUT the macro "MY_MALLOC"
>>> defined (in Makefile) to use malloc(3) provided by FreeBSD 6.1.
>>> Then, time the running of the binary (on Athlon XP 2500+):
>>>
>>> #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
>>> ...
>>> 165.24 real 164.19 user 0.02 sys
>>>
>>> Please try to recompile the program (Remember to "make clean")
>>> WITH the macro "MY_MALLOC" defined (in Makefile) to use my own
>>> simple implementation of malloc(3) (i.e. my_malloc() in cal.c).
>>> And time the running again:
>>>
>>> #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
>>> ...
>>> 50.41 real 49.95 user 0.04 sys
>>>
>>> You may repeat this testing again and again.
>>>
>>> I guess this kind of performance difference comes from:
>>>
>>> 1. His program uses malloc(3) to obtain a great many small memory blocks.
>>>
>>> 2. In this case, FreeBSD malloc(3) obtains small memory blocks from
>>> the kernel and passes them to the application.
>>>
>>> But malloc(3) of GNU libc obtains large memory blocks from the kernel,
>>> splits them, and hands them out to the application in small blocks.
>>>
>>> You may verify my judgement with truss(1).
>>>
>>> 3. The FreeBSD malloc(3) approach makes VM page mapping too chaotic, which
>>> reduces the efficiency of the CPU L2 cache. In contrast, my my_malloc()
>>> partially simulates the behavior of GNU libc malloc(3) and avoids
>>> the excessive chaos.
>>>
>>> Callgrind is broken under FreeBSD; otherwise I would verify my guess with it.
>>>
>>> I have also verified the program on an Intel Pentium 4 511 (2.8GHz, 1MB
>>> L2 cache, running FreeBSD 6.1 i386 though this CPU supports EM64T):
>>>
>>>> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
>>> ...
>>> 185.30 real 184.28 user 0.02 sys
>>>
>>>> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
>>> ...
>>> 36.31 real 35.94 user 0.03 sys
>>>
>>> NOTE: you probably cannot see the performance difference on a CPU with
>>> a small L2 cache, such as an Intel Celeron 1.7GHz with 128 KB L2 cache.
>>
>> In CURRENT we've replaced phkmalloc with jemalloc. It would be useful
>> to see how this benchmark performs with that. I believe it does similar
>> things.
>>
>> -- Brooks
>>
> On -CURRENT amd64 (Athlon64 3000+, 512k L2 cache):
>
> With jemalloc (without MY_MALLOC):
> ~/fdtd> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
> ...
> 116.34 real 113.69 user 0.00 sys
>
> With MY_MALLOC:
> ~/fdtd> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
> ...
> 45.30 real 44.29 user 0.00 sys
>
> Regards,
> Vladimir
How long has it been since you last CVSup-ed your source tree?
These days the source tree frequently fails to build, which leaves
the 7.0-CURRENT binaries on some users' computers out of date.
------------------------------------------------------------------------
From Beijing, China