The optimization of malloc(3): FreeBSD vs GNU libc

Intron mag at intron.ac
Tue Aug 15 08:21:20 UTC 2006


Vladimir Kushnir wrote:

> Sorry for intrusion.
> 
> On Mon, 14 Aug 2006, Brooks Davis wrote:
> 
>> On Tue, Aug 15, 2006 at 07:10:47AM +0800, Intron wrote:
>>> One day, a friend told me that his program was 3 times slower under
>>> FreeBSD 6.1 than under GNU/Linux (from Redhat 7.2 to Fedora Core 5).
>>> I was astonished by the real repeatable performance difference on
>>> AMD Athlon XP 2500+ (1.8GHz, 512KB L2 Cache).
>>> 
>>> After some hacking, I found that the problem lies in malloc(3) of
>>> FreeBSD libc.
>>> 
>>> Download the testing program: http://ftp.intron.ac/tmp/fdtd.tar.bz2
>>> 
>>> You may try to compile the program WITHOUT the macro "MY_MALLOC"
>>> defined (in Makefile) to use malloc(3) provided by FreeBSD 6.1.
>>> Then, time the running of the binary (on Athlon XP 2500+):
>>> 
>>> #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
>>> ...
>>>        165.24 real       164.19 user         0.02 sys
>>> 
>>> Please try to recompile the program (Remember to "make clean")
>>> WITH the macro "MY_MALLOC" defined (in Makefile) to use my own
>>> simple implementation of malloc(3) (i.e. my_malloc() in cal.c).
>>> And time the running again:
>>> 
>>> #/usr/bin/time ./fdtd.FreeBSD 500 500 1000
>>> ...
>>>        50.41 real        49.95 user         0.04 sys
>>> 
>>> You may repeat this testing again and again.
>>> 
>>> I guess this kind of performance difference comes from:
>>> 
>>> 1. His program uses malloc(3) to obtain a great many small memory blocks.
>>> 
>>> 2. In this case, FreeBSD malloc(3) obtains small memory blocks from
>>>    the kernel and passes them to the application.
>>> 
>>>    But malloc(3) of GNU libc obtains large memory blocks from the kernel,
>>>    splits them, and hands them out to the application as small blocks.
>>> 
>>>    You may verify my judgement with truss(1).
>>> 
>>> 3. The approach of FreeBSD malloc(3) makes the VM page mapping too
>>>    chaotic, which reduces the efficiency of the CPU L2 cache. In contrast,
>>>    my my_malloc() partially simulates the behavior of GNU libc malloc(3)
>>>    and avoids this excessive chaos.
>>> 
>>> Callgrind is broken under FreeBSD; otherwise I would verify my guess with it.
>>> 
>>> I have also verified the program on an Intel Pentium 4 511 (2.8GHz, 1MB
>>> L2 cache, running FreeBSD 6.1 i386 though this CPU supports EM64T):
>>> 
>>>> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
>>> ...
>>>       185.30 real       184.28 user         0.02 sys
>>> 
>>>> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
>>> ...
>>>        36.31 real        35.94 user         0.03 sys
>>> 
>>> NOTE: you probably cannot see the performance difference on CPU with
>>>    small L2 cache such as Intel Celeron 1.7GHz with 128 KB L2 Cache.
>> 
>> In CURRENT we've replaced phkmalloc with jemalloc.  It would be useful
>> to see how this benchmark performs with that.  I believe it does similar
>> things.
>> 
>> -- Brooks
>> 
> On -CURRENT amd64 (Athlon64 3000+, 512k L2 cache):
> 
> With jemalloc (without MY_MALLOC):
>  ~/fdtd> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
> ...
> 116.34 real       113.69 user         0.00 sys
> 
> With MY_MALLOC:
>  ~/fdtd> /usr/bin/time ./fdtd.FreeBSD 500 500 1000
> ...
> 45.30 real        44.29 user         0.00 sys
> 
> Regards,
> Vladimir

How long has it been since you last CVSup-ed your source tree?

These days the source tree frequently fails to build, which leaves
the 7.0-CURRENT binaries on some users' computers out of date.
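
For anyone who does not want to download the tarball, the idea from
point 2 of my earlier mail (obtain large blocks, then split them into
small ones) can be sketched roughly as below. This is only a minimal
illustration under my own assumptions, not the my_malloc() from cal.c:
the names pool_alloc, POOL_CHUNK_SIZE and POOL_ALIGN are made up for
this sketch, the chunk size is arbitrary, and small blocks are never
given back to the system.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names and sizes; this is NOT the my_malloc() from cal.c. */
#define POOL_CHUNK_SIZE ((size_t)1024 * 1024) /* size of each large block (assumed) */
#define POOL_ALIGN      ((size_t)16)          /* request sizes are rounded up to this */

static char  *pool_cur  = NULL; /* next unused byte in the current chunk */
static size_t pool_left = 0;    /* bytes still unused in the current chunk */

/*
 * Carve a small block out of a large chunk.  Consecutive small requests
 * come from adjacent addresses.  Blocks served from the pool are never
 * freed individually.  Returned blocks are POOL_ALIGN-aligned only if
 * the chunk returned by malloc(3) is.
 */
static void *pool_alloc(size_t size)
{
    size_t need;
    void *p;

    if (size == 0)
        size = 1;

    /* Round the request up to a multiple of POOL_ALIGN. */
    need = (size + POOL_ALIGN - 1) & ~(POOL_ALIGN - 1);
    if (need < size)
        return NULL;                /* size overflowed while rounding */

    /* Requests too large for a chunk go straight to malloc(3). */
    if (need > POOL_CHUNK_SIZE)
        return malloc(size);

    if (need > pool_left) {
        /* The rest of the old chunk is abandoned; fetch a new one. */
        pool_cur = malloc(POOL_CHUNK_SIZE);
        if (pool_cur == NULL) {
            pool_left = 0;
            return NULL;
        }
        pool_left = POOL_CHUNK_SIZE;
    }

    p = pool_cur;
    pool_cur  += need;
    pool_left -= need;
    return p;
}
```

With this sketch, two back-to-back calls such as pool_alloc(10) return
addresses exactly 16 bytes apart, so a program that allocates many small
blocks gets them packed into a few contiguous regions. Whether that is
really the reason for the timing difference is still only my guess.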

------------------------------------------------------------------------
                                                From Beijing, China


