More kernel performance tests on FreeBSD 10.0-CURRENT
Dimitry Andric
dimitry at andric.com
Fri Sep 21 21:39:41 UTC 2012
Hi all,
As a followup to my previous post about the performance of FreeBSD 10.0
kernels compiled with different compilers (clang and gcc), I did another
series of tests, now on a more modern machine (Core i5-based). I also
tested the performance with different compiler optimization settings.
The attached text file[1] contains more information about these tests,
performance data, and my conclusions. Any errors and omissions are also
my fault, so if you notice them, please let me know.
The executive summary: GENERIC kernels compiled with clang 3.2 are again
a little faster than those compiled with gcc 4.2.1. For gcc, compiling
with -O2 also gives a slightly faster kernel than with -O1, but for
clang there is no measurable difference between those flags.
Again, many thanks to Gavin Atkinson for providing the required
hardware.
-Dimitry
[1]: Also available at:
<http://www.andric.com/freebsd/perftest/perftest-kernel-2012-09-21a.txt>
-------------- next part --------------
KERNEL PERFORMANCE TESTS ON FREEBSD 10.0-CURRENT, SEPTEMBER 2012, PART 2
========================================================================
INTRODUCTION
------------
These tests aim to give an indication of the runtime performance of FreeBSD
kernels compiled with different compilers, at various optimization levels. The
compilers tested were:
- gcc 4.2.1, the system compiler in FreeBSD.
- clang 3.2 (trunk 162107), which is the default version of clang in FreeBSD
  10.0-CURRENT, after r239462.
All tests were run on a machine graciously provided by Gavin Atkinson, which is
based on an Intel DQ57TM desktop board, with a dual-core 3.20 GHz Intel Core i5
650 CPU (id=0x20652, four logical CPUs with Hyper-Threading), and 4 GB RAM. It
runs FreeBSD/amd64 10.0-CURRENT as of Tue Sep 11 19:11:00 UTC 2012. An excerpt
of dmesg follows:
CPU: Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz (3192.08-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0x20652 Family = 6 Model = 25 Stepping = 2
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,
CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,
TM,PBE>
Features2=0x298e3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,
CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT,AESNI>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x1<LAHF>
TSC: P-state invariant, performance statistics
real memory = 4294967296 (4096 MB)
avail memory = 3882647552 (3702 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <INTEL DQ57TM >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 SMT threads
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 4
cpu3 (AP): APIC ID: 5
With each compiler, stock GENERIC kernels for amd64 were built from head as of
r240384, once for each of the following sets of optimization flags:
    -O2 -frename-registers -pipe -fno-strict-aliasing
    -O1 -pipe
    -O0 -pipe
Note that clang does not support -frename-registers, so it was omitted for the
corresponding kernel builds. No CPU-specific optimization flags (-march=) were
used.
Each kernel was installed into a separate kernel installation directory under
/boot. The system was then booted with each of these kernels, without modifying
anything else, and six runs of "make -j8 buildworld" were done per kernel.
Between runs, the /usr/obj directory was fully cleaned out, and filesystems
were synced.
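The procedure above could be automated roughly as follows. This is only a
sketch, not the script used for these tests: the function names, the use of
Python's resource module, and the clean-up command are all assumptions. The
text states only that /usr/obj was cleaned and filesystems were synced between
runs.

```python
import os
import resource
import subprocess
import time


def timed_run(argv, cwd=None):
    """Run argv once and return wall-clock, user and system time in seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, cwd=cwd, check=True)
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
    }


def benchmark(build_argv, clean_argv, runs=6, cwd=None):
    """Clean, sync, then time the build; repeat `runs` times."""
    results = []
    for _ in range(runs):
        # Hypothetical clean step; the real command is not given in the text.
        subprocess.run(clean_argv, cwd=cwd, check=True)
        os.sync()  # flush filesystem buffers before the timed run
        results.append(timed_run(build_argv, cwd=cwd))
    return results
```

On the test machine this might be called as
benchmark(["make", "-j8", "buildworld"], ["sh", "-c", "rm -rf /usr/obj/*"],
cwd="/usr/src"), where both the clean-up command and the source directory are
assumed rather than taken from the tests.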
The timing results, processed with ministat(1), are below.
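For reference, the six columns in the tables below can be reproduced with
Python's statistics module. This is a minimal sketch, assuming the sample
(n - 1) standard deviation. The idrss samples are not raw data from the tests:
they are inferred from the clang 3.2 -O1 row (six integer values with minimum
589, maximum 590 and average 589.16667).

```python
import statistics


def summarize(samples):
    """Return the N, Min, Max, Median, Avg and Stddev columns for one metric."""
    return {
        "N": len(samples),
        "Min": min(samples),
        "Max": max(samples),
        "Median": statistics.median(samples),
        "Avg": statistics.mean(samples),
        "Stddev": statistics.stdev(samples),  # sample standard deviation (n - 1)
    }


# Inferred, not measured: five runs at 589 and one at 590.
idrss = [589, 589, 589, 589, 589, 590]
row = summarize(idrss)
print(row["Avg"], row["Stddev"])
```

Rounded to eight significant digits, the result matches the idrss row for
clang 3.2 -O1, which suggests the Stddev column is the sample standard
deviation.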
Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O0
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       6503.62       6527.84       6520.49     6517.2817     8.3845558
user     6      12534.49      12576.55      12555.29     12555.547     14.079771
sys      6        9655.1       9733.92        9716.1     9709.9533     28.981809
maxrss   6        758208        758248        758224     758222.67     13.779213
ixrss    6          4396          4401          4397     4397.1667     1.9407902
idrss    6           523           523           523           523             0
isrss    6           126           126           126           126             0
minflt   6 6.6264519e+08 6.6337812e+08 6.6297908e+08 6.6299306e+08     249092.49
majflt   6          4354         10457          5722     6207.8333     2208.4725
nswap    6            40            56            42     44.333333     6.1210021
inblock  6         25167         44267         29212     31042.667     6677.3727
oublock  6         32801         34666         33500     33635.167     692.27897
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60495         60504         60502         60500     3.5213634
nvcsw    6       1750409       1759010       1754971     1754668.8     3641.3163
nivcsw   6       1867335       1943885       1924258     1909641.2     30495.366
Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O1
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       4788.59       4831.96       4798.01      4802.305      15.48322
user     6      12239.94       12285.9      12268.91       12263.5     17.190572
sys      6       4041.05        4100.4       4083.92      4076.235     21.374684
maxrss   6        758212        758256        758256     758242.67     18.532855
ixrss    6          4963          4971          4964     4964.6667     3.1411251
idrss    6           589           590           589     589.16667    0.40824829
isrss    6           132           132           132           132             0
minflt   6  6.617985e+08 6.6339562e+08  6.629315e+08 6.6272587e+08     574835.78
majflt   6          7935         23481         17450     16901.667      5324.564
nswap    6            40            52            48     47.333333     3.9327683
inblock  6         25121         44292         29173     30980.667     6715.0864
oublock  6         24867         28037         26579     26667.167      1162.513
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60492         60500         60498     60496.667     3.4448028
nvcsw    6       1559857       1576788       1562507     1565002.8     6454.8513
nivcsw   6       1632143       1721204       1688209       1682830      35836.46
Building world, multi-threaded, on a GENERIC kernel compiled by clang 3.2 -O2
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       4780.24       4819.77       4801.98     4798.5867     14.236627
user     6      12242.91      12275.04      12256.37     12255.905     11.676621
sys      6       4052.75       4118.65       4104.76     4096.2217     22.874298
maxrss   6        758220        758256        758256     758244.67     17.603031
ixrss    6          4960          4970          4964     4963.8333     3.4880749
idrss    6           589           590           589     589.16667    0.40824829
isrss    6           132           132           132           132             0
minflt   6 6.6248246e+08 6.6340936e+08 6.6300404e+08 6.6293496e+08     324940.82
majflt   6          4300         22493         14128     12176.833     6396.7734
nswap    6            40            52            48            46     4.8989795
inblock  6         29120         44375         29277         31760     6180.4181
oublock  6         24915         28157         25984     26315.333      1251.164
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60490         60499         60497     60495.667     3.2041639
nvcsw    6       1559291       1575794       1570626     1569117.3      5467.274
nivcsw   6       1593865       1678135       1654604       1640246     31701.067
Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O0
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       6083.69       6101.08       6096.85     6094.4383     6.5165003
user     6      12424.93      12462.24      12438.63      12441.97     12.975073
sys      6       8305.66       8394.45       8377.26     8366.4767     32.469675
maxrss   6        758208        758256        758224     758225.33      16.52473
ixrss    6          4481          4491          4484     4484.6667     3.3862467
idrss    6           533           534           533     533.16667    0.40824829
isrss    6           127           127           127           127             0
minflt   6 6.6241224e+08 6.6339646e+08 6.6301629e+08 6.6292507e+08     336924.37
majflt   6          4357          9603          6231     6667.8333     1812.2422
nswap    6            40            48            40     41.666667      3.204164
inblock  6         29162         44302         29272     31759.333     6145.0026
oublock  6         30081         32816         31538       31281.5     1163.8237
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60500         60501         60500     60500.333    0.51639753
nvcsw    6       1701009       1713077       1709140       1707903     3975.4753
nivcsw   6       1854572       1936195       1896858     1894873.2     26725.543
Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O1
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       4943.74       4965.28       4955.62       4953.78     7.2888627
user     6      12274.46      12334.13      12322.13     12314.472     21.858036
sys      6       4576.99       4621.09       4617.21       4609.75     16.658918
maxrss   6        758208        758256        758224        758232     19.595918
ixrss    6          4897          4902          4898     4898.6667     1.9663842
idrss    6           581           582           581     581.33333    0.51639778
isrss    6           131           131           131           131             0
minflt   6  6.626435e+08  6.634147e+08 6.6301953e+08  6.629835e+08     279004.88
majflt   6          6092         11215          9188     8755.1667     1849.3565
nswap    6            40            62            48     49.333333     7.1180522
inblock  6         29076         44462         29163         31697     6253.6444
oublock  6         25415         28495         28175     27508.167     1179.5914
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60488         60499         60495     60494.333     3.9832984
nvcsw    6       1575048       1588567       1584504     1582316.7     5705.6913
nivcsw   6       1682902       1745827       1730506     1722802.3     24060.717
Building world, multi-threaded, on a GENERIC kernel compiled by gcc 4.2.1 -O2
-----------------------------------------------------------------------------
         N           Min           Max        Median           Avg        Stddev
real     6       4876.16       4901.55       4895.24     4888.7583     10.598318
user     6      12241.35      12306.04      12283.94     12278.767     23.922356
sys      6       4400.43       4452.62       4446.22     4438.0117     19.231095
maxrss   6        758212        758256        758224     758229.33     17.095809
ixrss    6          4899          4905          4900     4900.6667     2.2509257
idrss    6           581           582           582     581.83333    0.40824829
isrss    6           131           131           131           131             0
minflt   6 6.6214332e+08 6.6334997e+08 6.6298766e+08 6.6278723e+08     436172.22
majflt   6          6055         12473          9169        8895.5     2381.6063
nswap    6            40            54            48            48     4.5607017
inblock  6         29193         44443         29313         31804     6192.0071
oublock  6         25113         28152         26770     26490.167     1254.3383
msgsnd   6             0             0             0             0             0
msgrcv   6             0             0             0             0             0
nsignals 6         60496         60501         60499     60498.667     2.2509257
nvcsw    6       1566521       1592140       1579251     1578889.5      9354.883
nivcsw   6       1686675       1809406       1785290     1756283.7     50719.325
Summary:
--------
On a kernel compiled with clang 3.2 -O2, building world in multi-threaded mode
is ~1.9% faster in real time than on a kernel compiled with gcc 4.2.1 -O2, and
~8.3% faster in system time.
On a kernel compiled with clang 3.2 -O1, building world in multi-threaded mode
is ~3.2% faster in real time than on a kernel compiled with gcc 4.2.1 -O1, and
~13.1% faster in system time.
On a kernel compiled with gcc 4.2.1 -O2, building world in multi-threaded mode
is ~1.3% faster in real time than on a kernel compiled with gcc 4.2.1 -O1, and
~3.9% faster in system time.
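The percentages above can be rechecked from the "Avg" columns. The sketch below
assumes each difference is expressed relative to the faster kernel's average.
That convention is not stated in the text, but it reproduces the quoted
figures. The dictionary layout and function name are illustrative only.

```python
# Averages ("Avg" column) copied from the tables above, in seconds.
AVG = {
    "clang -O1": {"real": 4802.305, "sys": 4076.235},
    "clang -O2": {"real": 4798.5867, "sys": 4096.2217},
    "gcc -O1": {"real": 4953.78, "sys": 4609.75},
    "gcc -O2": {"real": 4888.7583, "sys": 4438.0117},
}


def percent_faster(fast, slow, metric):
    """Time saved by `fast` over `slow`, as a percentage of the `fast` average."""
    f, s = AVG[fast][metric], AVG[slow][metric]
    return (s - f) / f * 100.0


for fast, slow in [("clang -O2", "gcc -O2"),
                   ("clang -O1", "gcc -O1"),
                   ("gcc -O2", "gcc -O1")]:
    print(fast, "vs", slow,
          round(percent_faster(fast, slow, "real"), 1),
          round(percent_faster(fast, slow, "sys"), 1))
```

Taking the slower kernel's average as the base instead would give slightly
smaller percentages.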
The difference between building world in multi-threaded mode on kernels compiled
with clang 3.2 -O2 and -O1 is not significant (to within 1 standard deviation).
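One way to check such a claim from the summary rows alone is a two-sample
Student's t statistic, the test that ministat(1) applies. The sketch below
assumes equal sample sizes (N = 6 per kernel), a pooled variance, and the
two-sided 95% critical value for 10 degrees of freedom. It is an independent
check, not necessarily how the statement above was derived.

```python
import math

T_CRIT_95_DF10 = 2.228  # two-sided 95% critical value of Student's t, 10 d.f.


def pooled_t(mean_a, sd_a, mean_b, sd_b, n=6):
    """Student's t statistic for two samples of equal size n."""
    pooled_var = (sd_a ** 2 + sd_b ** 2) / 2.0
    std_err = math.sqrt(pooled_var * 2.0 / n)
    return abs(mean_a - mean_b) / std_err


# "real" rows: (Avg, Stddev) copied from the tables above.
clang_o1, clang_o2 = (4802.305, 15.48322), (4798.5867, 14.236627)
gcc_o1, gcc_o2 = (4953.78, 7.2888627), (4888.7583, 10.598318)

t_clang = pooled_t(*clang_o1, *clang_o2)
t_gcc = pooled_t(*gcc_o1, *gcc_o2)
print(t_clang < T_CRIT_95_DF10, t_gcc > T_CRIT_95_DF10)
```

With these inputs the clang -O1/-O2 statistic stays well below the critical
value, while the gcc -O1/-O2 statistic is far above it, in line with the
summary.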
Conclusion:
-----------
At -O1 and -O2, kernels compiled with clang are a little faster in real time
for building world than those compiled with gcc (roughly 2% to 3%), and in
system time the difference is even larger, roughly 10%. For clang, the
difference between -O1 and -O2 is not measurable, but for gcc, -O2 is slightly
faster than -O1.
================================================================================
Copyright (c) 2012 Dimitry Andric <dimitry at andric.com>
Verbatim copying and redistribution of this entire text are permitted, provided
this notice is preserved.
================================================================================