Fwd: [cfe-dev] More on atlas and clang

Mon Mar 11 08:37:35 UTC 2013

Recent benchmarks of Atlas built with clang, posted to the clang list, are attached below.  Note that the -fvectorize and -fslp-vectorize flags enable the new autovectorisation code in clang, which will be enabled by default in 3.3.

David

Begin forwarded message:

> Hi there,
> 
> I have recently undertaken another experimental build of Atlas (http://math-atlas.sourceforge.net – briefly speaking, Atlas provides a highly complete BLAS/LAPACK implementation optimized for the native architecture of the computer on which it is running) on an AVX machine (MacMini 2011) using a snapshot of clang 3.3 (r173279) provided by MacPorts (http://macports.org), with the -O3, -fPIC, -fvectorize and -fslp-vectorize flags.
> 
> I am pleased to say that:
> 
> 1. The generated AVX code seems fine: a full test session run under an Atlas-based SciPy didn’t raise any error;
> 2. The performance now seems on par with, or even (sometimes surprisingly) better than, the ‘reference GCC’ – whatever that means (I was unable to get in touch with the Atlas developer at that time) – as evidenced by the table below:
> 
> Reference clock rate=3292Mhz, new rate=2300Mhz
>  Refrenc : % of clock rate achieved by reference install
>  Present : % of clock rate achieved by present ATLAS install
> 
>                   single precision                  double precision
>           ********************************   *******************************
>                 real           complex           real           complex
>           ---------------  ---------------  ---------------  ---------------
> Benchmark   Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
> =========   ======= =======  ======= =======  ======= =======  ======= =======
> kSelMM     1289.9  1407.4   1188.7  1229.8    686.7   826.8    647.4   682.1
> kGenMM      198.2   239.7    198.5   237.8    193.9   231.8    196.0   233.8
> kMM_NT      193.7   266.4    195.2   192.9    184.2   187.4    188.5   197.5
> kMM_TN      198.5   211.1    197.9   226.2    189.8   227.6    189.5   223.2
> BIG_MM     1213.8  1346.7   1241.3  1366.5    652.0   789.5    661.4   795.8
>  kMV_N      224.3   308.1    438.8   617.3    115.9   152.1    205.8   283.5
>  kMV_T      224.6   313.5    460.3   642.9    123.2   159.6    211.3   288.2
>   kGER      148.3   192.4    290.2   381.2     73.3    95.6    144.3   184.3
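A note on reading these figures: the two installs were timed at different clock rates, so the percentages are per-clock efficiencies rather than raw throughput. Here is a minimal Python sketch of the conversion. It assumes that "% of clock rate" means 100 * MFLOPS / clock in MHz, which is a reading of the table header and not something the message states; the function name approx_mflops is made up for illustration.

```python
# Assumption (not stated in the message): "% of clock rate" = 100 * MFLOPS / clock_MHz.
REF_CLOCK_MHZ = 3292.0   # "Reference clock rate" from the table header
NEW_CLOCK_MHZ = 2300.0   # "new rate" from the table header


def approx_mflops(pct_of_clock, clock_mhz):
    """Convert a '% of clock rate' figure into approximate MFLOPS."""
    return pct_of_clock / 100.0 * clock_mhz


# kGenMM, single precision real: Refrenc 198.2, Present 239.7
ref = approx_mflops(198.2, REF_CLOCK_MHZ)
new = approx_mflops(239.7, NEW_CLOCK_MHZ)
print(f"reference: {ref:.0f} MFLOPS, present: {new:.0f} MFLOPS")
```

Under that assumption, a higher "Present" percentage means more work done per clock cycle; it does not by itself mean the present machine is faster in absolute terms.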
> 
> This is in stark contrast with the previous test, where clang was lagging about 20% behind the ‘reference implementation’ based on GCC for lines 2, 3 and 4, where compiler performance matters most.
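For a quick check of how far clang now sits from the reference on those lines, here is a rough Python sketch. It assumes that "lines 2, 3 and 4" are the kGenMM, kMM_NT and kMM_TN rows of the table; the numbers are copied from the table, and the helper names are invented for this example.

```python
# (Refrenc, Present) pairs copied from the table, in column order:
# single real, single complex, double real, double complex.
# Assumption: "lines 2, 3 and 4" are the kGenMM, kMM_NT and kMM_TN rows.
ROWS = {
    "kGenMM": [(198.2, 239.7), (198.5, 237.8), (193.9, 231.8), (196.0, 233.8)],
    "kMM_NT": [(193.7, 266.4), (195.2, 192.9), (184.2, 187.4), (188.5, 197.5)],
    "kMM_TN": [(198.5, 211.1), (197.9, 226.2), (189.8, 227.6), (189.5, 223.2)],
}


def relative_gain(ref, present):
    """Percentage by which Present exceeds Refrenc (negative if it is lower)."""
    return (present / ref - 1.0) * 100.0


def gains(rows):
    """Map each benchmark name to its list of per-column relative gains."""
    return {
        name: [relative_gain(ref, present) for ref, present in pairs]
        for name, pairs in rows.items()
    }


for name, values in gains(ROWS).items():
    print(f"{name:8s}", " ".join(f"{v:+6.1f}%" for v in values))
```

A positive value means the present (clang) install achieves a higher percentage of clock rate than the reference in that column. Since both columns are already normalised by clock rate, the ratio compares per-clock efficiency only.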
> 
> So – to summarize in two words: kudos folks!
> 
> I will build another version on a Core2Duo machine tonight and see if the results are consistent.
> 
> Cheers!
> Vincent