allow ffs & co. a binary search
Davide Italiano
davide at freebsd.org
Sun Jun 7 17:43:04 UTC 2015
On Sun, Jun 7, 2015 at 10:23 AM, Garrett Cooper <yaneurabeya at gmail.com> wrote:
> On Jun 7, 2015, at 9:44, Davide Italiano <davide at freebsd.org> wrote:
>
>> On Sun, Jun 7, 2015 at 9:37 AM, Davide Italiano <davide at freebsd.org> wrote:
>>> On Sun, Jun 7, 2015 at 6:54 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
>>>> On Sun, Jun 07, 2015 at 07:52:45PM +0800, Erich Dollansky wrote:
>>>>> What I saw is that all CPUs except ARM uses the software version [of ffs].
>>>>
>>>> Without quantifiers, this statement is not true. i386 libc function ffs(3)
>>>> uses bsfl instruction to do the job. Compilers know about ffs(3) and friends
>>>> as well, so e.g. gcc 5.1.0 generates the following code for the given
>>>> fragment:
>>>> return (ffs(x) + 1);
>>>> is translated to
>>>> 0: 0f bc c7 bsf %edi,%eax
>>>> 3: ba ff ff ff ff mov $0xffffffff,%edx
>>>> 8: 0f 44 c2 cmove %edx,%eax
>>>> b: 83 c0 02 add $0x2,%eax
>>>> (arg in %edi, result in %eax).
>>>>
>>>> I wrote a patch for amd64 libc long time ago to convert ffs/fls etc to use
>>>> of the bitstring instruction, but Bruce Evans argued that this would be
>>>> excessive. Your patch is excessive for the similar reasons.
>>>>
>>>> My guess is that significantly clever compiler would recognize a pattern
>>>> used by native ffs implementation and automatically use bitstring
>>>> instructions. E.g., this already happens with popcnt and recent
>>>> gcc/clang, I am just lazy to verify ffs.
>>>
>>> Clang trunk to the best of my knowledgde hasn't a way to recognize
>>> ffs() pattern.
>>> http://llvm.org/docs/doxygen/html/LoopIdiomRecognize_8cpp_source.html
>>> I can't comment about gcc as long as I'm not familiar with the implementation.
>>
>> Also, FWIW, for the fragment provided by Kostik, clang seems to
>> generate more instructions than gcc does,
>> I'll bring this upstream.
>>
>> 0: 0f bc c7 bsf %edi,%eax
>> 3: b9 20 00 00 00 mov $0x20,%ecx
>> 8: 0f 45 c8 cmovne %eax,%ecx
>> b: 83 c1 02 add $0x2,%ecx
>> e: 85 ff test %edi,%edi
>> 10: b8 01 00 00 00 mov $0x1,%eax
>> 15: 0f 45 c1 cmovne %ecx,%eax
>
> Hi Davide,
> Could you please post a) the clang/gcc compiler versions and b) the compilation options used when producing the snippet? clang -O0 for instance produces a lot of code, whereas -O2 produces considerably less (to the point that there have been a lot of complaints at $work about being able to debug clang-compiled programs because things have all been optimized out).
> Thanks!
> -NGie
About the version, as previously mentioned, it's trunk (as per today), r239228.
About the flags, it's -O3 -fomit-frame-pointer. I'm able to obtain the
same code Kostik got w/ gcc and the same flag, so I definitely feel
like clang does something different, which result in more
instructions, and I'll investigate further on this.
Thanks,
--
Davide
"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare
More information about the freebsd-hackers
mailing list