cvs commit: src/include _ctype.h

Thu Nov 1 07:21:21 PDT 2007

On Thu, 1 Nov 2007, Christoph Mallon wrote:

> Andrey Chernov wrote:
>> On Tue, Oct 30, 2007 at 10:03:31AM -1000, Juli Mallett wrote:
>>> * "Andrey A. Chernov" <ache at FreeBSD.org> [ 2007-10-27 ]
>>> 	[ cvs commit: src/include _ctype.h ]
>>>> ache        2007-10-27 22:32:28 UTC
>>>> 
>>>>   FreeBSD src repository
>>>> 
>>>>   Modified files:
>>>>     include              _ctype.h
>>>>   Log:
>>>>   Micro-optimization of prev. commit, change
>>>>   (_c < 0 || _c >= 128) to (_c & ~0x7F)
>>> Isn't that a non-optimization in code and a minor pessimization of 
>>> readability?
>> ...
>> For ones who doubts there two tests compiled with -O2. As you may see the 
>> result is almost identical (andl vs cmpl):

We never doubted that it was a small negative or non-optimization :-).

Look closer and you will see that the andl version takes 2 extra
instructions.  Both versions are smart enough to avoid a branch, and
for this they need the result of the test in the condition codes.  The
cmpl generates the desired condition code directly, while the andl
needs 2 more instructions around it (a movl before it to save the
register that it clobbers, and a cmpl after it to fix up the condition
codes).

>> -------------------- a.c --------------------
>> main () {
>> 
>> 	int c;
>> 
>> 	return (c & ~0x7f) ? 0 : c * 2;
>> }

This example has many flaws as pointed out by Christoph:
- c is uninitialized
- the result depends on c in a way that is quite different than the table
   lookup for ctype.  The above expression happens to be more optimizable.
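
A test case closer to the ctype usage would take c as a parameter and
use it to index a table.  The following is only a rough sketch of that
shape: hyp_table, HYP_DIGIT and hyp_isdigit are hypothetical names, not
the ones in _ctype.h, and the table marks only the digits:

```c
#define	HYP_DIGIT	0x01	/* hypothetical class bit */

/* Hypothetical 128-entry class table; entries not listed are 0. */
const unsigned char hyp_table[128] = {
	['0'] = HYP_DIGIT, ['1'] = HYP_DIGIT, ['2'] = HYP_DIGIT,
	['3'] = HYP_DIGIT, ['4'] = HYP_DIGIT, ['5'] = HYP_DIGIT,
	['6'] = HYP_DIGIT, ['7'] = HYP_DIGIT, ['8'] = HYP_DIGIT,
	['9'] = HYP_DIGIT
};

/*
 * c is a parameter, so its value is defined, and the result comes
 * from a memory load instead of from arithmetic on c.
 */
int hyp_isdigit(int c)
{
	return ((c < 0 || c >= 128) ? 0 : (hyp_table[c] & HYP_DIGIT));
}
```

Since the in-range result is loaded from the table, the compiler cannot
fold it into the range test in the same way as it can for c * 2.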

>> -------------------- a.s --------------------
>> 	.file	"a.c"
>> 	.text
>> 	.p2align 4,,15
>> .globl main
>> 	.type	main, @function
>> main:
>> 	leal	4(%esp), %ecx
>> 	andl	$-16, %esp
>> 	pushl	-4(%ecx)
>> 	movl	%eax, %edx             <--- extra instruction since andl
                                             clobbers a register.  Normally,
 					    testl should be used to avoid
 					    this clobber.
>> 	andl	$-128, %edx            <--- this sets %edx to something
                                             and also sets the condition
 					    codes, but not like we want
>> 	addl	%eax, %eax             <--- c * 2
>> 	cmpl	$1, %edx               <--- this sets the condition codes
                                             like we want
>> 	sbbl	%edx, %edx             <--- turn condition codes into a
                                             mask in %edx: mask = 0xffffffff
 					    if the result should be c * 2
 					    and mask = 0 if the result should
 					    be 0
>> 	pushl	%ebp
>> 	andl	%edx, %eax             <--- result = (c * 2) & mask
>> 	movl	%esp, %ebp             <--- why is it bothering to set up
                                             a frame this late?
>> 	pushl	%ecx
>> 	popl	%ecx
>> 	popl	%ebp
>> 	leal	-4(%ecx), %esp
>> 	ret
>> 	.size	main, .-main
>> 	.ident	"GCC: (GNU) 4.2.1 20070719  [FreeBSD]"
>> -------------------- a1.c --------------------
>> main () {
>> 
>> 	int c;
>> 
>> 	return (c < 0 || c >= 128) ? 0 : c * 2;
>> 
>> 
>> }
>> -------------------- a1.s --------------------
>> 	.file	"a1.c"
>> 	.text
>> 	.p2align 4,,15
>> .globl main
>> 	.type	main, @function
>> main:
>> 	leal	4(%esp), %ecx
>> 	andl	$-16, %esp
>> 	pushl	-4(%ecx)
>> 	addl	%eax, %eax
>> 	cmpl	$128, %eax             <--- cmpl puts result in condition
                                             codes directly where we want it
>> 	sbbl	%edx, %edx             <--- same masking stuff ...
>> 	andl	%edx, %eax
>> 	pushl	%ebp
>> 	movl	%esp, %ebp
>> 	pushl	%ecx
>> 	popl	%ecx
>> 	popl	%ebp
>> 	leal	-4(%ecx), %esp
>> 	ret
>> 	.size	main, .-main
>> 	.ident	"GCC: (GNU) 4.2.1 20070719  [FreeBSD]"
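
The cmpl/sbbl/andl masking annotated above can be sketched back in C
roughly as follows.  This is my approximation of the idea and not what
gcc does internally; double_if_ascii is a made-up name, and c is a
parameter here so that its value is defined:

```c
/* Branch-free version of (c < 0 || c >= 128) ? 0 : c * 2. */
int double_if_ascii(int c)
{
	unsigned int uc = (unsigned int)c;

	/*
	 * mask = 0xffffffff if 0 <= c < 128 and mask = 0 otherwise,
	 * assuming 32-bit unsigned int.  This corresponds to the
	 * cmpl followed by sbbl %edx, %edx.
	 */
	unsigned int mask = 0u - (unsigned int)(uc < 128u);

	/* result = (c * 2) & mask, as in the final andl. */
	return ((int)((uc * 2u) & mask));
}
```

The comparison here is done on c itself before the doubling, unlike in
the a1.s listing where the addl comes first.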
>
> Your example is invalid. The value of c is undefined in this function and you 
> see random garbage as result (for example in the code snippet you see the c * 
> 2 (addl %eax, %eax) and after that is the cmpl, which uses %eax, too). In 
> fact it would be perfectly legal for the compiler to always return 0, call 
> abort(), or let demons fly out of your nose.

However, the uninitialized c = %eax seems to be transformed correctly in
both cases.  The first case even preserves %eax from the andl.

>
> Also the example is still unrealistic: You usually don't multiply chars by 
> two. Lets try something more realistic: an ASCII filter

Indeed.

Bruce