svn commit: r238907 - projects/calloutng/sys/kern
Attilio Rao
attilio at freebsd.org
Wed Dec 5 23:14:52 UTC 2012
On Sat, Nov 24, 2012 at 3:43 PM, Attilio Rao <attilio at freebsd.org> wrote:
> On Thu, Nov 8, 2012 at 5:26 PM, Bruce Evans <brde at optusnet.com.au> wrote:
>> On Fri, 2 Nov 2012, Attilio Rao wrote:
>>
>>> On 10/29/12, Bruce Evans <brde at optusnet.com.au> wrote:
>>>>
>>>> On Mon, 29 Oct 2012, Attilio Rao wrote:
>>>>
>>>>> Now that sched_pin()/sched_unpin() are fixed I would like to introduce
>>>>> this new patch, making critical_enter()/critical_exit() inline:
>>>>> http://www.freebsd.org/~attilio/inline_critical.patch
>>>>
>>>> ...
>>>>
>>>> My version goes the other way and uninlines mtx_lock_spin() and
>>>> mtx_unlock_spin(). Then it inlines (open codes) critical_enter() and
>>>> ...
>>>
>>>
>>> So, I thought more about this and I think that inlining
>>> critical_exit() is not really going to bring any benefit here, only
>>> bloat.
>>> This is because nested critical sections are rare rather not, which
>>
>>
>> Rather rare !not? :-)
>>
>>
>>> means you will end up doing the function call most of the time, and
>>> in addition we have the possible pessimization introduced by the memory
>>> clobber (__compiler_membar()) and, as you note, a possible deficiency
>>> coming from the branch misprediction*.
>>
>>
>> This seems best.
>>
>> I see a point about the rareness of the branch in critical_exit().
>> Not sure if it is the one you are making: since the nested case is
>> rare, then the branch will normally be correctly predicted. If the
>> function remains uninlined, then the branch still has a good chance
>> of being correctly predicted. This depends on the nested case being
>> so rare across all callers, that the non-nested case doesn't mess
>> up the prediction by happening often. The branch predictor only
>> has to maintain history for 1 branch for this. However, if the
>> call is inlined and there are many callers, there might be too many
>> to maintain history for them all.
>
> Yes, that's basically the same conclusion I came up with.
>
> It seems you are not opposed to this version of the patch.
> I made some in-kernel benchmarks but they didn't really show a
> performance improvement bigger than 1-2%, which on an SMP system is
> basically attributable to third-party effects.
>
> However, we already employ the same code for sched_pin(), so I think
> that we can just commit this patch as-is.
I have instead made up my mind not to commit this patch, as I cannot
prove a real performance gain; Jeff also agreed with this privately.
Instead, I would like to commit this small comment explaining why the
function is not inlined (see below). Let me know what you think.
Thanks,
Attilio
Index: sys/kern/kern_switch.c
===================================================================
--- sys/kern/kern_switch.c (revision 243911)
+++ sys/kern/kern_switch.c (working copy)
@@ -176,6 +176,11 @@ retry:
/*
* Kernel thread preemption implementation. Critical sections mark
* regions of code in which preemptions are not allowed.
+ * It would seem a good idea to inline such functions but, in order
+ * to prevent instruction reordering by the compiler, a __compiler_membar()
+ * must be used here (see the sched_pin() case). The performance penalty
+ * imposed by the membar could then produce slower code than
+ * the function call itself, for most cases.
*/
void
critical_enter(void)