m_freem()

Tue Jan 1 15:58:14 PST 2008

Bruce Evans wrote:
> On Mon, 31 Dec 2007, Julian Elischer wrote:
> 
>> m_freem() would be a perfect candidate for an inline function.
>> (or even macro).
>> in the case where m is null, no function call would even be made...
>> The whole function is only 2 lines, and it gets called once for every 
>> packet :-)
> 
> On the contrary, m_freem() is a large function that is fairly unsuitable
> for inlining.  I just happened to count that it usually takes 180
> instructions in -current with no INVARIANTS etc. (down from 245
> instructions in ~5.2).  Further counting gave 112 and 132 instructions
> for it (180 was for ttcp udp input packets and 112 and 132 are for
> ping packets in 2 directions).
> 
> m_freem() is only one statement, but that statement consists mainly
> of a function call to a function that is inline (m_free()).  m_free()
> sometimes calls m_free_ext(), which is not inline, and usually calls
> uma_zfree(), which is inline, but which is just a wrapper for
> uma_zfree_arg(), which is not inline.  uma_zfree_arg() is very large
> and thus very unsuitable for inlining.  I didn't check for [nested]
> inlining of its internals at the source level.  At runtime it usually
> calls the non-inline-function m_dtor_mbuf() which calls the non-inline
> function m_tag_delete_chain(); then it calls critical_enter() and
> critical_exit().  critical_exit() is fairly large and sometimes calls
> thread_lock(), mi_switch() and thread_unlock(), but usually doesn't.
> So the non-inline part of the call chain is usually:
> 
>     m_freem()
>       uma_zfree_arg()     # the following is just 1 short path through this
>     m_dtor_mbuf()
>       m_tag_delete_chain()
>     critical_enter()
>     critical_exit()
> 
> [Pause to recover from a double fault panic in critical*().  critical*()
> or kdb is non-reeantrant somehwere, so tracing through critical_*() or
> one of its callers in order to count instructions tends to cause panics.]
> 
> All this is too large to inline.  Inlining only the top level of it would
> only make a tiny difference.  It might make a positive or negative
> difference, depending on whether the reduced instruction count has a larger
> effect than the increased cache pressure.  Generally I think it is bogus
> to inline at the top level.  Here inlining at the top level may win in 2
> ways:
> - by avoiding the function call to the next level (and thus all function
>   calls) in the usual case.  I think this doesn't happen here.  I think it
>   is the usual case for the m_free_ext() call in m_free(), so inlining
>   m_free() is a clear win.
> - by improving branch prediction.  With a branch in a non-inline function,
>   it may be mispredicted often because different classes of callers
>   make it go in different ways.  With branch the distributed in callers
>   by inlining, it can be predicted perfectly in individual callers
>   that don't change its direction often and/or change its direction
>   in predictable ways.  On Athlon CPUs, mispredicting a single branch
>   costs the same several function calls provided the implicit branches
>   for all the function calls are not mispredicted.  Too much inlining
>   is still bad.  Apart from busting icaches, it can bust branch
>   prediction caches -- with enough distribution of branches, all
>   branches will be mispredicted.
> 
> The m_freem() wrapper currently limits the icache bloat from the
> m_free() inline.  In RELENG_4, both m_free() and m_freem() are non-inline
> and non-macro.  That may be why networking in RELENG_4 is so much more
> efficient than in -current ;-).  (Actually it makes little difference.)

Interesting.. I hadn't realised that m_free() had become an inline.
It does make things more interesting.

> 
> Bruce
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"