m_freem()
Julian Elischer
julian at elischer.org
Tue Jan 1 15:58:14 PST 2008
Bruce Evans wrote:
> On Mon, 31 Dec 2007, Julian Elischer wrote:
>
>> m_freem() would be a perfect candidate for an inline function
>> (or even a macro). In the case where m is NULL, no function call
>> would even be made...
>> The whole function is only 2 lines, and it gets called once for every
>> packet :-)
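Here is a minimal userland sketch of the proposal. It is not FreeBSD kernel code. `struct mbuf_model`, `m_freem_slow()` and `m_freem_inline()` are hypothetical stand-ins, and the model mbuf carries only its `m_next` link. The NULL test sits in a `static inline` wrapper, so a NULL chain costs only a compare and branch in the caller. A non-empty chain still pays for the out-of-line part:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mbuf: only the chain link matters here. */
struct mbuf_model {
	struct mbuf_model *m_next;
};

/* Counts how often the out-of-line part runs. */
static int slow_calls;

/* Out-of-line part: frees every mbuf in a non-empty chain. */
static void
m_freem_slow(struct mbuf_model *m)
{
	assert(m != NULL);	/* the inline wrapper guarantees this */
	slow_calls++;
	while (m != NULL) {
		struct mbuf_model *n = m->m_next;

		free(m);
		m = n;
	}
}

/* Inline wrapper: a NULL chain costs a test and a branch, never a call. */
static inline void
m_freem_inline(struct mbuf_model *m)
{
	if (m != NULL)
		m_freem_slow(m);
}
```

The counter records how often the out-of-line part runs. It does not show whether the compiler actually emitted a call, because an optimizing compiler may inline `m_freem_slow()` as well.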
>
> On the contrary, m_freem() is a large function that is fairly unsuitable
> for inlining. I just happened to count that it usually takes 180
> instructions in -current with no INVARIANTS etc. (down from 245
> instructions in ~5.2). Further counting gave 112 and 132 instructions
> for it (180 was for ttcp UDP input packets and 112 and 132 are for
> ping packets in 2 directions).
>
> m_freem() is only one statement, but that statement consists mainly
> of a function call to a function that is inline (m_free()). m_free()
> sometimes calls m_free_ext(), which is not inline, and usually calls
> uma_zfree(), which is inline, but which is just a wrapper for
> uma_zfree_arg(), which is not inline. uma_zfree_arg() is very large
> and thus very unsuitable for inlining. I didn't check for [nested]
> inlining of its internals at the source level. At runtime it usually
> calls the non-inline function m_dtor_mbuf(), which calls the non-inline
> function m_tag_delete_chain(); then it calls critical_enter() and
> critical_exit(). critical_exit() is fairly large and sometimes calls
> thread_lock(), mi_switch() and thread_unlock(), but usually doesn't.
> So the non-inline part of the call chain is usually:
>
>     m_freem()
>         uma_zfree_arg()    # the following is just 1 short path through this
>             m_dtor_mbuf()
>                 m_tag_delete_chain()
>             critical_enter()
>             critical_exit()
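The call chain above can be modelled in userland to show where the inline boundaries fall. This is a hedged sketch, not the FreeBSD source. `M_EXT_MODEL`, `ext_free()`, `zone_free_arg()`, `zone_free()`, `m_free_model()`, `m_freem_model()` and the counters are hypothetical stand-ins for m_free_ext(), uma_zfree_arg(), uma_zfree(), m_free() and m_freem(). `__attribute__((noinline))` is a GCC/Clang extension, used here to keep the heavy functions out of line. The inline layers are thin. Inlining the top-level m_freem() as well would therefore remove only one call from a path that still ends in an out-of-line free:

```c
#include <assert.h>
#include <stdlib.h>

#define M_EXT_MODEL 0x1	/* hypothetical flag: mbuf has external storage */

struct mbuf_model {
	struct mbuf_model *m_next;
	int m_flags;
};

static int ext_frees, zone_frees;

/* Stand-in for m_free_ext(): out of line, taken only for external storage. */
static __attribute__((noinline)) void
ext_free(struct mbuf_model *m)
{
	ext_frees++;
	free(m);
}

/* Stand-in for uma_zfree_arg(): out of line; in the kernel it is large
 * (destructor, tag-chain deletion, critical section). */
static __attribute__((noinline)) void
zone_free_arg(void *item, void *arg)
{
	(void)arg;
	zone_frees++;
	free(item);
}

/* Stand-in for uma_zfree(): an inline wrapper that only supplies an argument. */
static inline void
zone_free(void *item)
{
	zone_free_arg(item, NULL);
}

/* Stand-in for m_free(): inline; frees one mbuf and returns the next one. */
static inline struct mbuf_model *
m_free_model(struct mbuf_model *m)
{
	struct mbuf_model *n;

	assert(m != NULL);	/* m_freem_model() checks before calling */
	n = m->m_next;
	if (m->m_flags & M_EXT_MODEL)
		ext_free(m);	/* rare path */
	else
		zone_free(m);	/* usual path, still ends in an out-of-line call */
	return (n);
}

/* Stand-in for m_freem(): one statement, kept out of line so the body of
 * m_free_model() is expanded here only, not in every caller. */
__attribute__((noinline)) void
m_freem_model(struct mbuf_model *m)
{
	while (m != NULL)
		m = m_free_model(m);
}
```

Keeping `m_freem_model()` out of line means the body of `m_free_model()` is expanded once rather than at every call site. This is the icache trade-off Bruce describes for the real m_freem() wrapper.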
>
> [Pause to recover from a double fault panic in critical_*(). critical_*()
> or kdb is non-reentrant somewhere, so tracing through critical_*() or
> one of its callers in order to count instructions tends to cause panics.]
>
> All this is too large to inline. Inlining only the top level of it would
> only make a tiny difference. It might make a positive or negative
> difference, depending on whether the reduced instruction count has a larger
> effect than the increased cache pressure. Generally I think it is bogus
> to inline at the top level. Here inlining at the top level may win in 2
> ways:
> - by avoiding the function call to the next level (and thus all function
> calls) in the usual case. I think this doesn't happen here. I think it
> is the usual case for the m_free_ext() call in m_free(), so inlining
> m_free() is a clear win.
> - by improving branch prediction. With a branch in a non-inline function,
> it may be mispredicted often because different classes of callers
> make it go in different ways. With the branch distributed into callers
> by inlining, it can be predicted perfectly in individual callers
> that don't change its direction often and/or change its direction
> in predictable ways. On Athlon CPUs, mispredicting a single branch
> costs the same as several function calls, provided the implicit branches
> for all the function calls are not mispredicted. Too much inlining
> is still bad. Apart from busting icaches, it can bust branch
> prediction caches -- with enough distribution of branches, all
> branches will be mispredicted.
>
> The m_freem() wrapper currently limits the icache bloat from the
> m_free() inline. In RELENG_4, both m_free() and m_freem() are non-inline
> and non-macro. That may be why networking in RELENG_4 is so much more
> efficient than in -current ;-). (Actually it makes little difference.)
Interesting. I hadn't realised that m_free() had become an inline.
It does make things more interesting.
>
> Bruce
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net