mbuf changes
Karim Fodil-Lemelin
kfl at xiplink.com
Tue Oct 5 16:21:34 UTC 2010
On 03/10/2010 9:13 AM, Luigi Rizzo wrote:
> On Sun, Oct 03, 2010 at 12:29:21AM +0100, Rui Paulo wrote:
>> On 2 Oct 2010, at 21:35, Juli Mallett wrote:
>>
>>> On Sat, Oct 2, 2010 at 12:07, Rui Paulo<rpaulo at freebsd.org> wrote:
>>>> On 2 Oct 2010, at 16:29, Robert Watson wrote:
>>>>> On Thu, 30 Sep 2010, Julian Elischer wrote:
>>>>>> On 9/30/10 10:49 AM, Ryan Stone wrote:
>>>>>>> It's not a big thing but it would be nice to replace the m_next and m_nextpkt fields with queue.h macros.
>>>>>> funny, I've never even thought of that..
>>>>> I have, and it's a massive change touching code all over the kernel in vast quantities. While in principle it's a good idea (consistently avoid hand-crafted linked lists), it's something I'd discourage on the basis that it probably won't significantly reduce the kernel bug count, but will make it even harder for vendors with large local changes to the network stack to keep up.
>>>> I think it could also increase the kernel bug count. Unfortunately, we can't do this incrementally.
>>> Can't we? What about a union, so that we can gradually convert things
>>> but keep ABI and API compatibility? I mean, as long as we use the
>>> right queue.h type, anyway, it should be consistent? STAILQ,
>>> presumably.
>> Well, I don't have the layout of the mbuf struct offhand, but it's an idea worth investigating.
> what is the point of refactoring part of a struct that no new code is
> touching ?
>
> I'd like to keep this discussion on the original topics,
> i.e. performance-related issues (make room to embed mtags and other
> metadata such as FIB; have flexible per-socket initial padding so
> we don't always waste 100+ bytes just because ipv6+ipsec is compiled
> in; and so on).
> Please open another thread if you want to propose cosmetics or
> code refactoring or other unrelated changes
>
Hi,
I will share some of the experience I had embedding mtags. Hopefully
it's relevant :)
The idea of carrying a certain amount of mbuf tags within the mbuf
structure is somewhat similar to, but imo much cleaner than, Linux's
skbuff char cb[40 - 48] (it was 40 bytes in 2.4.x ...). The idea is not
new, although as you know the devil is in the details...
What we did for BSD is create a container in the mbuf and extend the API
with functions we (pompously) called m_tag_fast_alloc() and
m_tag_fast_free(). This means the standard m_tag_alloc() is still
supported across the system and the old behavior is unchanged (a list of
allocated structs attached to the packet header). What's different is
the availability of a 'fast' call that directly uses the container
within the mbuf, effectively avoiding those mallocs and cache misses.
I'll explain below how we support calling m_tag_delete() on a 'fast'
tag.
The other trick to save CPU cycles was to quickly revert to the
standard tag mechanism if some component in the system manipulates the
tag list by deleting elements. Effectively, m_tag_fast_free() is a NOP
and fast tags are not deleted once allocated (unless m_free() is called
on the mbuf, of course). When m_tag_delete() is called, the container
simply becomes invalid for further 'fast tag' additions. This is not
flexible, but it has the merit of reducing the overall number of
operations, given that almost no components delete tags without
deleting the mbuf (loopback does, but it's a special case).
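To make the scheme above concrete, here is a minimal userland sketch in
C. It is not our patch and not the FreeBSD mbuf API: every toy_* name,
the fixed-slot layout and the two size constants are assumptions made
up for illustration. It only shows the three behaviours described
above: a fast allocation that uses storage inside the packet, a
fallback to malloc when the container is full or invalid, and a delete
that invalidates the container for further fast additions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_FAST_SLOTS   3    /* hypothetical: tags that fit in the container */
#define TOY_SLOT_PAYLOAD 16   /* hypothetical: payload bytes per fast slot */

struct toy_tag {
    struct toy_tag *next;     /* hand-rolled tag chain */
    void           *data;     /* points at the tag's payload storage */
    size_t          len;
    int             is_fast;  /* 1 if the tag lives inside the packet */
};

struct toy_slot {
    struct toy_tag hdr;
    unsigned char  payload[TOY_SLOT_PAYLOAD];
};

struct toy_pkt {
    struct toy_tag *tags;      /* fast and malloc'ed tags share one list */
    int             fast_ok;   /* cleared by the first delete */
    size_t          fast_used; /* slots handed out so far */
    struct toy_slot fast[TOY_FAST_SLOTS];   /* the embedded container */
};

void toy_pkt_init(struct toy_pkt *p)
{
    p->tags = NULL;
    p->fast_ok = 1;
    p->fast_used = 0;
}

/* Standard path: header and payload come from malloc. */
struct toy_tag *toy_tag_alloc(size_t len)
{
    struct toy_tag *t = malloc(sizeof(*t) + len);
    if (t == NULL)
        return NULL;
    t->next = NULL;
    t->data = t + 1;
    t->len = len;
    t->is_fast = 0;
    return t;
}

/* Fast path: take a slot from the container, else fall back to malloc. */
struct toy_tag *toy_tag_fast_alloc(struct toy_pkt *p, size_t len)
{
    assert(p != NULL);
    if (!p->fast_ok || p->fast_used == TOY_FAST_SLOTS ||
        len > TOY_SLOT_PAYLOAD)
        return toy_tag_alloc(len);
    struct toy_slot *s = &p->fast[p->fast_used++];
    s->hdr.next = NULL;
    s->hdr.data = s->payload;
    s->hdr.len = len;
    s->hdr.is_fast = 1;
    return &s->hdr;
}

void toy_tag_prepend(struct toy_pkt *p, struct toy_tag *t)
{
    t->next = p->tags;
    p->tags = t;
}

/* A NOP: fast storage is only reclaimed together with the packet. */
void toy_tag_fast_free(struct toy_tag *t)
{
    (void)t;
}

/* Unlink one tag; from now on the packet uses the standard path only. */
void toy_tag_delete(struct toy_pkt *p, struct toy_tag *t)
{
    struct toy_tag **pp = &p->tags;
    while (*pp != NULL && *pp != t)
        pp = &(*pp)->next;
    if (*pp == t)
        *pp = t->next;
    p->fast_ok = 0;
    if (t->is_fast)
        toy_tag_fast_free(t);
    else
        free(t);
}

/* Called when the packet goes away: free malloc'ed tags, reset container. */
void toy_pkt_free_tags(struct toy_pkt *p)
{
    while (p->tags != NULL) {
        struct toy_tag *t = p->tags;
        p->tags = t->next;
        if (!t->is_fast)
            free(t);
    }
    p->fast_used = 0;
    p->fast_ok = 1;
}
```

A caller would do toy_tag_fast_alloc() followed by toy_tag_prepend(),
exactly as with the standard calls, and never needs to know which path
served the request. Fixed-size slots are used here only to keep the
sketch free of alignment arithmetic.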
One last thing we did was run various operational tests to come up
with the most statistically optimized container size. This is much
easier to do on a proprietary system than on a general purpose OS, but
it's certainly possible.
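One possible way to turn such measurements into a number, sketched in
C. This is an assumption for illustration and not necessarily how we
did it: the function name, the idea of sampling tag bytes per packet,
and the percentile rule are all hypothetical. Given per-packet samples
of how many bytes of tags were attached, it returns the smallest
container size that would have served a chosen percentage of packets
entirely from the fast path.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for size_t values. */
static int toy_cmp_size(const void *a, const void *b)
{
    size_t x = *(const size_t *)a;
    size_t y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: smallest size S such that at least pct percent
 * of the samples are <= S. Returns 0 on bad input or allocation failure.
 */
size_t toy_pick_container_size(const size_t *samples, size_t n, unsigned pct)
{
    if (samples == NULL || n == 0 || pct == 0 || pct > 100)
        return 0;
    size_t *sorted = malloc(n * sizeof(*sorted));
    if (sorted == NULL)
        return 0;
    memcpy(sorted, samples, n * sizeof(*sorted));
    qsort(sorted, n, sizeof(*sorted), toy_cmp_size);
    size_t rank = (n * pct + 99) / 100;   /* ceil(n * pct / 100), >= 1 */
    size_t pick = sorted[rank - 1];
    free(sorted);
    return pick;
}
```

The trade-off is the usual one: a larger percentile means fewer
fallbacks to malloc but more bytes carried in every mbuf, used or not.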
Finally, we did see a speed increase for our application, and if someone
is interested I could provide a patch, although I would have to rewrite
it without the proprietary bits in it.
Best regards,
Karim.