mbuf changes

Julian Elischer julian at freebsd.org
Mon Sep 27 16:08:47 UTC 2010


  On 9/27/10 6:14 AM, Andre Oppermann wrote:
> On 27.09.2010 15:18, Luigi Rizzo wrote:
>> On Mon, Sep 27, 2010 at 02:55:45PM +0200, Andre Oppermann wrote:
>> ...
>>>> my idea was to have an extra field in the mbuf to tell how much room
>>>> should be reserved/used for metadata (such as mtags) after
>>>> the payload area so you don't need to change the allocator, and
>>>> possibly can even modify this on an existing mbuf.
>>>> Almost always mbufs have spare room (e.g. incoming pkts have all
>>>> data in the cluster and mostly empty mdata; outgoing, except
>>>> for rare cases, tend to be in a similar situation).
>>>> So this approach would allow taking an already allocated
>>>> mbuf and putting the mtag in the spare area after the data.
>>>
>>> For incoming data this approach could work, as usually 2K mbuf clusters
>>> are used and they have trailing space available, or rather the normal
>>> mbuf referencing the cluster has its own data section unused.
>>>
>>> When trailing space should be used, M_TRAILINGSPACE() needs modifications
>>> and a full tree audit is required to make sure that all mbuf consumers
>>> are using it correctly and not some version of their own that directly
>>> assumes certain mbuf sizes, etc.  A lot of work.
>>>
>>> For locally generated mbufs and socket buffers we try to use the mbufs to
>>> their maximal extent.  When the socket buffer data is packetized it
>>> normally is referenced, and then we get the normal mbuf with its data
>>> portion unused.  So that could work.
>>>
>>> A complication is the m_tag_free() field and function, which puts the
>>> memory deallocation into the hands of the mtag user.  That means all mtag
>>> consumers have to be made aware of provided storage, which must not be
>>> returned directly to the memory allocator (malloc/UMA).
>>>
>>> So the only way I realistically see is to make use of the mbuf's unused
>>> data portion when it has external storage attached to it.  This should
>>> probably cover about 98% of all cases.  The rest has to malloc() the mtag
>>> storage as usual.
>>
>> so it wouldn't be bad -- i cannot judge the numbers, but definitely
>> it would work for all incoming traffic, plus all tcp data packets
>> (as the payload is in the cluster), plus all pure acks (which are small),
>> plus all UDP above some 200 bytes...
>
> Yes, about that.
>
>>> I could whip up a prototype for review in the next weeks.
>>
>> I seem to remember that jeffr had already something done in Perforce.
>
> That's a more general overhaul of the way mbufs are structured and
> allocated with UMA.  I'm not sure it provides for the mtag issue.  Will
> check though.

I'd like to see if we can go over his stuff and any other suggested
changes before 9.0, and see if we can agree on a change for 9.0.

Jeff, we discussed this a year ago. Do you still have your suggested
changes?
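
For reference, the approach discussed above (placing the mtag in the
mbuf's idle internal data area when the payload lives in external
storage, and falling back to malloc() otherwise) could be sketched
roughly as follows. This is a userland illustration only, not kernel
code: struct sk_mbuf, struct sk_tag, SK_MLEN and every sk_* function are
hypothetical names made up for the sketch, and the layout does not match
the real struct mbuf or struct m_tag. The per-tag free function stands in
for the role m_tag_free plays, so that embedded storage is never handed
back to the allocator.

```c
#include <stddef.h>
#include <stdlib.h>

#define SK_MLEN 224	/* assumed size of the internal data area */

/* Hypothetical, simplified stand-in for an mbuf. */
struct sk_mbuf {
	int	has_ext;	/* nonzero: payload is in an external cluster */
	size_t	tag_used;	/* bytes of m_dat already handed out to tags */
	_Alignas(max_align_t) char m_dat[SK_MLEN];	/* idle when has_ext */
};

/* Hypothetical, simplified stand-in for an mtag header. */
struct sk_tag {
	void	(*free_fn)(struct sk_tag *);	/* role of m_tag_free */
	size_t	len;				/* payload length */
};

/* Embedded tags live inside the mbuf, so there is nothing to release. */
static void
sk_tag_free_embedded(struct sk_tag *t)
{
	(void)t;
}

/* Fallback tags came from malloc() and go back to it. */
static void
sk_tag_free_malloc(struct sk_tag *t)
{
	free(t);
}

/*
 * Allocate a tag with len bytes of payload.  Use the mbuf's unused
 * internal data area when the mbuf has external storage and enough
 * room is left; otherwise malloc() as usual.
 */
static struct sk_tag *
sk_tag_alloc(struct sk_mbuf *m, size_t len)
{
	struct sk_tag *t;
	size_t need = sizeof(struct sk_tag) + len;

	/* Round up so the next embedded tag stays pointer-aligned. */
	need = (need + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

	if (m->has_ext && need <= SK_MLEN - m->tag_used) {
		t = (struct sk_tag *)(void *)(m->m_dat + m->tag_used);
		m->tag_used += need;
		t->free_fn = sk_tag_free_embedded;
	} else {
		t = malloc(need);
		if (t == NULL)
			return (NULL);
		t->free_fn = sk_tag_free_malloc;
	}
	t->len = len;
	return (t);
}
```

A consumer would always release a tag through t->free_fn(t) rather than
calling free() itself, which is the awareness of provided storage that
the m_tag_free() complication above calls for.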




