UMA & mbuf cache utilization.
Paul Saab
ps at mu.org
Wed Dec 17 00:20:53 PST 2008
So far, testing has shown that in a pure transmit test this doesn't hurt
performance at all.
On Tue, Dec 9, 2008 at 6:22 PM, Jeff Roberson <jroberson at jroberson.net> wrote:
> Hello,
>
> Nokia has graciously allowed me to release a patch which I developed to
> improve general mbuf and cluster cache behavior. This is based on others'
> observations that, due to simple alignment at 2KB and 256 bytes, we achieve
> a poor cache distribution for the header area of packets and the most
> heavily used mbuf header fields. In addition, modern machines stripe memory
> accesses across several memory banks and even memory controllers. Accessing
> heavily aligned locations such as these can also create load imbalances
> among those banks.
>
> To solve this problem I have added two new features to UMA. The first is
> the zone flag UMA_ZONE_CACHESPREAD. This flag modifies the meaning of the
> alignment field such that start addresses are staggered by at least align +
> 1 bytes. In the case of clusters and mbufs this means adding
> uma_cache_align + 1 bytes to the amount of storage allocated. This creates
> a constant amount of waste: 3% and 12%, respectively. It also means
> we must use contiguous physical and virtual memory consisting of several
> pages to efficiently use the memory and land on as many cache lines as
> possible.
>
> Because contiguous physical memory is not always available, the allocator
> had to have a fallback mechanism. We don't want every mbuf allocation to
> simply check two zones, because once we deplete the available contiguous
> memory, the check on the first zone would always fail via the most
> expensive code path.
>
> To resolve this issue, I added the ability for secondary zones to stack on
> top of multiple primary zones. Secondary zones are zones which get their
> storage from another zone but handle their own caching, ctors, dtors, etc.
> By adding this feature a secondary zone can be created that can allocate
> either from the contiguous memory pool or the non-contiguous single-page
> pool depending on availability. It is also much faster to fail over
> between them deep in the allocator, because that path is only taken when
> we exhaust the already available mbuf memory.
>
> For mbufs and clusters there are now three zones each: a
> contigmalloc-backed zone, a single-page allocator zone, and a secondary
> zone with the original zone_mbuf or zone_clust name. The packet zone also
> draws from both available mbuf zones. The individual backend zones are not
> exposed outside of kern_mbuf.c.
>
> Currently, each backend zone can have its own limit, and the secondary
> zone only blocks when both are full. Statistics-wise, the limit should be
> reported as the sum of the backend limits; however, that isn't presently
> done. The secondary zone cannot have its own limit independent of the
> backends at this time. I'm not sure whether that would be valuable.
>
> I have test results from Nokia which show a dramatic improvement in
> several workloads, but which I am probably not at liberty to discuss. I'm
> in the process of convincing Kip to help me get some benchmark data on our
> stack.
>
> Also, as part of the patch I renamed a few functions, since many were
> non-obvious, and grew new keg abstractions to tidy things up a bit. I
> suspect those of you with UMA experience (robert, bosko) will find the
> renaming a welcome improvement.
>
> The patch is available at:
> http://people.freebsd.org/~jeff/mbuf_contig.diff
>
> I would love to hear any feedback you may have. I have been developing
> this and testing various versions off and on for months; however, this is
> a fresh port to current and it is a little green, so it should be
> considered experimental.
>
> In particular, I'm most nervous about how the vm will respond to new
> pressure on contig physical pages. I'm also interested in hearing from
> embedded/limited memory people about how we might want to limit or tune
> this.
>
> Thanks,
> Jeff
> _______________________________________________
> freebsd-arch at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe at freebsd.org"
>
>