Memory allocation in kernel -- what to use in which situation? What is the best for page-sized allocations?

Sun Oct 2 13:19:58 UTC 2011

2011/10/2 Lev Serebryakov <lev at freebsd.org>:
> Hello, Freebsd-hackers.
>
>  There are several memory-allocation mechanisms in the kernel. The two
> I'm aware of are MALLOC_DEFINE()/malloc()/free() and uma_* (zone(9)).
>
>  As far as I understand, malloc() is general-purpose, but it has a
> fixed "transaction cost" (in terms of memory consumption) for each
> block allocated, and is not very suitable for allocating many small
> blocks, as lots of memory will be wasted on bookkeeping.
>
>  The zone(9) allocator, on the other hand, has a very low cost per
> allocated block, but can allocate only pre-configured fixed-size
> blocks, and is ideal for allocating tons of small objects (and provides
> an API for reusing them, too!).
>
>  Am I right?
>
>   But what if I need to allocate a lot (say, 16K-32K) of page-sized
> blocks? Not in one chunk, for sure, but over the lifetime of my kernel
> module. Which allocator should I use? It seems the best one would be a
> very low-level, page-size-only allocator. Is there one in the kernel?
>
> --
> // Black Lion AKA Lev Serebryakov <lev at FreeBSD.org>

My 2 cents:
Every time you request more than 4KB of memory using the
kernel malloc(), it results in a direct call to uma_large_malloc().
Right now, uma_large_malloc() calls kmem_malloc() (i.e. the memory is
requested from the VM directly).
This kind of approach has two main drawbacks:
1) it heavily fragments the kernel heap
2) when free() is called on these multipage chunks, it in turn calls
uma_large_free(), which immediately calls the VM system to unmap and
free the chunk of memory.  The unmapping requires a system-wide TLB
shootdown, i.e. a global action by every processor in the system.

I'm currently working, supervised by alc@, on an intermediate layer that
sits between UMA and the VM, whose goal is to efficiently satisfy
requests > 4KB (so, the one you want considering you're asking for
16KB-32KB), but the work is at an early stage.
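
To make the second drawback concrete, here is a rough userland sketch of
the general idea of caching freed page-sized blocks on a free list instead
of handing each one straight back to the backing allocator. It is not the
actual design of the layer mentioned above, which is not described here.
All names (page_cache_*, SKETCH_PAGE_SIZE, SKETCH_CACHE_MAX) are
hypothetical, plain malloc()/free() stand in for the VM, and there is no
locking, so take it only as an illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size */
#define SKETCH_CACHE_MAX 64	/* hypothetical cap on cached pages */

/* A freed page stores the free-list link in its own first bytes. */
struct page_link {
	struct page_link *next;
};

struct page_cache {
	struct page_link *head;	/* cached free pages */
	size_t cached;		/* pages currently on the free list */
	size_t backend_allocs;	/* requests that reached the backend */
	size_t backend_frees;	/* pages handed back to the backend */
};

static void
page_cache_init(struct page_cache *pc)
{
	pc->head = NULL;
	pc->cached = 0;
	pc->backend_allocs = 0;
	pc->backend_frees = 0;
}

/* Reuse a cached page if there is one; otherwise ask the backend. */
static void *
page_cache_alloc(struct page_cache *pc)
{
	struct page_link *pl = pc->head;

	if (pl != NULL) {
		pc->head = pl->next;
		pc->cached--;
		return (pl);
	}
	pc->backend_allocs++;
	/* Stand-in for the VM; a real page allocator returns aligned pages. */
	return (malloc(SKETCH_PAGE_SIZE));
}

/* Keep the page for later reuse unless the cache is already full. */
static void
page_cache_free(struct page_cache *pc, void *page)
{
	struct page_link *pl = page;

	assert(page != NULL);
	if (pc->cached < SKETCH_CACHE_MAX) {
		pl->next = pc->head;
		pc->head = pl;
		pc->cached++;
		return;
	}
	pc->backend_frees++;
	free(page);
}

/* Hand every cached page back to the backend, e.g. at module unload. */
static void
page_cache_drain(struct page_cache *pc)
{
	while (pc->head != NULL) {
		struct page_link *pl = pc->head;

		pc->head = pl->next;
		pc->cached--;
		pc->backend_frees++;
		free(pl);
	}
}
```

With this shape, an alloc/free/alloc sequence reaches the backend only
once, and the expensive release happens in a batch at drain time rather
than on every free.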

Best,

Davide