Memory reserves or lack thereof
Alan Cox
alan.l.cox at gmail.com
Sun Nov 11 21:40:26 UTC 2012
On Sat, Nov 10, 2012 at 7:20 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
> On Fri, Nov 09, 2012 at 07:10:04PM +0000, Sears, Steven wrote:
> > I have a memory subsystem design question that I'm hoping someone can
> > answer.
> >
> > I've been looking at a machine that is completely out of memory, as in
> >
> > v_free_count = 0,
> > v_cache_count = 0,
> >
> > I wondered how a machine could completely run out of memory like this,
> > especially after finding a lack of interrupt storms or other pathologies
> > that would tend to overcommit memory. So I started investigating.
> >
> > Most allocators come down to vm_page_alloc(), which has this guard:
> >
> >     if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
> >         page_req = VM_ALLOC_SYSTEM;
> >     };
> >
> >     if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
> >         (page_req == VM_ALLOC_SYSTEM &&
> >         cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
> >         (page_req == VM_ALLOC_INTERRUPT &&
> >         cnt.v_free_count + cnt.v_cache_count > 0)) {
> >
> > The key observation is if VM_ALLOC_INTERRUPT is set, it will allocate
> > every last page.
> >
> > From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare,
> > perhaps only used from interrupt threads. Not so, see kmem_malloc() or
> > uma_small_alloc() which both contain this mapping:
> >
> >     if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
> >         pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
> >     else
> >         pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
> >
> > Note that M_USE_RESERVE has been deprecated and is used in just a
> > handful of places. Also note that lots of code paths come through these
> > routines.
> >
> > What this means is essentially _any_ allocation using M_NOWAIT will
> > bypass whatever reserves have been held back and will take every last page
> > available.
> >
> > There is no documentation stating M_NOWAIT has this side effect of
> > essentially being privileged, so any innocuous piece of code that can't
> > block will use it. And of course M_NOWAIT is literally used all over.
> >
> > It looks to me like the design goal of the BSD allocators is on
> > recovery; it will give all pages away knowing it can recover.
> >
> > Am I missing anything? I would have expected some small number of pages
> > to be held in reserve just in case. And I didn't expect M_NOWAIT to be a
> > sort of back door for grabbing memory.
> >
>
> Your analysis is right, there is nothing to add or correct.
> This is the reason to strongly prefer M_WAITOK.
>
Agreed. Once upon a time, before SMPng, M_NOWAIT was rarely used. It was
well understood that it should only be used by interrupt handlers.

The trouble is that M_NOWAIT conflates two orthogonal things. The obvious
one is that the allocation shouldn't sleep. The other is how far we're
willing to deplete the cache/free page queues.

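
To make the conflation concrete, here is a minimal user-space sketch of
the guard and flag mapping quoted above. It is not kernel code: the names
X_NOWAIT, X_USE_RESERVE, req_class, page_counts, current_class() and
may_allocate() are hypothetical stand-ins for M_NOWAIT, M_USE_RESERVE, the
VM_ALLOC_* classes and the cnt.v_* counters, and the curproc == pageproc
special case is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the malloc(9) flags discussed in the thread. */
#define X_NOWAIT      0x1
#define X_USE_RESERVE 0x2

/* Hypothetical stand-ins for VM_ALLOC_NORMAL/SYSTEM/INTERRUPT. */
enum req_class { REQ_NORMAL, REQ_SYSTEM, REQ_INTERRUPT };

/* Models the cnt.v_* counters that the quoted guard consults. */
struct page_counts {
    unsigned free_count;
    unsigned cache_count;
    unsigned free_reserved;
    unsigned interrupt_free_min;
};

/* Mirrors the quoted mapping from kmem_malloc()/uma_small_alloc(). */
static enum req_class current_class(int flags)
{
    assert((flags & ~(X_NOWAIT | X_USE_RESERVE)) == 0);
    if ((flags & (X_NOWAIT | X_USE_RESERVE)) == X_NOWAIT)
        return REQ_INTERRUPT;
    return REQ_SYSTEM;
}

/* Mirrors the quoted guard in vm_page_alloc(). */
static bool may_allocate(const struct page_counts *c, enum req_class cls)
{
    unsigned avail = c->free_count + c->cache_count;

    return avail > c->free_reserved ||
        (cls == REQ_SYSTEM && avail > c->interrupt_free_min) ||
        (cls == REQ_INTERRUPT && avail > 0);
}
```

In this model, with one page left and an illustrative interrupt_free_min
of 2, a plain X_NOWAIT request is still granted while a sleepable request
or an X_NOWAIT | X_USE_RESERVE request is refused.
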
When fine-grained locking got sprinkled throughout the kernel, we all too
often found ourselves wanting to do allocations without the possibility of
blocking. So, M_NOWAIT became commonplace, where it wasn't before.
This had the unintended consequence of introducing a lot of memory
allocations in the top-half of the kernel, i.e., non-interrupt handling
code, that were digging deep into the cache/free page queues.

Also, ironically, in today's kernel an "M_NOWAIT | M_USE_RESERVE"
allocation is less likely to succeed than an "M_NOWAIT" allocation.
However, prior to FreeBSD 7.x, M_NOWAIT couldn't allocate a cached page; it
could only allocate a free page. M_USE_RESERVE said that it was ok to
allocate a cached page even though M_NOWAIT was specified. Consequently,
the system wouldn't dig as far into the free page queue if M_USE_RESERVE
was specified, because it was allowed to reclaim a cached page.

In conclusion, I think it's time that we change M_NOWAIT so that it doesn't
dig any deeper into the cache/free page queues than M_WAITOK does and
reintroduce a M_USE_RESERVE-like flag that says dig deep into the
cache/free page queues. The trouble is that we then need to identify all
of those places that are implicitly depending on the current behavior of
M_NOWAIT also digging deep into the cache/free page queues so that we can
add an explicit M_USE_RESERVE.

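
One way to picture the proposed split is the following minimal sketch, in
which sleeping and reserve depth are decided by separate flags. The names
X_NOWAIT, X_DEEP_RESERVE, req_class, may_sleep() and proposed_class() are
hypothetical and only illustrate the idea; no such flag exists in the
kernel under that name.

```c
#include <assert.h>
#include <stdbool.h>

#define X_NOWAIT       0x1 /* stand-in for M_NOWAIT: caller must not sleep */
#define X_DEEP_RESERVE 0x2 /* hypothetical M_USE_RESERVE-like flag */

/* Hypothetical stand-ins for VM_ALLOC_SYSTEM and VM_ALLOC_INTERRUPT. */
enum req_class { REQ_SYSTEM, REQ_INTERRUPT };

/* Sleeping depends only on X_NOWAIT. */
static bool may_sleep(int flags)
{
    return (flags & X_NOWAIT) == 0;
}

/*
 * Reserve depth depends only on X_DEEP_RESERVE, so a plain X_NOWAIT
 * request digs no deeper than a sleepable one.
 */
static enum req_class proposed_class(int flags)
{
    assert((flags & ~(X_NOWAIT | X_DEEP_RESERVE)) == 0);
    return (flags & X_DEEP_RESERVE) ? REQ_INTERRUPT : REQ_SYSTEM;
}
```

Under this sketch, the callers that rely on today's behavior would be the
ones that need X_NOWAIT | X_DEEP_RESERVE spelled out explicitly.
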
Alan
P.S. I suspect that we should also increase the size of the "page reserve"
that is kept for VM_ALLOC_INTERRUPT allocations in vm_page_alloc*(). How
many legitimate users of a new M_USE_RESERVE-like flag in today's kernel
could actually be satisfied by two pages?