Why does the kernel kill processes that run out of memory instead of just failing memory allocation system calls?

Fri May 22 20:07:11 UTC 2009

:On Thursday 21 May 2009 23:37:20 Nate Eldredge wrote:
:> Of course all these problems are solved, under any policy, by having more
:> memory or swap.  But overcommit allows you to do more with less.
:
:Or to put it another way, 90% of the problems that could be solved by having
:more memory can also be solved by pretending you have more memory and hoping
:no-one calls your bluff.
:
:Jonathan

    It's a bit more complicated than that.  Most of the memory duplication
    (or lack thereof) that occurs after a fork() is deterministic.  It's not
    a matter of pretending, it's a matter of practical application.

    For example, when sendmail forks, a deterministic subset of the
    duplicated writable memory will never be modified by the child.  Ever.
    This is what overcommit takes advantage of.  Nearly every program that
    forks duplicates a significant amount of writable memory that the child
    never writes to, which deterministically reduces the set of pages that
    will ever need to be copied on demand.  The OS cannot predict which pages
    these will be, but the effect from a whole-system point of view is well
    known and deterministic.
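    To put rough numbers on this, here is a back-of-the-envelope model.  It
    is not how any particular kernel does its accounting, and every figure
    in it is hypothetical: a server with 256 MiB (65536 pages of 4 KB) of
    private writable memory, 200 forked children, and each child dirtying
    5% of those pages.  A strict policy must charge every child for the
    full writable image, while copy-on-write only ever copies the pages a
    child actually writes.

```python
# Back-of-the-envelope model of fork() commit charge versus real usage.
# All parameters are hypothetical illustrations, not measurements.
PAGE_SIZE = 4096  # bytes; assumed 4 KB pages
GIB = 1 << 30


def strict_commit(writable_pages: int, children: int) -> int:
    """Pages a no-overcommit policy must reserve: each fork() is charged
    for a full copy of the parent's writable memory, touched or not."""
    return writable_pages * (children + 1)


def cow_pages_used(writable_pages: int, children: int, dirty_fraction: float) -> int:
    """Pages actually needed under copy-on-write: the parent's originals
    plus only the pages each child writes to."""
    dirtied_per_child = int(writable_pages * dirty_fraction)
    return writable_pages + dirtied_per_child * children


if __name__ == "__main__":
    pages, kids, dirty = 65536, 200, 0.05  # 256 MiB image, 200 children, 5% dirtied
    reserved = strict_commit(pages, kids) * PAGE_SIZE / GIB
    used = cow_pages_used(pages, kids, dirty) * PAGE_SIZE / GIB
    print(f"strict reservation: {reserved:.2f} GiB")  # 50.25 GiB
    print(f"copy-on-write use:  {used:.2f} GiB")      # 2.75 GiB
```

    With these assumed numbers the strict reservation is roughly eighteen
    times what the children actually use; the real ratio depends entirely
    on how much memory each child writes.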

    Similarly, the OS cannot really determine who is responsible for running
    the system out of memory.  Is it that big whopping program X, or is it
    the 200 fork()ed copies of server Y?  Only a human being can really
    make that determination.

    This is also why turning off overcommit can easily lead to the system
    failing even when it is nowhere near running out of actual memory, since
    every fork() must then reserve backing store for pages the child will
    never touch.  In other words, the only real practical result of turning
    off overcommit is to make a system less stable and less able to deal
    with exceptional conditions.  Systems that cannot afford to run out of
    memory are built from the ground up to not allocate an unbounded amount
    of memory in the first place.  There's no other way to do it.  The Mars
    Rover is a good example of that.  In such systems, actually running out
    of memory is often considered a fatal fault.
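    The same kind of toy model shows how strict accounting can refuse a
    fork() while most memory is still free.  Again this is only a sketch
    under assumed numbers: a commit limit of 8 GiB (2097152 pages) standing
    in for RAM plus swap, and the same hypothetical 256 MiB server whose
    children each dirty 5% of its pages.

```python
# Toy model: a server fork()s children under strict (no-overcommit)
# accounting until the next fork() would be refused.
# All parameters are hypothetical, chosen only for illustration.


def fork_until_refused(commit_limit: int, writable_pages: int,
                       dirty_fraction: float) -> tuple[int, int]:
    """Return (children_started, pages_actually_in_use) at the point
    where strict accounting refuses the next fork()."""
    committed = writable_pages  # the parent's own writable memory
    in_use = writable_pages     # pages really backed by RAM or swap
    dirtied_per_child = int(writable_pages * dirty_fraction)
    children = 0
    # A fork() succeeds only if the child's full writable image can be
    # reserved, whether or not the child ever touches it.
    while committed + writable_pages <= commit_limit:
        committed += writable_pages
        in_use += dirtied_per_child  # copy-on-write copies only dirtied pages
        children += 1
    return children, in_use


if __name__ == "__main__":
    limit = 2_097_152  # hypothetical 8 GiB of RAM + swap, in 4 KB pages
    kids, used = fork_until_refused(limit, 65_536, 0.05)
    print(f"fork() refused after {kids} children")        # 31
    print(f"memory actually in use: {used / limit:.1%}")  # 8.0%
```

    Under these assumptions the 32nd fork() fails with only about 8% of the
    commit limit actually in use.  A real kernel's accounting is more
    involved than this model, so treat it strictly as an illustration.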

						-Matt