Why kernel kills processes that run out of memory instead of
just failing memory allocation system calls?
Nate Eldredge
neldredge at math.ucsd.edu
Thu May 21 07:15:24 UTC 2009
On Wed, 20 May 2009, Yuri wrote:
> Seems like failing system calls (mmap and sbrk) that allocate memory is more
> graceful and would allow the program to at least issue the reasonable error
> message.
> And more intelligent programs would be able to reduce used memory instead of
> just dying.
It's a feature called "memory overcommit". It has a variety of pros and
cons, and is somewhat controversial. One advantage is that programs often
allocate memory (in various ways) that they will never use, which under a
conservative policy would result in that memory being wasted, or programs
failing unnecessarily. With overcommit, you sometimes allocate more
memory than you have, on the assumption that some of it will not actually
be needed.
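To make the "allocate but never use" case concrete, here's an untested
sketch (the 1 GB chunk size and 64 GB total are arbitrary, and it assumes
a 64-bit machine). Under overcommit the malloc() calls typically all
succeed, since nothing is committed until the pages are touched; under a
strictly conservative policy they would start failing as soon as real
memory plus swap was spoken for.

#include <stdio.h>
#include <stdlib.h>

#define CHUNK (1024UL * 1024 * 1024)  /* 1 GB per allocation (arbitrary) */
#define CHUNKS 64                     /* 64 GB total, assumed > RAM + swap */

int main(void) {
    int i;
    for (i = 0; i < CHUNKS; i++) {
        /* Allocate but never touch the memory.  With overcommit these
           allocations usually all succeed, even though the total is far
           more than the machine could back with real pages. */
        if (malloc(CHUNK) == NULL) {
            /* A conservative accounting policy would likely end up here
               well before the loop finishes. */
            perror("malloc");
            break;
        }
    }
    printf("reserved %d GB of address space without committing it\n", i);
    return 0;
}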
Although memory allocated by mmap and sbrk usually does get used in fairly
short order, there are other ways of allocating memory that are easy to
overlook, and which may "allocate" memory that you don't actually intend
to use. Probably the best example is fork().
For instance, consider the following program.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SIZE 1000000000 /* 1 GB */

int main(void) {
    char *buf = malloc(SIZE); /* 1 GB */
    memset(buf, 'x', SIZE); /* touch the buffer */
    pid_t pid = fork();
    if (pid == 0) {
        execlp("true", "true", (char *)NULL);
        perror("true");
        _exit(1);
    } else if (pid > 0) {
        for (;;); /* do work */
    } else {
        perror("fork");
        exit(1);
    }
    return 0;
}
Suppose we run this program on a machine with just over 1 GB of memory.
The fork() should give the child a private "copy" of the 1 GB buffer, by
setting it to copy-on-write. In principle, after the fork(), the child
might want to rewrite the buffer, which would require an additional 1 GB to
be available for the child's copy. So under a conservative allocation
policy, the kernel would have to reserve that extra 1 GB at the time of
the fork(). Since it can't do that on our hypothetical 1+ GB machine, the
fork() must fail, and the program won't work.
However, in fact that memory is not going to be used, because the child is
going to exec() right away, which will free the child's "copy". Indeed,
this happens most of the time with fork() (but of course the kernel can't
know when it will or won't). With overcommit, we pretend to give the
child a writable private copy of the buffer, in hopes that it won't
actually use more of it than we can fulfill with physical memory. If it
doesn't use it, all is well; if it does use it, then disaster occurs and
we have to start killing things.
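To see what it looks like when the child does call the bluff, here's an
untested variant of the program above, on the same hypothetical machine
with just over 1 GB of memory, where the child rewrites the buffer instead
of exec'ing:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SIZE 1000000000 /* 1 GB, as above */

int main(void) {
    char *buf = malloc(SIZE);
    memset(buf, 'x', SIZE); /* parent touches its copy */
    pid_t pid = fork();
    if (pid == 0) {
        /* Instead of exec'ing, rewrite the whole buffer.  Each write
           faults in a private copy of a shared page, so the kernel now
           needs a second gigabyte it doesn't have; this is where the
           killing starts. */
        memset(buf, 'y', SIZE);
        _exit(0);
    } else if (pid > 0) {
        for (;;); /* do work */
    } else {
        perror("fork");
        exit(1);
    }
    return 0;
}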
So the advantage is you can run programs like the fork()/exec() example
above on machines that technically don't have enough memory to do so. The
disadvantage, of course, is that if someone calls the bluff, then we kill
random processes.
However, this is not all that much worse than failing allocations:
although programs can in theory handle failed allocations and respond
accordingly, in practice they don't do so and just quit anyway. So in
real life, both cases result in disaster when memory "runs out"; with
overcommit, the disaster is a little less predictable but happens much
less often.
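For what it's worth, the "in theory" handling would look roughly like the
untested sketch below: check malloc()'s return value and fall back to a
smaller working set instead of crashing (the helper name is made up). The
catch under overcommit is that the failure tends to arrive as a kill
signal rather than a NULL return, so code like this rarely gets a chance
to run.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: try a big working buffer first, then fall back to
   progressively smaller ones instead of giving up. */
static void *alloc_with_fallback(size_t want, size_t *got) {
    size_t size = want;
    while (size >= 4096) {
        void *p = malloc(size);
        if (p != NULL) {
            *got = size;
            return p;
        }
        size /= 2; /* reduce memory use and retry */
    }
    return NULL;
}

int main(void) {
    size_t got = 0;
    void *buf = alloc_with_fallback((size_t)1 << 30, &got); /* ask for 1 GB */
    if (buf == NULL) {
        fprintf(stderr, "out of memory, giving up gracefully\n");
        return 1;
    }
    printf("working with a %zu-byte buffer\n", got);
    free(buf);
    return 0;
}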
If you google for "memory overcommit" you will see lots of opinions and
debate about this feature on various operating systems.
There may be a way to enable the conservative behavior; I know Linux has
an option to do this, but am not sure about FreeBSD. This might be useful
if you are paranoid, or run programs that you know will gracefully handle
running out of memory. IMHO for general use it is better to have
overcommit, but I know there are those who disagree.
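For reference, on Linux the knob is the vm.overcommit_memory sysctl
(/proc/sys/vm/overcommit_memory): 0 is the heuristic default, 1 always
overcommits, and 2 gives the conservative accounting behavior. Here's a
trivial, Linux-only sketch that just reports the current setting; changing
it needs root and sysctl(8):

#include <stdio.h>

int main(void) {
    /* Linux-specific: 0 = heuristic, 1 = always overcommit,
       2 = strict (conservative) accounting. */
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    int mode;
    if (f == NULL || fscanf(f, "%d", &mode) != 1) {
        perror("overcommit_memory");
        return 1;
    }
    printf("vm.overcommit_memory = %d\n", mode);
    fclose(f);
    return 0;
}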
--
Nate Eldredge
neldredge at math.ucsd.edu